What is MySQL?
MySQL is an open source relational database management system (RDBMS) that can be easily implemented and managed either on-premise or via the cloud through a hosting provider. It supports lots of simultaneous writes and scales via replication (though this can get complicated). For this reason and because of its relatively low maintenance/scalability costs, it is primarily used as a production database.
Created in 1994, MySQL is one of the best-established RDBMSs and has evolved significantly since then. Ninety percent of websites use MySQL. These include giant services like Youtube, Twitter, Wikipedia and Facebook.
Choosing MySQL as a database
Open source capabilities
MySQL was built as an open source solution, so it can be deployed on commodity hardware and scaled with predictable costs. MySQL’s broad compatibility also means it can be switched out for a more tailored solution at any time (many competing transactional databases are MySQL compliant for this exact purpose).
MySQL is the database equivalent of the racecar that’s stripped out rear seating to make it go faster. MySQL’s developers made the decision to prioritize speed and performance over capabilities, which makes MySQL a faster, albeit more limited database than other providers in transactional database category.
Availability and scalability
MySQL’s ability to be replicated and distributed for high availability and scalability is extremely powerful. It must be noted however, that higher levels of availability and scalability must be balanced out by higher complexity and costs.
What is MySQL great for?
MySQL was primary designed for web applications. It’s particularly great for structured and well-planned web applications. Oracle created a great resource describing why MySQL is a suitable solution for web apps.
Newer companies without a sophisticated data team
MySQL is often the first choice for smaller companies or startups because of its reliability, ubiquity, and performance. MySQL is also an attractive option because it’s easy to spin up a replica of a production MySQL database to use as an analytical database. MySQL has a robust developer community, is one of the best-documented DB systems in the market, and is sold by a large number of vendors.
MySQL host options & pricing
MySQL’s software is open source and can be downloaded for free. MySQL is also offered by a variety of cloud vendors.
MySQL deployed on-premise
Deployed on premise, the MySQL software itself can be downloaded and deployed for free. However, the total cost of operation will heavily depend on the allocated hardware; more disk space, CPU and memory will cost more. Server storage and dedicated maintenance costs are additional costs associated with self-hosted MySQL solution.
MySQL deployed in the cloud
Hosting MySQL in the cloud is less complex to manage than an on-premise solution, but offers less transparency and control.
There are dozens of MySQL providers, but some of the largest cloud-hosted include:
Amazon RDS’s on-demand pricing starts at just a couple of pennies per hour for their smallest instance and grows from there. If you can commit to a longer term, prices go down significantly. Amazon offers multi-availability-zone deployments which “provide enhanced availability and durability for database instances”. Storage costs are not included and are priced per GB per month. This comes with no additional charge for backup storage.
Google Cloud SQL
Google offers two different pricing schemes for Cloud SQL (their MySQL offering). First generation comes as a complete package or can be billed on a per-use basis. Second generation is charged by the minute that the instance is running and is more configurable.
Use Google’s Pricing Calculator to estimate which plan would be best for your specific needs. And note that Google offers a $300 credit for free to spend in the first 12 months.
Oracle Cloud pricing is quite straightforward, with hourly and monthly plans for service uptime. Storage pricing for hourly service is tiered and starts at a couple of pennies per GB per month. The monthly plan’s storage starts at $30 per TB per month. Outbound data transfer incurs additional costs beyond the first GB per month.
MySQL’s architecture consists of several layers:
MySQL’s application layer is how different clients connect to MySQL and issue queries. Applications such as MySQL Workbench and Looker connect to MySQL’s Application Layer to send queries and control user access to the database.
Once a query is issued from the application layer to read or write data from the underlying storage, the query processor translates the query into a query plan that the database can execute.
MySQL is ACID-compliant, meaning a set of queries can be encapsulated in a transaction and all of them succeed or all of them fail. The transaction manager manages this by issuing a COMMIT command to execute each transaction. If a transaction fails, the transaction manager will issue a ROLLBACK command to undo any changes to the database earlier in the failed transaction.
MySQL is highly resilient, and the recovery manager is responsible for returning the database back to its last stable state in the event of a crash. It logs every operation that’s completed on the database (since its inception) and, in the event of a crash, executing every command in the log, thus effectively returning the database back to its last stable state.
The storage manager is responsible for allocating memory resources necessary to pull the data from the physical disk and deliver the results to the client.
MySQL’s RDBMS is very different from other transactional databases when it comes to storage engines, because of its pluggable architecture. Because MySQL’s server core code is separate from its storage engine, storage engines within MySQL can be swapped in and out, which gives users the flexibility and ability to customize their storage to their needs.
This is important because which storage engine you use dramatically impacts database performance. MySQL’s documentation contains a comprehensive list of supported storage engines.
MySQL supports transactions with the integration of the BDB and InnoDB engines (the default engine). This allows safer handling of concurrent write operations, which began the trend of adding features needed by the Enterprise environments.
MySQL runs on Linux, Solaris, Windows, AIX and HPUX and offers both 32-bit and 64-bit versions.
MySQL is mostly compatible with MariaDB and Amazon Aurora. As with any dialects there are some differences (e.g. Aurora only supports read replicas), but the three dialects are close cousins., One advantage of MariaDB is it offers window functions, while MySQL does not.
Types of data
MySQL performs better with well-structured transactional data and data structured in third normal form. MySQL will also perform better with smaller, fuller datasets. It does not like sparse data or wide tables. The sparser the table, the harder it is to write; the wider the table, the harder it is to read.
MySQL can support large, or even very large tables, but at much greater complexity and financial cost. Because MySQL is highly reliant on indexing, knowing what types of queries you’ll be running in advance can make a big difference in performance.
Maximum recommended size of data
While there’s no hard upper limit,few would recommend loading more than one billion rows. InnoDB tables do have a hard maximum of 1017 columns.
Implementation and MySQL performance tuning
You could write a whole book on tuning MySQL (and people have!).
But suffice to say that you get more customization options (and more ways to shoot yourself in the foot) when hosting locally. When choosing a cloud hosted solution, creating an account with Google Cloud Storage or spinning up a cluster with Amazon RDS will get you started in minutes. Since these are hosted solutions, most of the things you’d normally need to worry about when deploying on own servers, will be taken care of for you.
Inserting new data
Streaming is popular when it is used transactionally. You can also run batch inserts.
Maintenance and scaling MySQL
MySQL works well out of the box for small datasets and workloads, but as both data and demand for concurrency scales, MySQL can be scaled several ways.
Replication is a common way of configuring multiple servers under a single MySQL instance, with each replica being provisioned for specific tasks.
There are two main types of replication:
Master-slave replication, which is characterized by having one production database that’s in charge of writing data, and potentially many slave databases that are read-only replicas of the master database. This allows an administrator to scale concurrent reads on the instance.
Multi-master replication, which is similar to master-slave replication, but allows you to both scale writes and increases availability. However, the additional complexity imposed by this replication system increases latency.
Clustering is different than replication because rather than duplicating the data across many different databases to scale reads or writes, the data itself is distributed across different servers to increase performance. Implementing a MySQL cluster also involves implementing a storage engine specific to clustering, called MySQL NDB cluster.
Partitioning is a way of splitting up database tables into smaller subsets to decrease index size and therefore increase performance. One popular method of partitioning is to use a process called sharding. Sharding is a practice by which tables are broken up horizontally and partitioned across different databases on one or multiple machines. Pinterest is well-known for scaling their MySQL database. They use sharding, master-master replication, and clustering to distribute their massive tables over a single MySQL instance.