What is PostgreSQL?
PostgreSQL is an open source object-relational database management system (ORDBMS) that is typically utilized as both a production database and analytical database. While MySQL prioritizes scalability and performance, Postgres prioritizes SQL compliance and extensibility.
Organizations looking to maintain a high level of data integrity and customization will typically choose Postgres. This is due to PostgreSQL’s support of fully ACID transactions and custom procedures.
Unlike other open source transactional databases, Postgres also prioritizes ANSI SQL compliance, containing a comprehensive SQL library and extensible functionality that can be used to support advanced analytical workloads.
Postgres can be downloaded for free and deployed on commodity hardware, or it can be run in the cloud through a variety of vendors. While Postgres is feature-rich and tailored for OLAP workloads, performance in Postgres tends to hit a limit as data volumes scale beyond several terabytes.
Choosing PostgreSQL as a database
One of the biggest advantages that Postgres has as an object-relational database is the fact that the architecture is extensible. This allows the database to support user-defined functions and third-party libraries that can be used to customize and automate procedures within the database.
ANSI SQL compliance
PostgreSQL has an elegant and comprehensive SQL syntax, which is helpful for organizations looking to use Postgres for OLAP workloads. As of this writing, Postgres conforms to ANSI-SQL: 2008 standard.
Postgres uses multiversion concurrency control to support writing and reading to the same databases without imposing a lock on the user.
What is PostgreSQL great for?
User-driven customization of stored procedures
PostgreSQL’s strength as an object-relational database is that users have the ability to modify stored procedures and functions fairly easily. The ability to add user generated code into the database on the fly makes Postgres a great candidate for prototyping and implementing custom procedures and workflows.
Breadth of functionality
For organizations interested in breadth of functionality, Postgres is a good alternative to the popular, proprietary relational databases such as Oracle and MSSQL because it can be deployed for much lower costs. However, depending upon what you've chosen to pay for, those options may offer higher levels of support than what you may receive with Postgres.
PostgreSQL’s software is open source and can be downloaded for free. However, it’s important to note that the total cost of operation will heavily depend on the allocated hardware; more disk space, CPU and memory will cost more. Server storage and dedicated maintenance costs are additional costs associated with self-hosted PostGreSQL solution.
If you’re interested in managing a PostgreSQL instance in the cloud, the service is also offered by a variety of cloud vendors such as Amazon Relational Database Service, Google Cloud SQL, EnterpriseDB, and Heroku.
Unlike other transactional databases, PostgreSQL is implemented on a single server and generally is not built to distribute its storage or compute functions across multiple nodes. While it’s possible to use clustering, replication, and pooling techniques to scale your Postgres cluster for performance and additional capacity, these solutions are complex and not common. Many organizations choose to upsize Postgres by vertically scaling the database, which means purchasing a bigger and more powerful server.
There are a couple of unique features about PostgreSQL’s architecture that are worth calling to attention.
Object relational database management system
Postgres is classified as an object-relational database management system (ORDBMS). This means that unlike traditional relational database management systems (RDBMSs), Postgres has an object-oriented architecture.
Most RDBMSs will store information about the actual data contained in the database, such as tables. However ORDBMSs such as Postgres will store much more information than that, such as information about different data types, SQL functions, and more. Postgres in particular allows individual users to edit these tables, allowing them to add new datatypes, and SQL functions to customize how their data is stored and interacts with the database.
This architecture is what makes Postgres extensible and gives uses the ability to experiment with new applications of their stored data without messing with any source code.
Multiversion concurrency control
Another key strength of PostgreSQL’s architecture is concurrency, or the ability to support writing data to the database while simultaneously reading data from the database.
Other transactional databases don’t support reading and writing from the same database. This is because if a read occurs in the middle of a write, a user could read out data that’s in the middle of updating, which could result in inaccurate reads from the database.
In order to resolve this issue, databases such as MySQL institute a policy of locking the database when data is being writing to storage. This requires organizations to physical replicate the database in order to perform writes and reads on the same system.
Postgres uses a system known as multiversion concurrency control (MVCC) that effectively takes individual snapshots of the entire database to read purposes. As data is loaded into the database, more recent snapshots of the tables are captured which the user then reads from. The result is a lock-free database system.
Types of data
Like most transactional databases, PostgreSQL’s’ core competency lies in relational data storage. With that said, Postgres is good with small to moderate amounts of JSON and supports native JSON parsing and even JSON indexing through HStore. It is not natively a columnar store. There is an extension that allows columnar storage which is free (depending on hosting) because Postgres is open-source.
A full list of PostgreSQL’s supported datatypes can be found here, but it’s also worth noting that users can create their own data types as well.
Range of row (quantity)
PostgreSQL’s performance will start to be impacted as storage volume approaches multiple terabytes of data. As with any database, this can be addressed with various optimizations.
However, after applying all of the optimization techniques available, if one doesn't experience interactive query times (measured in seconds, rather than minutes) when writing analytical queries, it may be time to think about migrating OLAP queries to an MPP database.
Because it’s open source, very little is required to implement Postgres on a local system. In addition to the Postgres documentation (which is quite verbose), there are various simple guides for installing Postgres on different operating systems:
Installing on a hosted/cloud instance requires creating an account and establishing a cluster with Amazon AWS.
Note - Postgres will create a default user named 'postgres', however this user should not be used for anything other than accessing the database software for security reasons.
Process for new data
Creating new tables on PostgreSQL is straightforward using the
CREATE command, similar to MySQL.
Postgres uses a standard
INSERT function like MySQL. However Postgres does not leverage the
REPLACE extension, meaning slightly cumbersome workarounds are required to replicate this MySQL operation for a Postgres equivalent.
Postgres and Python play well together particularly using the Python module pyscopg2 which makes
UPDATE operations programmable.
There are a large number of resources online to help you with monitoring and tuning performance on your PostgreSQL instance.
One of the biggest factors that impacts performance of Postgres is being able to properly index your tables within your instance. An index is equivalent to a table of contents for a database that the database uses when attempting to locate data. Just like a table of contents in a book prevents you from thumbing through every page, an index within a database prevents you from having to perform a full scan of the database table to locate data.
Good index practice usually comes down to understanding the different types of indexes and how and when to use each effectively. This article provides a great explanation of the different types of indexes and when to use them.
Another important facet of effective Postgres performance is writing performant SQL and understanding how Postgres interprets written SQL.
The best way to understand how Postgres interprets a particular SQL query is to run an
EXPLAIN plan on your query. This guide from Postgres does a pretty comprehensive job at explaining the
EXPLAIN statement, but this guide breaks down the toughest bits in a slightly more digestible way.
- MySQL users who transition to Postgres sometimes don't realize the broader set of querying features that Postgres includes, which is a shame, because in some cases they allow for more advanced analysis and more elegant query writing.
- Postgres uses slightly different naming conventions for data types than MySQL. For an exact breakdown, check out this handy matrix of differences between MySQL and Postgres data types.