Self-managed MPP databases are powerful clustered databases that allow for massive customization, flexibility, and functionality that must be managed manually by the customer, usually by a database administrator or DevOps team.
While self-managed MPP databases have traditionally been deployed on-premise, these databases now have the flexibility to be deployed in the cloud as well, which allows for a larger array of deployment options.
One key proficiency shared by all of the databases within the self-managed MPP category are their mature SQL dialects and integrations. This makes them ideal options within an enterprise data stack. For example, HPE Vertica, and Teradata all offer powerful connectors to Apache Hadoop. These integrations, combined with their high concurrency, enable these databases to support a large enterprise workforce.
Self-Managed MPPs are designed to integrate with and work alongside existing database services and workflows such as Hadoop. All of the examples of self-managed data warehouses have been around for many years (Teradata) or built on existing data warehouse technology, so they’re good candidates for enterprise workloads.
One of the biggest features of self-managed MPP databases is the amount of control they grant users over hardware selection, table architecture/storage, and query optimization. Leveraged wisely, these options allow organizations to design a structure that is highly performant and efficient.
Many self-managed MPP solutions provide mature SQL dialects and broad integrations that provide advanced analytics and user-defined functions. A few have special analytical capabilities and integrations (such as Vertica’s geospatial and machine learning libraries) that make them a great choice for specific analytical workflows.
Every single node within a self-managed MPP database has its own storage, memory, and compute resources. This is known as a “shared nothing” architecture because storage and compute resources are not shared across the entire system.
This “shared-nothing architecture” enables multiple nodes to work in parallel to process a query. When a query is issued, a leader node creates a plan and distributes it to each of the individual nodes. These nodes then process their portion of the query (requesting needed data from other nodes across the network). These intermediate results are then sent back to a “leader node” that then aggregates the results together.
Self-managed MPP databases give you flexibility and customization, but in exchange you take on managing some of the complexity yourself. So, if you don’t have the resources to manage a database on your own, you should probably look at an on-demand MPP database. Similarly, if you’re upset about the lack of flexibility offered by an on-demand solution, a self-managed solution might be a better fit for you.
Managed MPP databases vary greatly in how they can be optimized because of differences in how they deal with caching, distribution/sorting, and storage. We’ll cover each in more detail in the guides to individual databases.
But there are a few general guidelines that can help:
Business intelligence, big data analytics, or a 360° view of your customers.
Whatever you need, Looker can help. Talk to our data experts.