What is Azure SQL Data Warehouse?
Azure SQL Data Warehouse (SQL DW) is a petabyte-scale MPP analytical data warehouse built on the foundation of SQL Server and run as part of the Microsoft Azure Cloud Computing Platform. Like other Cloud MPP solutions, SQL DW separates storage and compute, billing for each separately. Unlike many other analytical data warehouse solutions, SQL DW abstracts away physical machines, and represents compute power in the form of data warehouse units (DWUs) which allow users to seamlessly and easily scale compute resources at will.
Azure SQL Data Warehouse is part of the Microsoft Azure Cloud Computing Platform, which makes choosing this database a virtual no-brainer for companies already invested in the Microsoft technology stack.
Choosing Azure SQL Data Warehouse
Microsoft’s Azure cloud computing platform
Microsoft's Azure ecosystem is massive, and offers a comprehensive array of solutions with a multitude of tools that you can easily add on to get more out of your stack at any time. Rather than using one database with a bunch of 3rd-party services, you can invest completely in Microsoft’s platform.
Ease of use
Azure SQL DW abstracts compute resources away from the user, instead representing them in the form of DWUs, which a user can scale up and down very easily to provision more or less resources to their data warehouse.
There are a variety of options provided to end users for scaling the performance of their Azure SQL Data Warehouse. Users can either upgrade their individual virtual machines, or add more compute resources in the form of additional machines, which can be configured to scale automatically.
What use cases is Azure SQL Data Warehouse really great for?
Organizations using SQL Server/SQL database
Azure SQL Data Warehouse is built from the same technology as SQL Server and SQL Database, making loading data from SQL Server to SQL Data Warehouse very easy.
Organizations Storing Data in both Azure Blob Storage and Hadoop Organizations that store data in both Azure Blob Storage and HDFS will find a great tool in Azure, as its Polybase service allows users to blend relational data stored in Azure with non-relational data stored in Hadoop.
Compute and Storage are billed separately for Azure SQL DW. Unlike AWS, the hardware sitting behind Azure DW is not made known to the customer. For compute, amount of data warehouse units (DWUs) are selected as a measure of horsepower. For storage, the customer will be billed at the same rate as Azure Premium Storage.
Microsoft uses a pricing calculator on their website to help you estimate both compute pricing (for the SQL Data Warehouse), and storage pricing (for Azure Cloud Storage).
Azure SQL Data Warehouse architecture
Azure SQL Data Warehouse has a similar architecture to other managed MPP databases in that it decouples its storage from compute. Azure SQL Data Warehouse is built right on top of Azure Blob Storage and dynmaically pulls in compute resources to query data that resides there.
Azure SQL Data Warehouse is fundamentally broken down into two types of nodes: one control node and multiple compute nodes.
Control nodes are responsible for creating a query execution plan, and break that query into parallel phases to be executed by each of the compute nodes. In addition to query optimization and distribution, the control node also store query metadata.
The multiple compute nodes each contain an instance of SQL Database and are responsible for processing the data stored locally on their individual disks. Once the intermediate results for an individual compute node are processed, the compute nodes then return those result to the control node to be aggregate.
Azure SQL Data Warehouse queries data stored on Azure Blob Storage, which is Microsoft’s massive unstructured storage repository. Fundamentally there are two ways to query data stored on Azure Storage:
Types of data
Azure SQL Data Warehouse can work with relational data as well as non-relational data via Polybase, a storage layer for both relational and HDFS data.
Range of row (quantity)
Azure SQL Data Warehouse with Azure Blob Storage can handle petabytes of data. With compatibility with Azure Data Lake, Azure SQL Data Warehouse can query exabytes of data.
Azure SQL Data Warehouse management & implementation
Implementing Azure SQL Data Warehouse
Its very easy to take Azure SQL Data Warehouse for a test drive to do some quick benchmarking. Before starting, it’s necessary to have both an Azure Account and an Azure SQL Server.
Microsoft’s documentation has a great step-by-step guide for getting started with Azure SQL Data Warehouse.
Process for new data
Microsoft offers a few options to ingest data into Azure DW. Fast, compute-scalable data loads come from Azure’s Polybase. More traditional options such as Microsoft’s Bulk Copy Process (BCP) and SQL Bulk Load can be leveraged and are scalable on threads only, not on DWUs (Azure’s compute units). Lastly, if one is already invested in a fuller Microsoft stack, SSIS can be used as a part of the ETL pipeline into Azure DW.
Since Azure SQL DW is a cloud-based solution, users can take advantage of the low-maintenance that accompanies similar managed service solutions. However, Azure is particularly unique in that it provides as much customization as necessary for users.
Data Warehouse Units (DWUs)
Compute power in Azure SQL Data Warehouse is abstracted as Data Warehouse Units (DWUs). Users can dynamically add more compute power to their instance by increasing the number of DWUs through a couple different means.
DWUs allows users to scale up and down resources within the data warehouse very quickly and easily.
Scaling SQL data warehouse elastically
One of SQL Azure’s strengths is its flexibility in scaling the database out. Microsoft allows for both Vertical and Horizontal scaling of the data warehouse.
Users can vertically scale Azure SQL Data Warehouse by changing the service tier or placing the database is an elastic pool.
Users can horizontally scale Azure SQL Data Warehouse out by adding more data warehouse units. The Azure platform also provides an incredible amount of resources to customize autoscaling of the the data warehouse.