Announcing support for the Databricks SQL launch
Nov 12, 2020
We’re excited to support the launch of Databricks SQL. Databricks provides a Unified Data Analytics Platform to prepare and analyze data for both ML/AI and business intelligence, while Looker provides the platform for analysts to access and act on this data to make better business decisions. The launch of Databricks SQL will improve ease of use, scalability, and performance, making it even easier and faster to get value out of data with Databricks and Looker.
The database dilemma
Looker’s in-database architecture relies on the underlying data store for both the data itself and query performance. Since our customers use Looker to run critical functions across all aspects of their businesses, they need various types of data stored in different locations to provide a comprehensive analytics experience. Requirements such as data freshness, time to return results, and advanced analytics functionality often vary by use case and data source within the same Looker deployment. We often find customers using multiple data technologies to meet all their needs, such as:
- A data lake for unstructured/semi-structured data
- A data warehouse for internal analytics
- Niche time-series/in-memory databases for real-time operational use cases
It is more challenging to create a reliable and complete view of the business when the data is physically separated, rather than centralized in one location. Additionally, it can be a burden to manage the complexities of so many disparate systems.
The shift towards lakehouse architecture
Data lakes are a natural place to start in data management since they can store a vast amount of raw data in its native format. Data can be stored as-is, without restructuring it to meet the more rigid requirements of data warehouses and transactional data stores. Looker has supported data lake use cases with Databricks via our Spark support since 2016. Because Looker leaves data where it is stored in the data lake and uses Spark SQL for processing, data volume and complexity are not an obstacle to analytics. However, data reliability and performance requirements often warranted an additional datastore.
The Databricks launch of Delta Lake was the first step in blurring the lines between data lake and data warehouse. The benefits of a data lake, like support for unstructured data, were combined with the advantages of a data warehouse, like schema enforcement and governance, to better serve both use cases from the same service. Further, Delta Lake supports end-to-end streaming applications, which previously would have required niche in-memory and time-series databases. Looker’s Spark support takes advantage of Delta Lake’s unique architecture and allows you to simplify your data architecture without sacrificing any of your analytics needs.
The promise of the Lakehouse Architecture on Databricks is to bring reliability, quality, and performance to data lakes. This means you can rely on one unified platform for every use case, including BI & reporting, machine learning, and streaming applications. The Databricks launch of Delta Engine furthered this vision by accelerating query performance. An improved query optimizer, caching layer, and optimized execution engine (Photon) make reliable data from Delta Lake available to users faster. Since the Lakehouse has access to the most recent and complete datasets, and now offers data warehouse-like support for schemas, performance, and more, it’s an ideal source of data for Looker.
Delta Engine and Databricks SQL
Databricks’ SQL service will provide an even better analytics experience for Looker and Databricks customers. Specifically, we expect improvements across the following areas:
- Ease of use: Simplified infrastructure administration based on “t-shirt size” compute (pre-configured compute sizes such as small, medium, and large) should make it easier to get up and running optimally for analytics without extensive configuration.
- Performance: Looker will directly benefit from the increased performance of Photon, the new Databricks execution engine optimized for modern structured and semi-structured workloads.
- Scalability: Enhanced JDBC connectivity will result in lower latency and higher throughput, meaning you can analyze large volumes of data even faster.
Databricks SQL is now fully supported in Looker version 21.6.
To learn more about how different data technologies (such as data lakes and data warehouses) work together and the advantages of combining their capabilities, check out this Gartner article: The Best Ways to Organize Your Data Structures.
Ready to get started with Looker and Databricks? You can use this documentation to connect your existing Looker instance to Databricks, or sign up for a free Looker trial.