Accelerate your queries with Looker aggregate awareness

Kevin Marr & Joel McKelvey, Looker Product Management, Google Cloud & Looker Product Marketing Manager, Google Cloud

Aug 11, 2020

If you depend on data to make critical business decisions, ensuring that queries are fast and efficient is paramount.

To achieve high performance and efficient queries, most analytic tools use techniques that have one thing in common: limiting the size of the dataset queried. Desktop analytics tools accomplish this by using cubes or extracts of data. With other legacy tools, achieving faster queries means data must be heavily transformed, formatted, or aggregated in some way. Certain systems are designed to split data into subsets and index it, thus calling only on that subset when a query is run.

While these are great strategies to protect slower databases or legacy data tools from becoming overloaded, when handling most companies' data, these approaches are also flawed. Extracts of data can become stale and are difficult to govern. Building subsets of data and complex transformations are complicated, resource intensive, and error prone. And when dealing with extracted data, the granularity is eliminated and analysts lose their ability to drill down into the data for further insights.

Modern databases resolve many of the issues of performance due to their low cost and simple scalability. Using a modern database with an in-database analytics architecture, like Looker’s, can resolve the issues of stale and siloed data, eliminate extracts, and reduce the need for transformation-heavy data pipelines. But what about query performance? How fast and efficient can queries be when the entire dataset needs to be processed at query runtime?

Looker provides high performance and highly efficient queries by using aggregate awareness. Like prior approaches, aggregate awareness allows data teams to limit the amount of data queried — but avoids the pitfalls associated with data extracts that can be so painful to analyst teams.

With aggregate awareness, the Looker modeler can materialize aggregations in the user’s own database, then preferentially use that materialized view as the data against which to perform subsequent queries. These materialized views (aggregate tables) contain fewer rows of data, and therefore return results faster than a typical query. And because aggregate tables are based on live data, they can be defined to refresh automatically at whatever frequency might be required. When it comes to managing aggregates in other tools, it can be a painful undertaking. But with Looker, items like flow logic, refresh/rebuild, and referential integrity with the model are all managed for you automatically — an approach that is unique to Looker

It’s also important to note that aggregate awareness does not limit queries to the data in the aggregate table. Looker can UNION fresh data into your aggregate table automatically, using the aggregate where possible and filling in any gaps with unaggregated data as necessary. For example, you may have an aggregate that includes sales data for the previous month. This aggregate will not include the latest data because it will have been built at the close of last month and will not include data for the current month. Looker can still use the aggregate table to accelerate a query that includes both the current and previous month’s data by UNIONing aggregate table data with the latest results. With this unified response, you get the performance and efficiency of an aggregate with the completeness of the full dataset.

Aggregate tables are defined using the aggregate_table parameter in LookML. Once defined, the aggregate table is saved to your database, much like a persistent derived table (PDT). You can automatically generate aggregates, making development as simple as copy-and-paste. Tables can also be defined manually. Below is a sample of LookML code with a defined table called sales_monthly, designed to aggregate a month of sales data.


explore: orders {
    label: "Sales Totals" 
    aggregate_table: sales_monthly {
        materialization: {
            datagroup_trigger: orders_datagroup
        } 
        query: {
            dimensions: [created_month] 
            measures: [order_items.total_sales]
            timezone: America/Los_Angeles
        }
    } 
    # other explore parameters
}

The ability to use aggregate awareness depends on the database dialect your Looker connection is using. You can find out the current dialects that support aggregate awareness in our documentation here.

For those using Looker in development mode, you can test what data is being queried and if your aggregate tables are being used by running queries on your Explore. This can help you design aggregates that will most enhance common queries and reduce query time and cost.

Learn more about how aggregate awareness can improve your own query performance and reduce the number of rows your users query each time with our Looker documentation.

Next Previous

Subscribe for the latest posts