Looker Blog : Data Matters

Amazon Redshift announces support for LISTAGG DISTINCT aggregate function

Bruce Sandell, Senior Sales Engineer, Alliances

Jan 23, 2018

Amazon Redshift recently announced support for the LISTAGG DISTINCT function. If you’ve never used the LISTAGG function before, it’s a handy way to aggregate all of the values for a specified dimension into a single string, and as the name implies, LISTAGG DISTINCT ensures that the string contains only unique values rather than all values.

For example, let’s say that we want to do some targeted marketing for our local guitar store based on what brand(s) of guitar a customer owns. We’d like to produce a report that shows a customer along with each unique guitar brand that they own. Below is a sample of our customer data:

customers

Customer ID First Name Last Name City State Country
1 Robbie Robertson Toronto Ontario Canada
2 Link Wray Dunn North Carolina USA
3 Rickey Medlocke Jacksonville Florida USA


customer_gear

Customer ID Brand Model
2 Danelectro Longhorn
2 Gibson SG
2 Gibson Les Paul
1 Fender Stratocaster
1 Fender Telecaster
1 Martin D-28
3 Gibson Explorer
3 Gibson Firebird
3 Gibson Les Paul
3 Fender Stratocaster


To generate a list of each customer along with the guitar brands that they own, we could use the LISTAGG function to produce the following results:

SELECT 
    customers.first_name  AS "customers.first_name",
    customers.last_name  AS "customers.last_name",
    listagg(brand, ', ') within group (order by brand) AS "customer_gear.brand_list"
FROM public.customers  AS customers
LEFT JOIN public.customer_gear  AS customer_gear ON customers.customer_id = customer_gear.customer_id 
GROUP BY 1,2
ORDER BY 2;

listagg_distinct

This report is helpful, but as you can see, brands may show up multiple times for the same customer. The introduction of the LISTAGG DISTINCT function allows us to clean up this list so that each brand is listed no more than once for each customer:

SELECT 
    customers.first_name  AS "customers.first_name",
    customers.last_name  AS "customers.last_name",
    LISTAGG(DISTINCT customer_gear.brand ,',') WITHIN GROUP (ORDER BY customer_gear.brand ) AS "customer_gear.brand_list"
FROM public.customers  AS customers
LEFT JOIN public.customer_gear  AS customer_gear ON customers.customer_id = customer_gear.customer_id 
GROUP BY 1,2
ORDER BY 2;

listagg_distinct

This list, with duplicates removed, is exactly what we need for our marketing efforts.

Using LISTAGG DISTINCT with Looker

LookML has a measure type designed specifically to take advantage of the LISTAGG DISTINCT aggregate function. Using the LookML list measure, it’s simple to add your own aggregated lists to reports. For the report above, we added the following measure to the customer_gear view file:

  measure: brand_list {
    type: list
    list_field: brand
  }

Looker generates all of the necessary SQL for you, so now all of your users can easily take advantage of this new functionality.

To read more about using LISTAGG DISTINCT with Redshift, check out AWS’s documentation here.

Next Previous

Subscribe for the Latest Posts