Jan 23, 2018
Amazon Redshift recently announced support for the LISTAGG DISTINCT function. If you’ve never used the LISTAGG function before, it’s a handy way to aggregate all of the values for a specified dimension into a single string, and as the name implies, LISTAGG DISTINCT ensures that the string contains only unique values rather than all values.
For example, let’s say that we want to do some targeted marketing for our local guitar store based on what brand(s) of guitar a customer owns. We’d like to produce a report that shows a customer along with each unique guitar brand that they own. Below is a sample of our customer data:
customers
Customer ID | First Name | Last Name | City | State | Country |
---|---|---|---|---|---|
1 | Robbie | Robertson | Toronto | Ontario | Canada |
2 | Link | Wray | Dunn | North Carolina | USA |
3 | Rickey | Medlocke | Jacksonville | Florida | USA |
customer_gear
Customer ID | Brand | Model |
---|---|---|
2 | Danelectro | Longhorn |
2 | Gibson | SG |
2 | Gibson | Les Paul |
1 | Fender | Stratocaster |
1 | Fender | Telecaster |
1 | Martin | D-28 |
3 | Gibson | Explorer |
3 | Gibson | Firebird |
3 | Gibson | Les Paul |
3 | Fender | Stratocaster |
To generate a list of each customer along with the guitar brands that they own, we could use the LISTAGG function to produce the following results:
SELECT
customers.first_name AS "customers.first_name",
customers.last_name AS "customers.last_name",
listagg(brand, ', ') within group (order by brand) AS "customer_gear.brand_list"
FROM public.customers AS customers
LEFT JOIN public.customer_gear AS customer_gear ON customers.customer_id = customer_gear.customer_id
GROUP BY 1,2
ORDER BY 2;
This report is helpful, but as you can see, brands may show up multiple times for the same customer. The introduction of the LISTAGG DISTINCT function allows us to clean up this list so that each brand is listed no more than once for each customer:
SELECT
customers.first_name AS "customers.first_name",
customers.last_name AS "customers.last_name",
LISTAGG(DISTINCT customer_gear.brand ,',') WITHIN GROUP (ORDER BY customer_gear.brand ) AS "customer_gear.brand_list"
FROM public.customers AS customers
LEFT JOIN public.customer_gear AS customer_gear ON customers.customer_id = customer_gear.customer_id
GROUP BY 1,2
ORDER BY 2;
This list, with duplicates removed, is exactly what we need for our marketing efforts.
LookML has a measure type designed specifically to take advantage of the LISTAGG DISTINCT aggregate function. Using the LookML list measure, it’s simple to add your own aggregated lists to reports. For the report above, we added the following measure to the customer_gear view file:
measure: brand_list {
type: list
list_field: brand
}
Looker generates all of the necessary SQL for you, so now all of your users can easily take advantage of this new functionality.
To read more about using LISTAGG DISTINCT with Redshift, check out AWS’s documentation here.