Looker Blog : Data Matters

The Reusability Paradigm of LookML

Joshua Moskovitz, Analyst

Apr 10, 2014

Quick Reiteration of Metric Calculations for Powerful Data Exploration

At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are often starving for data, but are forced to make important decisions without access to data that could guide them in the right direction.

Looker addresses both of these problems with a YAML-based modeling language called LookML. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model. That means you can unleash them on your business users to manipulate, iterate, and transform in any way they see fit.

A key difference of LookML is that, unlike older approaches, it combines modeling, transformations, and derivations at the same layer (late-binding modeling). This allows vast amounts of data to be captured in relatively inexpensive databases (mirrored or copied), and then derivations and transformations occur much closer to, or at, query time. The traditional approach is to transform the data as it’s loaded (ETL), whereas LookML allows for transform and derivation on demand (ELT). The result is a very agile data environment where user questions can change and the data environment can better keep up.

E-commerce — Starting with Total Cost of Order

Let’s take a look at a simple e-commerce example. I’ll show how we can create a dimension, the Total Cost of Order, which can be reused and built on throughout a single LookML model.

First, a quick primer on a typical e-commerce data model, which will help us answer questions about the buying and selling of items online. In our example, we’ll work with a subset of tables: Orders, Order Items, and Inventory Items. As a business that tracks Orders, it’s probably important for us to determine the distribution of our customers’ orders based on cost. In our current Orders table, we don’t have a field that tells us the cost of an order, because each order contains multiple items of varying costs. So we need to calculate a cost of an order by summing over the sale prices of the items in the order.

Orders table

id created_at user_id
1 2014-04-01 5656
2 2014-04-01 7269

Order Items  table

id created_at order_id inventory_item_id sale_price
1 2014-04-01 5656 3 $12
2 2014-04-03 7263 5 $45

Inventory Items  table

id created_at cost sold_at product_id
3 2014-04-01 $8.50 2014-04-05 5
6 2014-04-02 $24.00 2014-04-04 7

Suppose we want to calculate a new dimension for Orders that will determine the Total Cost of an Order. In this case, the field is not stored in our database, but can be calculated from the sale price of order items in the order. The technique we will use here is a correlated subquery, a simple and easy approach. (Looker supports more complicated and powerful mechanisms via derived tables for databases that do not support this feature or when performance becomes a problem. We'll cover derived tables in a future post.)

For any given order, the SQL to calculate the Total Cost of Order is:

SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id

We sum over the sale price associated for each item in a given order, where the Order Items Order_Id field matches with the primary key in the Orders table. In Looker, we’d want to create this dimension in the Orders view, since it’s an attribute of an Order.

- view: Order
  fields:
    - dimension: total_amount_of_order_usd
      type: number
      decimals: 2
      sql:
          (SELECT SUM(order_items.sale_price)
          FROM order_items
          WHERE order_items.order_id = orders.id) 

Now let's see our dimension in action:

determining order profit

Tiering Total Cost of Order

Looks good, but we’ve got a wide range of order amounts, so it probably makes sense to bucket these values across set intervals. Normally, if we were writing SQL, we’d have to make a CASE WHEN statement for each discrete bucket. Conveniently, LookML has a tier function, so let’s use that.

- dimension: total_amount_of_order_usd_tier
  type: tier
  sql: ${total_amount_of_order_usd}
  tiers: [0,10,5,150,500,1000]

Notice that we can reference our existing Total Amount of Order dimension in the ‘sql:’ parameter of the measure. Now when we use the tier, we bucket Orders into their respective tiers:

determining order profit

Determining Order Profit

What if we wanted to know more about our Order, maybe the profit? To determine the profit of an order, we will need a Total Cost of Order dimension.

- dimension: total_cost_of_order
  type: number
  decimals: 2
  sql:|
        (SELECT SUM(inventory_items.cost)
        FROM order_items
        LEFT JOIN inventory_items ON order_items.inventory_item_id = 
              inventory_items.id
        WHERE order_items.order_id = orders.id)

When using this Order Profit, Looker will substitute the existing business logic for both the Total Order Amount and Total Cost of Order. Let’s run a new query using the Order Profit dimension.

determining order profit

Calculating Profit Per User

Another valuable metric for our e-commerce business may be Profit Per User. In Looker we can reference dimensions or measures from other views. In this case, to determine the Profit Per User, we’ll reference our Count measure from the Users view as the denominator of a measure in the Orders view, where the numerator is our Order Profit dimension. We use the Count measure from the Users view to scope the count with ‘users.’

- measure: profit_per_user
  type: number
  decimals: 2
  sql: 100.0 *${order_profit}/NULLIF(${users.count},0)

Now we can see how our Profit Per User varies by every Order dimension. In this case, we see how it varies by Order Date:

calculating profit per user

Creating an Average Total Order Amount Measure

What if we wanted a measure that computes the Average Total Order Amount whenever we group by a dimension in Looker? For instance, we might group by average total order amount in a certain Month, by orders from users in a certain State, or by the Lifetime Number of Orders of a user. When we create a measure in Looker, we can reuse it in many different contexts.

Let's first build our Average Total Cost of Order measure.

- measure: average_total_amount_of_order_usd
  type: average
  sql: ${total_amount_of_order_usd}
  decimals: 2

Again, we can reference our already existing Total Order Amount dimension and set the dimension type as an average. Now when we use this dimension, it will aggregate over all total order amounts within that group, calculating the average.

Here we see how our Average Total Order Amount varies by the Lifetime Number Orders of Customers and by the Week the order was created:

creating average order measure

Creating Conditional Measures — First Purchase and Return Shopping Revenue

We can also create measures that calculate Total Order Amounts based on conditions of the order, such as whether it was a customer’s first purchase or if it’s made by a return customer. This way, we can determine how much revenue was generated from new or returning customers. It’s likely we have discrete teams focused on new user acquisition and on current user retention, so it may be important we break these revenues apart.

- measure: total_first_purchase_revenue
  type: sum
  sql: ${total_amount_of_order_usd}
  decimals: 2
  filters:
     is_first_purchase: yes
- measure: total_returning_shopper_revenue
  type: sum
  sql: ${total_amount_of_order_usd}
  decimals: 2
  filters:
     is_first_purchase: no

Again, both of these measures — Total First Purchase Revenue and Total Returning Shopper Revenue — take advantage of our existing Total Amount of Order dimensions. We can now directly compare both types of revenue.

separating first and returning revenue

Putting It All Together

Given the dimensions and measures we’ve just created, let’s build a report that shows us Total Returning Shopping Revenue, Total First Purchase Revenue, Average Total Amount of Order, and Average Order Profit, broken out by the Total Amount of Order tiered and the Week in which the order was created.

putting it all together

There we go.

Now let’s compare this with the SQL we’d have to write to generate such a result set.

SELECT COALESCE (CASE
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
  < 0 THEN 'T00 (-inf,0)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 0 AND (SELECT SUM(order_items.sale_price)
​FROM order_items
WHERE order_items.order_id = orders.id)
 < 10 THEN 'T01 [0,10)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 10 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 50 THEN 'T02 [10,50)'
WHEN SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 50 AND (SELECT SUM(order_items.sale_price)-----
FROM order_items
WHERE order_items.order_id = orders.id)
 < 150 THEN 'T03 [50,150)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 150 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 500 THEN 'T04 [150,500)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 500 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 1000 THEN 'T05 [500,1000)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 1000 THEN 'T06 [1000,inf)' 
ELSE 'TXX Undefined'
END,'') AS 
"_p1",COALESCE(DATE_ADD(DATE(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles')),
    INTERVAL
(0-(DAYOFWEEK(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles'))+5)%7) DAY),'')
    AS "_g1", DATE_ADD(DATE(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles')),
    INTERVAL
(0-(DAYOFWEEK(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles'))+5)%7) DAY) 
    AS `orders.created_week`,CASE
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 0 THEN 'T00 (-inf,0)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 0 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 10 THEN 'T01 [0,10)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 10 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 50 THEN 'T02 [10,50)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 50 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 150 THEN 'T03 [50,150)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 150 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 500 THEN 'T04 [150,500)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 500 AND (SELECT SUM(order_items.sale_pr
FROM order_items
WHERE order_items.order_id = orders.id)
 < 1000 THEN 'T05 [500,1000)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 1000 THEN 'T06 [1000,inf)'
ELSE 'TXX Undefined' 
END AS 'orders.total_amount_of_order_usd_tier',
      AVG((SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
) AS 'orders.average_total_amount_of_order_usd',
     SUM (CASE WHEN ((SELECT COUNT(*)
FROM orders o
WHERE o.id < orders.id
AND o.user_id=orders.user_id) + 1

) = 1 THEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
  ELSE NULL END) AS 'orders.total_first_purchase_revenue',
     SUM (CASE WHEN NOT COALESCE((( SELECT COUNT(*)
FROM orders o
WHERE o.id < orders.id
AND o.user_id=orders.user_id) + 1

) = 1, FALSE) THEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
  ELSE NULL END) AS 'orders.total_returning_shopper_revenue',
    AVG(((SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 - (SELECT SUM(inventory_items.cost)
FROM order_items
LEFT JOIN inventory_items ON order_items.inventory_item_id = inventory_items.id
WHERE order_items.order_id = orders.id)
)) AS `orders.average_order_profit`
FROM orders
WHERE
    orders.created_at BETWEEN (CONVERT_TZ(DATE_ADD(CURDATE(),INTERVAL -29 day),
    'America/Los_Angeles','UTC')) AND (CONVERT_TZ(DATE_ADD(DATE_ADD(DATE_ADD
    (CURDATE(),INTERVAL -29 day),INTERVAL 30 day),INTERVAL -1 second), 
    'America/Los_Angeles','UTC'))
GROUP BY 1,2
ORDER BY CASE
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 0 THEN 'T00 (-inf,0)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 0 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 10 THEN 'T01 [0,10)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 10 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 50 THEN 'T02 [10,50)' 
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 50 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 150 THEN 'T03 [50,150)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 150 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 500 THEN 'T04 [150,500)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 500 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 < 1000 THEN 'T05 [500,1000)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
 >= 1000 THEN 'T06 [1000,inf)' 
ELSE 'TXX Undefined'
END
LIMIT 500


Maybe this makes sense to write one time, but what if we want to look at this by a Customer's state instead of by order week?

one time

Or maybe by Lifetime Number of Purchases by a customer, tiered?

lifetime

As you can see, all these reports can be generated, altered, and updated—without the need to rewrite any SQL. In LookML, we abstract the essential business logic once, then reference it within other dimensions and measures—allowing quick, rapid iteration of data exploration, while also ensuring the accuracy of the SQL that’s generated. If a business user wants a new tier, just add it to the dimension. If they want to determine revenue from customers with more than 10 purchases, just create a new measure that sums total order amount and filters on customers with more than 10 purchases. Small updates are quick and can be made immediately available to end users. That frees you up to define the new metrics that will take your business to the next level.

Next Previous

Subscribe for the Latest Posts