Modeling conversion funnels in Looker, a MySQL subselect approach
Jul 7, 2014
The sales funnel employed by most businesses is used to visually describe the lifecycle of a customer from initial contact to closed sale. Examining these funnels to discover where sales are (or aren't) dropping is an important way to improve your business. You are also probably interested to see how that funnel is affected by all different facets of your business. How does the funnel vary by the month a user joined, their mobile operating system (do Android users convert better than iOS?), or lead source? In some cases you'll also want to examine the effect of product releases on your most important user funnels. Looker's modeling layer gives you the ability to quickly model, query, and analyze user progression through the funnels. Once created, it's easy to slice the funnel by other dimensions of your model, or compute average completion times and % fall off throughout the funnel.
In this post we will describe the process to model an event funnel in LookML. The high-level steps of this process include:
- Define the stages of the funnel
- Define the duration of stages in the funnel (the interval of time the funnel is evaluated upon)
- Create a LookML view, "funnel facts", based on a derived table
- Define the dimensions and measures to expose funnel facts for exploration in Looker
For the sake of this blog post assume all events in your DB are stored in one large table called "events", which contains all of the events for all of the visitors. The important columns:
- created_at: the time of the event
- user_id: if the visitor signs up, this is their user id
- visitor_id: a cookie that gets set for a visitor
- action: the name of the event
Part 1: Define all the steps within your funnel
In this example the 'action' is a step in the funnel. Let's assume that our funnel consists of 4 steps.
- a user lands on the product page, action = 'product_view'.
- a user can select a product type, action = 'select_product_type'.
- a user adds the chosen product to the cart, action = 'add_to_cart'.
- a user attempts a checkout, action = 'checkout'.
We want to produce a table that looks like the following example. For every visitor ID we'll include a column for each of our steps. If the step occurred we'll place the timestamp of the event in the column. If the step did not occur within our specified time interval we'll use NULL instead.
Part 2 - Define a subselect for each step in the funnel, grouped by visitor_id
Next, we write a subselect statement that defines each step in the funnel, grouped by visitor ID. We'll use these definitions later as we fill in the LookML to create a derived table for the funnel.
Optional: Determine user_id (if exists)
We can also include a final subselect to determine the user_id (if there is one) for a visitor_id, however this can slow down the performance of the query. Without it, the entire table can be built in a few seconds on a few million rows.
Part 3: Define JOIN syntax and determine interval of funnel
Next, we will determine the join logic for each step of the funnel, which we will use later in the LookML view file. Note that in this step we can configure the interval of the funnel the following way.
The join is ensuring that the visitor_ids are the same and that the step event, in this case step 3, was fired within the interval of time allotted for our funnel.
Part 4 - Create a derived table in LookML view file
Now we take the SQL subselect statements and JOIN logic from the last two steps to create a derived table in LookML to give us a view of our funnel. We start with our SQL transformation within a derived table LookML view file.
We now have our derived table in the correct format. "Note the last two parameters,indexes and sql_trigger_value.indexes sets the index on visitor_id.sql_trigger_value manages the persistence of this table in the looker_scratch schema." For details, check the LookML documentation for Persistent Derived Tables.
Part 5 - Create dimensions from the derived table for analysis in Looker
With our derived table we can now create LookML dimensions that reference this table. We define a dimension for each of the following:
- whether a visitor completed a step
- the time each step was completed (or NULL if it was not completed)
- the interval of time between each step and the first (in minutes)
Part 6 - Creating aggregation measures from the dimensions
With our dimensions created, we can create measures that reference them to compute the following:
- unique visitors
- how many visitors completed each step
- percent fall-off between each step
- the average time spent in each step of the funnel
Querying our funnel
With all of this in place we can now query our funnel (once we've exposed it as a base view). Let's see how our funnel is performing over time.
Various ways to extend this technique further
Dealing with massive elements tables
For cases in which you have a very large events table (hundreds of millions, if not billions of rows) theres an alternate strategy to first summarize all events to an individual time frame (day) and visitor (visitor_id).
In this case we'd transform the events table to the following form:
Time bounded funnels
It's not uncommon for e-commerce companies to look at both lifetime funnels (described above) and much shorter day funnels, especially when examining product changes. We might have a product funnel bounded by the day of a new product release. This would allow users to quickly look at the impact of different in-app changes on fall-off behavior (rather than looking at longer lagged lifetime metrics).
We can limit the funnel interval within the join itself by ensuring that each step was completed within 24 hours of the user completing the first step.
Filtering by User, Device, and OS dimension
To really harness the power of Looker, we could also include a join of the users view to this funnel, which would allow us to group by every dimension in the users view of our project.
Now our funnel analysis base view will include all of the dimensions from the User view. The funnel can be viewed through the context of any particular user dimension.
We could also utilize the fact that there is information about platform and OS associated with each event to see how the funnel performs across those facets. This would require also selecting those values from the events table when creating the funnel view, we can identify from which platform/device the event was fired.
Testing many variations of funnels
Imagine we had many hundreds of different event actions. We could use the methodology outlined above to create many different funnel variations, in order to analyze different sequences of visitors through those funnels.
We've got many different customers at Looker, all analyzing their funnels in interesting ways. The combination of Looker, LookML, and your own eventstream will empower you to easily and rapidly explore your company's key funnels.