When I was in 5th grade, I discovered compound interest. I remember telling my friends: "You can put money in the bank and it will earn interest. If you leave it alone, by the time you're 100 years old, you'll be a millionaire!" I was super-excited by the concept. Compound interest! Put money in the bank, never spend it, and watch it grow. My friends thought I was crazy. Never spend it? Really?
Running a business, I see that over-focusing on Daily Active Users (DAU) is maybe also a little crazy. If my user base grows 3 percent a month consistently over several years, I can certainly call that success. But watching my user base grow is like watching my money grow. I don't actually get anything out of it. It doesn't tell me anything I can use to grow the business in the right way. In fact, measuring DAU to the exclusion of other measures can actually have adverse effects on the business.
Good users differ subtly among business types. In e-commerce companies, a good customer visits often, buys a lot, and tells their friends. In a game company, a good user loves your game, plays it many hours every day, and tells their friends.
Lost customers are similar. In e-commerce companies, a lost customer visits your site and never purchases, or purchases once and never comes back. Or worse: purchases once, returns the goods, and then tells all their friends they didn't like your stuff. In a gaming company, a lost customer opened your game and never played it.
The problem with Daily Active Users is that you don't know the ratio between your good customers and your lost customers. Actually, you don't know anything about the customers, and over-focusing on Daily Active User growth won't tell you soon enough that your new users are quickly turning into lost customers.
Growth does matter, so you need to measure it. But you need to measure more. If you're collecting event data, you have the opportunity to convert that data into insights about your users.
The first simple step is to sessionize your event data to build a user/days table. To do so, I'll assume your event data is in a SQL table—a reasonable assumption, since you can easily store multi-billion row tables in a columnar database like Redshift. (The same technique will work in MapReduce too.)
For each user/day, we'll measure the date, the count of anything important your user might have done, and most importantly a measure of user engagement. The table is rolled up so that for each day a user visited your site, there's a record. In this case, our sessions are fixed at one day (and GMT days, at that).
Let's take a look at a simple summary query that does this:
CREATE TABLE daily_usage AS
SELECT
    DATE(event.created_at) AS usage_date
  , event.user_id AS user_id
  , COUNT(*) AS number_of_events
  , COUNT(CASE WHEN event.name = 'activity1' THEN 1 ELSE NULL END) AS activity1
  , COUNT(CASE WHEN event.name = 'activity2' THEN 1 ELSE NULL END) AS activity2
  , COUNT(CASE WHEN event.uri LIKE '/models/fun%' THEN 1 ELSE NULL END) AS fun_activity
  , COUNT(DISTINCT FLOOR(UNIX_TIMESTAMP(event.created_at)/(60*5))) * 5
      AS approximate_usage_in_minutes
FROM event
GROUP BY 1, 2
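If you want to try the rollup without a warehouse, here's a sketch of the same query run against SQLite via Python's sqlite3 module, with UNIX_TIMESTAMP swapped for SQLite's strftime('%s', ...) and a made-up toy event table standing in for real data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Toy stand-in for a real event table.
con.executescript("""
CREATE TABLE event (created_at TEXT, user_id INTEGER, name TEXT, uri TEXT);
INSERT INTO event VALUES
  ('2024-01-15 09:00:05', 1, 'activity1', '/home'),
  ('2024-01-15 09:02:10', 1, 'activity2', '/models/fun/1'),
  ('2024-01-15 09:14:30', 1, 'activity1', '/home');
""")

rollup = con.execute("""
  SELECT date(created_at) AS usage_date,
         user_id,
         COUNT(*) AS number_of_events,
         COUNT(CASE WHEN name = 'activity1' THEN 1 END) AS activity1,
         -- distinct five-minute blocks, converted to minutes
         COUNT(DISTINCT CAST(strftime('%s', created_at) AS INTEGER) / (60*5)) * 5
           AS approximate_usage_in_minutes
  FROM event
  GROUP BY 1, 2
""").fetchall()
print(rollup)  # [('2024-01-15', 1, 3, 2, 10)]
```

The two events at 09:00 and 09:02 land in the same five-minute block, the 09:14 event in another, so this user gets ten approximate minutes of usage for the day.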
Our events table has a time, user_id, type, and uri. The rollup by activity is pretty obvious, but the tricky and interesting thing here is measuring engagement.
Time on site is difficult to compute. The temptation is to find the first event and the last event, and take the difference. The problem with this approach is that the density of events over time matters: an activity will often create a flurry of events, followed by some idle time, followed by a subsequent flurry of events.
The code above carves up the day into five-minute blocks. It's similar to the one we use here at Looker. We've seen five-minute gaps in the event log, but we know the user is still engaged, so we decided to capture our users' engagement in five-minute blocks. But you can specify blocks of any length. For a gaming company, you might pick one minute. The time interval should approximate attention for your particular site or app.
The act of using an app or website (visiting pages, hitting URLs, etc.) generates events in the events table. If a user had any event during the time block, whether a flurry or a single event, we can assume they're using the product. Count up the blocks where the user had events and you've calculated approximate engagement. Is it perfect? No, but it's pretty darn good.
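The block-counting itself can be sketched in plain Python (the timestamps and the five-minute width here are just illustrative):

```python
from datetime import datetime, timezone

BLOCK_SECONDS = 60 * 5  # five-minute blocks; a game might use 60


def approximate_usage_in_minutes(timestamps, block_seconds=BLOCK_SECONDS):
    """Count the distinct blocks a user had events in, then convert to minutes."""
    blocks = {
        int(datetime.fromisoformat(ts).replace(tzinfo=timezone.utc).timestamp())
        // block_seconds
        for ts in timestamps
    }
    return len(blocks) * block_seconds // 60


# A flurry, a short gap, and one isolated event: three blocks, fifteen minutes.
events = [
    "2024-01-15 09:00:05",
    "2024-01-15 09:01:40",
    "2024-01-15 09:02:10",
    "2024-01-15 09:14:30",
    "2024-01-15 11:47:00",
]
print(approximate_usage_in_minutes(events))  # 15
```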
The code below is the code we use to block time at Looker (in MySQL).
COUNT(DISTINCT FLOOR(UNIX_TIMESTAMP(event.created_at)/(60*5))) * 5 AS approximate_usage_in_minutes
This simple calculation is surprisingly useful.
Once the data has been rolled up, sessions can be characterized by their level of engagement, and questions like "How many of yesterday's users were engaged for an hour or more?" become easy to answer. And, of course, you can easily measure Daily Active Users too...
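With the daily_usage table in place, DAU itself falls out of a one-line rollup, sketched here against an in-memory SQLite table with made-up rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_usage (usage_date TEXT, user_id INTEGER)")
con.executemany(
    "INSERT INTO daily_usage VALUES (?, ?)",
    [("2024-01-15", 1), ("2024-01-15", 2), ("2024-01-16", 1)],  # toy rows
)

# Daily Active Users: distinct users per day.
dau = con.execute(
    "SELECT usage_date, COUNT(DISTINCT user_id) FROM daily_usage "
    "GROUP BY 1 ORDER BY 1"
).fetchall()
print(dau)  # [('2024-01-15', 2), ('2024-01-16', 1)]
```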
Once you have the daily_usage (users/day) table, you can roll it up further by user. The code below is the code we use to generate user facts at Looker (in MySQL).
CREATE TABLE user_usage_facts AS
SELECT
    user_id
  , SUM(approximate_usage_in_minutes) AS lifetime_usage
  , COUNT(DISTINCT usage_date) AS days_of_product_use
  , MIN(usage_date) AS first_used_date
  , MAX(usage_date) AS last_used_date
FROM daily_usage
GROUP BY 1
Joining this table back into the usage table, you can do all kinds of cohort analysis. For example, you can show usage by month pivoted by user signup month. Bucketing users by lifetime usage, you can see if big lifetime spenders spend more or less per transaction over time.
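Here's a sketch of one such cohort rollup, again in SQLite with toy rows: derive each user's signup month from first_used_date, join the facts back to daily_usage, and sum usage by signup month and usage month.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daily_usage (usage_date TEXT, user_id INTEGER,
                          approximate_usage_in_minutes INTEGER);
INSERT INTO daily_usage VALUES              -- toy rows
  ('2024-01-03', 1, 25), ('2024-02-10', 1, 10),
  ('2024-02-05', 2, 40), ('2024-02-20', 2, 15);
""")

# Derive the user facts, then join them back for a signup-month cohort.
cohort = con.execute("""
  WITH user_usage_facts AS (
    SELECT user_id, MIN(usage_date) AS first_used_date
    FROM daily_usage GROUP BY 1
  )
  SELECT strftime('%Y-%m', f.first_used_date) AS signup_month,
         strftime('%Y-%m', d.usage_date)      AS usage_month,
         SUM(d.approximate_usage_in_minutes)  AS minutes
  FROM daily_usage d
  JOIN user_usage_facts f USING (user_id)
  GROUP BY 1, 2
  ORDER BY 1, 2
""").fetchall()
print(cohort)
# [('2024-01', '2024-01', 25), ('2024-01', '2024-02', 10), ('2024-02', '2024-02', 55)]
```

Each row is one cohort/month cell: the January signups used 25 minutes in January and 10 in February, while the February signups used 55 minutes in their first month.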
When you can characterize your users and measure their engagement, you have what you need to build an audience. Daily Active Users can tell you if you're rich yet, but the path to getting there, the path to building a business, will be based on measuring engagement and understanding your users.