Just-in-time data science. Rapid transformation and experimentation.
Feb 10, 2014
Experimentation leads to discovery. Data “science” is about experimenting with data. In the data “lab,” the experiment is a transformation. And it is key to discovery
Data captured in the wild, from a transactional system or event log, typically needs to be transformed before it’s useful for analytic purposes. With traditional ETL techniques, transforming and normalizing data as it moves into an analytic data store can be cumbersome and limits its future usability. A much more flexible approach is to bulk-load data into the analytic data store and then transform it at query (discovery) time, or very near query time. Looker is a highly adaptable environment for creating and deploying SQL-based data transformations—so that the results of your experiments are delivered within minutes, instead of days.
What’s the best way to think about your data?
Imagine you’re analyzing workers and the jobs they perform. Through event logging, you collect information about each transaction they provide, including start time and duration. You also capture an event that shows how many seconds a worker has been available since they last performed a job. Your analytics could include questions about worker availability, customer preferences, time spent on a job, workers with high/low ratings—all summarized by location, month, or any other roll-up that’s valuable to your business.
In order to answer the questions, the event datasets need to be transformed into a more useful state. For example, you can measure worker utilization, success rate, average time to complete tasks, and then compute facts such as work shifts, programs, customer or worker location. Getting these fact tables right will require a degree of experimentation. Looker makes this easy.
Transformations and experimentation are the key to discovery.
With LookML, Looker provides an experimentation framework. Fact tables can compute all kind of interesting things, like customer lifetime value, average purchase size, average number of days in inventory for each of your SKUs, and number of days until payback on rentals. Computations for these facts can start out simple, but you may later identify new contributing factors or data points that you want to build into your computations. Looker manages the creation and lifetime of your experimental fact tables, so you can iterate and test out various transform approaches.
You can start exploring and understanding your source data immediately through Looker. The faster the cycle time, the more likely you are to stumble over something interesting. Looker makes transformations so easy and so fast that you are much more likely to find the thing that matters. And get it exactly right.
With Looker’s development environment, you don't need to be a DBA or low-level programmer, or even to involve one. Your data analysts can effectively do the transform once the data has been loaded. The transform is performed in a reusable model, so the analyst only has to do this once, and every part of the transform can be re-used later. This further reduces cycle times (and lowers the cost), because you have fewer people involved and you don’t have to wait for multiple hand-offs. So your data consumers get answers when they need them.
The value of Looker to develop and test the transformation.
SQL’s design encourages the creation of monolithic queries. Some of them can get pretty monstrous. SQL doesn’t provide a way to encapsulate different ideas in the query’s logic in any reasonable way. Looker solves this problem.
LookML lets you write each part of the query separately, making it much easier to understand. And with reusable LookML definitions for each SQL component, you can build and test your transforms in a modular way—rather than construct and debug a huge SQL query all at once. Each of these parts is reusable in future queries. All of this is versioned (using git) and sandboxed (in Looker’s development experimentation environment), so experimentation is safe and controlled.
As you incrementally build your transform in LookML,you can use Looker to test and validate the results in real time. Is the transform useful? Does the data you’re discovering and reporting answer specific business questions and objectives? Simply join your transform with your other business data tables (which is very easy in Looker), and you can check the value of your results immediately.