What do data people do?
Mar 18, 2014
One of today's business tragedies is a company's smart data people spending their time pulling data to answer other people's questions rather than doing research and data science. Often this is the result of a tools gap: until recently, most business intelligence tools focused on "reporting" rather than on letting users explore the data on their own (a harder problem to solve). At worst, the data team becomes a cost center on the defensive.
Modern data teams do four things with the data they collect:
- Prepare it (data warehousing)
- Answer anticipated questions (reporting)
- Answer unanticipated questions (exploration)
- Ask their own questions (research)
Each activity has its own experts, sub-industries, tools and methodologies.
1) Preparing data (data warehousing)
A well-rounded technical person understands the architecture around relational databases and how data flows in and out. As a company matures, teams specialize in areas like schema design, data warehousing, and ETL.
Tools: Programming languages (Python, Ruby, Java, Scala), shell scripts, SQL, NoSQL, Informatica, Snowplow, Looker, AWS EMR
2) Answering anticipated questions (reporting)
The Head of Data Engineering at a Fortune 500 company recently told me, "I have a team of 80 people whose job is to make sure a few numbers are in the right places at the right times, every day, and that they are correct." Your reporting shows that you have a pulse, and it tells you how many widgets you sold today.
A lot of companies aiming to solve this problem have sprung up in the last 20+ years, and the automated business reporting industry is getting saturated.
Reporting tools: SAP Crystal Reports, Microsoft Reporting Services, Cognos BI, SAS, Looker, Tableau, MicroStrategy
3) Answering unanticipated questions (exploration)
An unanticipated question is a question you can guess someone might ask, like: "How many [items] did [users in English-speaking countries] buy in the past [x] Thursdays?" But you can't build a report for it because there are too many possible configurations for framing the question.
In fact, most of these questions can't be anticipated. But we can provide the grammar so that someone else can write the words (ask their own questions), instead of pulling raw data to answer a question that hasn't yet been formulated.
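To make the "grammar" idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It treats the bracketed parts of the example question (which users, which weekday, how many weeks back) as parameters of one reusable query rather than as separate reports. The table names (users, orders), the column names, and the functions last_weekdays and items_bought are hypothetical and exist only for this illustration; this is not how Looker or LookML works internally.

```python
import sqlite3
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def last_weekdays(weekday, count, today):
    """Return the `count` most recent dates on `weekday`, on or before `today`."""
    offset = (today.weekday() - WEEKDAYS.index(weekday)) % 7
    most_recent = today - timedelta(days=offset)
    return [most_recent - timedelta(weeks=i) for i in range(count)]


def items_bought(conn, countries, weekday, count, today):
    """How many items did users in `countries` buy on the past `count` `weekday`s?"""
    days = [d.isoformat() for d in last_weekdays(weekday, count, today)]
    # One "?" placeholder per value, so user input is never pasted into the SQL.
    country_marks = ",".join("?" * len(countries))
    day_marks = ",".join("?" * len(days))
    sql = (
        "SELECT COALESCE(SUM(o.quantity), 0) "
        "FROM orders o JOIN users u ON u.id = o.user_id "
        f"WHERE u.country IN ({country_marks}) "
        f"AND o.order_date IN ({day_marks})"
    )
    return conn.execute(sql, [*countries, *days]).fetchone()[0]


# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (user_id INTEGER, order_date TEXT, quantity INTEGER);
    INSERT INTO users VALUES (1, 'US'), (2, 'GB'), (3, 'FR');
    INSERT INTO orders VALUES
        (1, '2014-03-13', 2),  -- Thursday, US
        (2, '2014-03-06', 3),  -- Thursday, GB
        (3, '2014-03-13', 5),  -- Thursday, FR
        (1, '2014-03-12', 7),  -- Wednesday, US
        (2, '2014-02-27', 1);  -- Thursday, GB
""")

today = date(2014, 3, 18)  # a Tuesday
print(items_bought(conn, ["US", "GB"], "Thursday", 2, today))  # 5
print(items_bought(conn, ["US", "GB"], "Thursday", 3, today))  # 6
```

Nobody had to anticipate "the past two Thursdays" or "US and GB" ahead of time. The person asking supplies the words, and the same small function covers every combination.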
Tools: Raw SQL, Excel / Google Docs, Looker
4) Research (data science)
This is where smart data people want to spend their time: finding correlations between purchasing patterns, productizing recommendation engines, doing advanced stepwise clickstream analyses, fighting fraud.
Big companies with large data teams often have the luxury of allowing some people to focus on research. But too often, data people are bogged down with requests for data pulls to answer anticipated and unanticipated questions.
Tools: Programming languages (especially Python/pandas, SQL, R, Scala, Java, Julia), tools like SAS, MATLAB, Looker
Everywhere we look, from Silicon Valley startups to global corporations, companies are employing an army of overqualified report-builders who want to be set free. It is an enormous waste and an enormous opportunity.
By taking advantage of more efficient ways to get vital information to decision-makers, data teams can spend their time and effort where they can make a greater impact: hunting for the hidden patterns that will grow their company's top and bottom lines.
Looker is a modern approach to BI that streamlines the path through all four activities: preparing your data via persistent derived tables; LookML modeling and raw SQL iteration; reporting and exploration via our web app; and even data science via our API and SDKs for toolsets such as R, Python, and Ruby.