Legacy BI - artifacts of the old data world
Dec 16, 2013
At Looker, we have the privilege of talking to lots of fast-moving companies that are trying to get value out of large data sets (yes, sometimes even “big data”). We also talk to more traditional businesses that have BI solutions in place but have difficulty applying these traditional approaches to their more complex (and larger) data problems.
The traditional approach to BI is broken. BI was architected in the days of big databases that ran on very expensive machines. These databases needed to be optimized at every level, because... well... they were expensive. If you didn’t optimize, things broke or you had to buy more. To run reports or do any kind of analysis, you generally pulled small pieces of aggregate (summary) data out of the big expensive database and moved them into some kind of cube or BI tool datastore. To oversimplify, people architected an entire stack as a workaround to keep their Oracle spend manageable.
The problem with the BI workaround is that it puts handcuffs on data discovery. If you can only operate on aggregates, you can't drill down to the detail. And creating aggregates means presupposing which questions people are going to ask, as if those questions were cast in stone. So traditional BI was built around two constraints: (1) you build only for the questions you already expect people to ask, and (2) when someone asks an ad hoc question (that is, true data discovery), you need to start over.
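To make the handcuffs concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` and `daily_revenue` tables and their contents are made up for illustration; they stand in for a detail table and a pre-built summary, not for any particular BI product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detail data: one row per order.
    CREATE TABLE orders (order_day TEXT, customer TEXT, product TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('2013-12-01', 'acme',   'widget', 100.0),
        ('2013-12-01', 'globex', 'gadget', 250.0),
        ('2013-12-02', 'acme',   'gadget', 250.0),
        ('2013-12-02', 'acme',   'widget', 100.0);

    -- The "cube": a summary built for one anticipated question.
    CREATE TABLE daily_revenue AS
        SELECT order_day, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_day;
""")

# The planned question (revenue per day) is answered by the summary.
planned = conn.execute(
    "SELECT order_day, revenue FROM daily_revenue ORDER BY order_day"
).fetchall()

# The summary kept only these columns; customer and product are gone.
summary_columns = [row[1] for row in conn.execute("PRAGMA table_info(daily_revenue)")]

# An ad hoc question (revenue per customer) needs the detail rows.
ad_hoc = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(planned)
print(summary_columns)
print(ad_hoc)
```

The summary table answers the question it was built for, but the customer dimension was thrown away when it was created. Anyone who only has `daily_revenue` has to go back and build a new aggregate before they can ask about customers.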
What's really cool is that database vendors and open source projects gave Oracle a run for its money. They built much cheaper and much faster analytic-oriented databases that let organizations capture and store gigantic amounts of data. They also let you transform this data inside the database, rather than when it’s ingested. For the purposes of this post, I’m not even talking about Hadoop... I’m talking about the more vanilla MPP, in-memory, columnar, and analytic mirrors that were made lightning-fast by cheap multi-core servers. A new world has opened up, and we don’t have to be shackled by slow databases anymore.
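As a rough illustration of transforming data inside the database instead of at ingest, the sketch below loads raw rows untouched and defines the cleanup as a view. It uses Python's built-in sqlite3 module purely as a stand-in for an analytic database; the `raw_events` table, the `events` view, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load the raw rows exactly as they arrive; nothing is cleaned at ingest time.
conn.execute("CREATE TABLE raw_events (user_email TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("Ann@Example.com", 2000), ("bob@example.com", 500), ("ann@example.com", 100)],
)

# The transformation lives in the database as a view over the raw rows,
# so it can be changed later without reloading anything.
conn.execute("""
    CREATE VIEW events AS
    SELECT lower(user_email)    AS user_email,
           amount_cents / 100.0 AS amount_dollars
    FROM raw_events
""")

# Queries run against the transformed view, computed on demand.
totals = conn.execute(
    "SELECT user_email, SUM(amount_dollars) FROM events "
    "GROUP BY user_email ORDER BY user_email"
).fetchall()

print(totals)
```

Because the raw rows are still there, a different cleanup rule or a new derived field is just another view, not another load job.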
Are traditional BI tools all screaming fast now too? Unfortunately, no.
The saying "You can’t put lipstick on a pig" comes to mind. In technology, we use this phrase all the time, and it’s why venture-backed tech companies exist. Legacy architectures can’t simply be tuned after a fundamental shift in the underlying infrastructure. That shift requires starting over and throwing out everything we knew to be true. If you’re going to rip out the middle tier (BI data engines and cubing), re-architect the stack on a new paradigm, and take a different approach to data, it’s really hard to do on a platform that was built in a different age. If your BI solution wasn’t built to model and transform data as it sits in the database, create aggregates and derivations where the data sits, and assume that questions will be ad hoc rather than predetermined, it will be hard to operate in this new world. At Looker, we love this new world.