An analyst’s guide to data virtualization
Sep 19, 2018
Imagine that everything you wrote had to be written on a typewriter. Any typos meant getting out the Wite-Out, and any larger edits meant retyping the whole page. Now compare that to our reality today, where we have word processors that allow us to edit and update our work instantly. By virtualizing the process of typing, we unlocked much more efficiency and solved huge editing and cleaning pain points. What word processors have done for the written word, data virtualization does for the world of data.
In our white paper, Why Data Virtualization is an Analytics Game Changer, we introduce data virtualization, share the key pains it alleviates for data teams and analysts, and take a look at the data virtualization landscape. While the white paper takes a deep dive into several real-world examples where data virtualization alleviates problems, this post focuses on where data virtualization fits into your technology stack and how data flows through it.
Where data virtualization fits in
Data goes on a journey as it passes through your tech stack. Throughout this journey, data is joined and transformed in different ways. One way data virtualization can help is by virtualizing the data before you send it to your warehouse. This opens up the power of Virtual Events, in which you can define, erase, and update events retroactively without touching your underlying data or codebase. For example, you may add a new campaign landing page to your website to drive new signups. With Virtual Events, you can wait a week or two before defining those signups as a “campaign signup” event and still see all the data for that event, including the signups that happened before you defined it. If the campaign becomes less important in later months and you want to make those campaign-specific signups part of “all signups” instead, you can go in and redefine that “campaign signup” event to be a general “signup” event, without any lost data or impenetrable naming conventions.
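To make the idea concrete, here is a minimal sketch of retroactive event definitions in Python. It is an illustration of the concept only, not how Heap implements Virtual Events: the `RawEvent` fields, the URL paths, and the `query` helper are all hypothetical. The key assumption is that raw interactions are captured once and that a named event is just a rule evaluated at query time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RawEvent:
    """One captured interaction. Field names are hypothetical."""
    action: str
    path: str
    day: int


# Raw data is captured once and never rewritten.
RAW_EVENTS: List[RawEvent] = [
    RawEvent("submit", "/signup", day=1),
    RawEvent("submit", "/campaign/spring", day=2),
    RawEvent("pageview", "/pricing", day=2),
    RawEvent("submit", "/campaign/spring", day=3),
]

# A virtual event is a named rule that is evaluated at query time.
definitions: Dict[str, Callable[[RawEvent], bool]] = {}


def is_plain_signup(e: RawEvent) -> bool:
    return e.action == "submit" and e.path == "/signup"


def is_campaign_signup(e: RawEvent) -> bool:
    return e.action == "submit" and e.path.startswith("/campaign/")


def query(name: str) -> List[RawEvent]:
    """Apply the current definition of `name` to all raw events."""
    rule = definitions[name]
    return [e for e in RAW_EVENTS if rule(e)]


definitions["signup"] = is_plain_signup

# A week or two after launch: define the campaign event retroactively.
definitions["campaign signup"] = is_campaign_signup
campaign_signups = query("campaign signup")  # includes days 2 and 3

# Months later: fold campaign signups into the general signup event.
del definitions["campaign signup"]
definitions["signup"] = lambda e: is_plain_signup(e) or is_campaign_signup(e)
all_signups = query("signup")
```

Because only the definitions change, the raw events stay exactly as they were captured, and every redefinition applies to the full history.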
After you collect and virtualize behavioral data on your website, you can push that data into your data warehouse. Then in your business intelligence tool, you can virtualize your customer behavioral data, along with the rest of the data you’ve replicated in your warehouse. With modeled data in your warehouse, you can generate more powerful visualizations, perform ad hoc analysis, feed it to advanced learning tools, and get more actionable data out of the warehouse.
Virtualization at one layer of the stack often complements virtualization at another, so these approaches work extremely well in tandem. For example, at Heap we see some of our customers:
- Use Heap for collecting and virtualizing behavioral data on their website
- Push Heap data downstream into their data warehouse where it sits alongside other sources of data
- Use SQL Views to further model that data in their warehouse for specific analytical applications
- Use Looker as a visualization and insights layer, and use LookML to model their warehouse data to make it business-focused for use within Looker and other applications
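The SQL Views step in this flow can be sketched with Python's built-in `sqlite3` module standing in for a real warehouse. The table names (`heap_events`, `crm_accounts`), columns, and rows below are hypothetical and do not reflect Heap's actual export schema; the point is only that a view is a stored query, so the model stays current as new rows arrive.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical behavioral data pushed downstream from the website.
    CREATE TABLE heap_events (user_id INTEGER, event_name TEXT, event_date TEXT);
    -- Hypothetical data from another source sitting alongside it.
    CREATE TABLE crm_accounts (user_id INTEGER, plan TEXT);

    INSERT INTO heap_events VALUES
        (1, 'signup', '2018-09-01'),
        (2, 'signup', '2018-09-02'),
        (2, 'view_pricing', '2018-09-03'),
        (3, 'signup', '2018-09-03');
    INSERT INTO crm_accounts VALUES (1, 'free'), (2, 'paid'), (3, 'paid');

    -- The view models the data for one analytical question
    -- without copying or altering the underlying tables.
    CREATE VIEW signups_by_plan AS
    SELECT a.plan AS plan, COUNT(*) AS signups
    FROM heap_events AS e
    JOIN crm_accounts AS a ON a.user_id = e.user_id
    WHERE e.event_name = 'signup'
    GROUP BY a.plan;
    """
)

QUERY = "SELECT plan, signups FROM signups_by_plan ORDER BY plan"
before = conn.execute(QUERY).fetchall()

# New rows land in the base tables; the view picks them up automatically.
conn.execute("INSERT INTO heap_events VALUES (4, 'signup', '2018-09-04')")
conn.execute("INSERT INTO crm_accounts VALUES (4, 'free')")
after = conn.execute(QUERY).fetchall()
```

A modeling layer in a BI tool can then be pointed at a view like `signups_by_plan` rather than at the raw tables, which keeps the business-facing definitions in one place.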
Data virtualization is a technological trend that every analyst should be aware of. Virtualizing data has many benefits, and it can have a big impact on an analyst’s job at several points in the data flow within their tech stack.
To go deeper into data virtualization and why it’s an analytics game changer, check out Heap’s white paper.