Label Insight modernizes their data stack to deliver efficiencies across the organization

Andrew Watt, Senior Technology Operations Analyst, Label Insight

Sep 16, 2020

Label Insight is in the business of delivering data, so we have to make sure we’re doing so correctly and efficiently. Our specialty is analyzing the data associated with packaged food items sold via different retail channels, so we essentially operate at the nexus of consumers, retailers, and brands.

Today’s savvy shoppers are looking for products that provide transparency about their ingredients. They want to know information like whether a product aligns with a particular diet, such as paleo or keto, whether the packaging is sustainable, or whether it contains allergens. These insights aren’t always something that can be determined just from reading the information on a product package. To meet the demand for transparency, we provide product attribute data to leading brands and grocery retailers to help them drive eCommerce and in-store sales.

As time went on and our volume of data mounted, we realized that our legacy data stack was becoming increasingly complex, costly, and restrictive. We needed to modernize our data platform so that we could do more, and do it faster.

How we use data at Label Insight

Label Insight helps retailers and brands stay on top of trends and market demand by analyzing the packaging and product labeling of different products. To do this, we extract the ingredient and nutrient data and run it through our proprietary taxonomies to derive additional attributes — which can amount to more than 22,000 attributes per product. We then add this information to the database, apply data analytics to gain insights, and create reports that help customers manage how they market and develop their products. From here, our customers use this information to inform decisions like repackaging existing products or creating new products that are in line with the latest dietary trends.
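As a rough illustration of the rule-based derivation described above, here is a minimal Python sketch. The attribute names and rules are invented for the example and are not Label Insight's actual taxonomy, which is far larger and maintained by registered dieticians.

```python
# Hypothetical sketch of rule-based attribute derivation.
# The rules and attribute names below are illustrative only.

# Each attribute rule is a predicate over a product's ingredient set.
ATTRIBUTE_RULES = {
    "vegan": lambda ingredients: not {"milk", "egg", "honey", "gelatin"} & ingredients,
    "contains_peanuts": lambda ingredients: "peanuts" in ingredients,
    "keto_friendly": lambda ingredients: not {"sugar", "corn syrup", "wheat flour"} & ingredients,
}

def derive_attributes(ingredients):
    """Normalize an ingredient list and run it through every rule."""
    ingredients = {i.strip().lower() for i in ingredients}
    return {name: rule(ingredients) for name, rule in ATTRIBUTE_RULES.items()}

granola = ["Oats", "Honey", "Peanuts", "Sunflower Oil"]
print(derive_attributes(granola))
```

In practice, a taxonomy like this would hold thousands of such rules per product category, which is why moving the rule data into a governed warehouse mattered so much.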

Siloed data + multiple sources = data complexity

Between our on-premises legacy business intelligence (BI) system, numerous data silos, and cumbersome processes, it became increasingly difficult to extract helpful insights from the data quickly. Accessing data was difficult for the company as a whole, and it was also costly and complicated, requiring specialized people to manage the system.

In addition to our customers’ sales data, which we merge with our internal attribute analysis, we also bring in third-party external data, like search data, to identify keywords that could influence new product launches. And since all of Label Insight’s applications are event-based, these event logs were adding to our already huge databases.

Throughout our organization, different teams were managing their own databases to meet the use cases of their stakeholders. This created a number of issues from an analytics perspective. Most notably, it resulted in a large number of data stores—internal and external—with different user permissions and inaccessible event stream data.

In order to get the insights and reporting we needed to provide value to our customers, we knew we needed engineering assistance, as modeling this data was going to be complicated.

Zeroing in on needs and categorizing use cases

After further evaluation of our existing data systems and processes, we reached a point where we knew we had to make a major change. Our executive team set up a BI task force to identify our key needs, and in just a week’s time, the task force surveyed employees across the company and came up with 100 different data use cases, which we grouped into these key areas:

  • Productivity and QA reporting
    Using the legacy system, it was taking data analyst teams one full day per week, every single week, to pull business-critical reports for our customers. At any given time, there were more than 100 stakeholders actively awaiting these reports.

  • Taxonomy and attribute analysis
    To provide derived insights into consumer packaged products, we need to maintain an intricate taxonomy and a complex database of all our different rules. This is maintained by Label Insight’s team of in-house registered dieticians who, with some exceptions, don’t have relational database or SQL experience. In practice, that meant non-engineers were editing the databases directly, which created a huge quality assurance concern: we didn’t want numerous people editing databases, as this practice could disrupt the integrity of our data. Working with these large data sets, it also became difficult and time-consuming for our dieticians to analyze and identify new dietary trends, which hindered new product development for our customers. We wanted to make this analysis process easier so that we could release more attributes and answer customer questions more rapidly than the existing system allowed.

  • Executive reporting and KPI tracking
    Our executive team wanted a scalable way to monitor key performance indicators (KPIs) and a simple way to pull repeatable reports on their own. When preparing a presentation for the board, executives wanted to be able to get the metrics they needed without having to interrupt the day-to-day workflow of their teams.

  • Cross-departmental task management
    Every team at Label Insight has their own preference on task management tools, and while that’s worked for us, connecting up all those tools proved to be a problem of its own. We knew we had to find a way to do this so that we could ensure all our teams are working in unison to deliver the answers to customers in a timely manner.

The result of legacy systems: A day in the life of an analyst

To illustrate the effects of our legacy system, let’s look at how Label Insight analysts had to prepare reports, a process that took roughly eight hours.

First, an engineer, who was charged with maintaining SQL queries, prepared the SQL results. The analyst would then go through a standard data cleansing, merging, and formatting process. This highly repetitive process was often extremely slow, especially when tools were not optimized.
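As a hypothetical sketch of what that cleansing-and-merging step involved, here is a minimal, stdlib-only Python example. The field names and records are invented for illustration; the real reports joined many more sources and columns.

```python
# Illustrative sketch of a manual cleanse-and-merge step.
# Field names and records are invented for the example.

def cleanse(rows):
    """Normalize whitespace and casing, and drop rows missing a product ID."""
    out = []
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if row.get("product_id"):
            row["product_name"] = row["product_name"].title()
            out.append(row)
    return out

def merge(products, attributes):
    """Left-join attribute records onto product records by product_id."""
    attrs_by_id = {a["product_id"]: a for a in attributes}
    return [{**p, **attrs_by_id.get(p["product_id"], {})} for p in products]

products = cleanse([
    {"product_id": "117", "product_name": "  trail mix "},
    {"product_id": "", "product_name": "orphan row"},  # dropped: no ID
])
attributes = [{"product_id": "117", "vegan": "true"}]
print(merge(products, attributes))
```

Doing this by hand, week after week, for dozens of reports is exactly the kind of repetitive work the new stack automated away.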

Next, the analyst would receive the raw report and do the required analysis, calculations, and visualizations to expose insights for the end-user.

Following this was the daunting QA period to check for errors. If the analyst discovered any errors, the report would be completely scrapped and everything would have to be redone, which took about four hours. Once a report was finally prepared and cleared as error-free, it was stored and then delivered to the customer.

So while we had rich data sets, accessing them would often take one person an entire week of analysis. This process was not scalable, repeatable, or reliable, and as a result we missed out on taking advantage of new features. With users being able to access the SQL databases directly, engineers were also restricted in their ability to innovate or make improvements like migrating to more complex databases or moving to something more flexible to meet people’s needs.

Label Insight finds a better way

Today, our new data platform is built on three fully integrated pillars: Google BigQuery as our data warehouse, Fivetran as our automated data pipelines, and Looker for business intelligence.

The BigQuery cloud data warehouse

BigQuery has virtually infinite, scalable storage capacity and unrivaled performance. We can easily store more than 400 million events, and the ETL process with Fivetran is highly adaptable and simple to manage. Plus, because BigQuery’s usage fees are low and storing data is inexpensive, it’s highly cost-effective. When evaluating data warehouse offerings, our executive team found that the more we used BigQuery, the greater the benefits and ROI for the company.
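To give a sense of the storage economics, here is a back-of-the-envelope calculation. The per-event size is an assumption for illustration, and the rate used is roughly BigQuery’s published active-storage price circa 2020 (about $0.02 per GB per month); actual costs depend on data shape and query volume.

```python
# Rough storage-cost estimate; event size and rate are assumptions.
events = 400_000_000
avg_event_bytes = 1_000            # assume ~1 KB per event row
gb = events * avg_event_bytes / 1e9
monthly_storage_cost = gb * 0.02   # ~$0.02 per GB-month (circa 2020)
print(f"{gb:.0f} GB -> ${monthly_storage_cost:.2f}/month")
```

Even at hundreds of millions of events, storage alone is a small line item, which is why the pay-for-what-you-query model worked out well for us.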

The Fivetran data pipeline

Thanks to Fivetran, we now have automated ingestion of 17 data sources, including Salesforce, HubSpot, Jira, and Zendesk. Fivetran feeds all our data into BigQuery, so we no longer have to ask ‘Where is the data stored?’ There are no more data silos, no more manual hunting for data, and no more needing to piece it all together.

Another big advantage of having Fivetran as our ETL layer is that it automatically adapts to changes in the source data, allowing us to add hundreds of events per minute with zero maintenance requirements, regardless of how the data is formatted. Plus, because Fivetran is part of our cloud-based solution, we can reach more users and deliver actionable insights to even more people. This has helped lower our costs, as we now only need a few individuals to help deliver insights and reports to our customers.

Dashboards and visualizations with Looker

We chose Looker for our BI layer largely because it allows us to go further than the SQL lens usually allows. With easy-to-set-up dashboards, reporting, and analytics, it enables anyone at Label Insight to work with data in an easy, democratized way. Since Looker is highly intuitive, we were able to roll it out across the company. And rather than needing to bring more people onto the BI team to maintain the platform, we established champions in each department who then empower their team to adopt the self-service model of data discovery and analysis. With Looker, we’ve been able to achieve the decentralized business model we were aiming for.

Implementing Looker has also allowed us to extend BigQuery with governance and control, letting us make full use of the high-quality data in BigQuery. This frees our data team from constantly managing reporting requests, making the data process truly operational. Looker has also helped solve our internal communications issues. With Looker’s ability to embed analytics into our existing applications like Slack, our teams get consistent, accurate data in their favorite task management tools, enabling everyone to keep providing value to our customers.

Business outcomes exceed expectations

With our modern data stack in place, we’ve already seen some impressive business outcomes.

  • By saving 120 labor hours on reporting per week, we’ve seen an ROI of 200%. This has opened up time and resources for our teams to pursue new and exciting initiatives.

  • We’ve realized recurring savings amounting to $10,000/month with our data technology stack.

  • ETL automation has enabled our analysts to easily and quickly access data across our 17 different sources to get regular updates. And with Fivetran connectors syncing hourly, we’re alerted anytime something needs to be fixed.

  • Adoption of our data platform continues to grow across all of our departments. We now have ~60% user engagement on the platform, and with the help of our Looker superusers, we have our sights set on continuing to grow that number.

Modernizing our data technology stack has transformed our business in all the ways we were hoping for. We look forward to future innovations from Looker, Fivetran, and Google as we move forward on our data journey.
