Since COVID-19 started to spread around the world, data has been in the spotlight. With such a fast-moving virus, reliable data has been very hard to come by. Frontline healthcare workers have rightly been focused on saving lives, and haven’t had time to pause for data collection.
But many others have jumped in to help use data to understand, as best we can, what exactly is happening. Public health workers, journalists, academics, and even grassroots groups of volunteers have done the hard work of finding, collecting, cleaning, understanding, and maintaining critical datasets.
As we’ve worked with our customers, partners, and communities who are adapting to this new reality, the requests for data about what is happening have been relentless. Ecommerce companies are using COVID data to adapt to radically increased usage, restaurant chains are using it as they retool to focus on delivery, and governmental agencies are using it to plan how to mitigate the impacts of COVID on their citizens. And of course, organizations across the healthcare space (hospital systems, labs, and insurers) are using it to understand how to prepare their businesses and save lives.
But with so many different entities collecting data, and each presenting it in slightly different ways, the work required to unify that data and make sense of it is considerable. Even once you’ve got things working, building a data pipeline that keeps the data fresh while monitoring for schema or methodology changes is a big lift.
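To make that unification work concrete, here is a minimal Python sketch. The two source layouts and every field name in it (`Province_State`, `Last_Update`, `state`, `cases`, and so on) are hypothetical stand-ins chosen for illustration, not the actual schemas of any provider mentioned in this post. It maps each source onto one shared shape and reports missing or unexpected columns, which is one simple way to notice a schema change.

```python
from datetime import datetime

# Hypothetical per-source mappings: source column name -> unified column name.
FIELD_MAPS = {
    "source_a": {
        "Province_State": "region",
        "Last_Update": "day",
        "Confirmed": "cases",
        "Deaths": "deaths",
    },
    "source_b": {
        "state": "region",
        "date": "day",
        "cases": "cases",
        "deaths": "deaths",
    },
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"source_a": "%m/%d/%Y", "source_b": "%Y-%m-%d"}


def check_schema(source, row):
    """Return (missing, unexpected) column names for one raw row."""
    expected = set(FIELD_MAPS[source])
    actual = set(row)
    return sorted(expected - actual), sorted(actual - expected)


def normalize(source, row):
    """Convert one raw row into the unified shape, or raise on schema drift."""
    missing, _unexpected = check_schema(source, row)
    if missing:
        raise ValueError(f"{source} is missing columns: {missing}")
    mapping = FIELD_MAPS[source]
    # Rename known columns; unknown extra columns are dropped.
    unified = {mapping[key]: value for key, value in row.items() if key in mapping}
    # Coerce types so rows from different sources are directly comparable.
    unified["day"] = datetime.strptime(unified["day"], DATE_FORMATS[source]).date()
    unified["cases"] = int(unified["cases"])
    unified["deaths"] = int(unified["deaths"])
    unified["source"] = source
    return unified
```

A real pipeline would also have to track methodology changes, such as a source redefining what counts as a case, which a column check like this cannot detect.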
Our colleagues at the Google Cloud Public Datasets program have used their existing tools to centralize the data. They’re adding new datasets continuously, and have made queries against these datasets free on Google BigQuery.
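For readers who want to query the centralized data directly, a rough Python sketch follows. It assumes the `google-cloud-bigquery` client library is installed and credentials are configured. The table name `bigquery-public-data.covid19_jhu_csse.summary` and the column names are assumptions about how the public dataset is laid out, so check the BigQuery console for the current names before relying on them.

```python
# Assumed table name; verify it in the BigQuery console before use.
DEFAULT_TABLE = "bigquery-public-data.covid19_jhu_csse.summary"


def build_query(table=DEFAULT_TABLE, limit=10):
    """Build a small SQL query; the column names are assumptions."""
    limit = int(limit)  # guard against non-numeric input
    return (
        "SELECT country_region, date, confirmed, deaths\n"
        f"FROM `{table}`\n"
        "ORDER BY date DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Run the query and return the rows as plain dictionaries."""
    # Imported here so build_query works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project and credentials
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `run_query(build_query(limit=5))` would return up to five rows, provided the assumed table and columns exist in that form.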
When we assessed where Looker could help, we focused not just on making data accessible, but on making it easy to use. Making public datasets easy to integrate into existing data workflows is a problem we’ve run into before, and our solution was Data Blocks.
To help our customers, public health authorities, and any other interested parties make sense of COVID-19 data, today we’re releasing a COVID-19 Data Block. The Block consists of LookML models, pre-built dashboards, and Explores, along with links to data from the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE), the New York Times, the COVID Tracking Project, Definitive Healthcare, the Kaiser Family Foundation, and Italy’s Dipartimento della Protezione Civile.
The Block is free and can be loaded onto any Looker instance immediately from the Marketplace. The data that powers the Block is currently available only in BigQuery, so the Block will work on any Looker instance with an existing BigQuery connection. If you don’t have a BigQuery connection, you can explore the data for free on this Looker-hosted instance or create your own free Google Cloud Platform account. We also plan to make the data available for Amazon Redshift, Snowflake, and other databases later this week.
As additional relevant data sources become available, and as the world’s understanding of this disease grows, we’ll update the Block to incorporate that new knowledge. But because we know people are deciding right now how to react to this new reality, we wanted to release the Block immediately, with the data that is currently available.
Please stay safe, and if there are other data sources you’d like to see included, let us know at looker-covid-data-block at google [dot] com.
P.S. We’ve decided to present the data in as straightforward a manner as possible. Because we’re making the data provided by other organizations accessible but can’t verify its accuracy, this Block and the underlying data are provided as-is.
There are many legitimate and important questions about how reliable different data sources are, and we feel those questions are best left to experts in epidemiology and related fields. Because there are differing opinions about how to responsibly interpret data on this fast-moving pandemic, we didn’t want to imply more certainty than there is.
We trust our customers to know what they need from this data and to handle it with the right amount of skepticism when warranted. We encourage you to read the important notes from each data source, as they provide critical context.