2016 was an amazing year for data and, incidentally, also for Looker. I’m excited to see what 2017 has in store. I’ve been known to place the occasional wager, so, based on everything we saw in 2016, here are my bets for 2017…
Per Moore’s law, CPUs are always getting faster and cheaper. Of late, databases have been following the same pattern.
In 2013, Amazon changed the game when it introduced Redshift, a massively parallel processing (MPP) database that let companies store and analyze all their data at a reasonable price. Since then, however, companies that treated products like Redshift as datastores with effectively limitless capacity have hit a wall. With hundreds of terabytes or even petabytes of data, they're stuck choosing between paying more for the speed they'd become accustomed to and waiting five minutes for a query to return.
Enter (or re-enter) Moore's law. Redshift has become the industry standard for cloud MPP databases, and we don't see that changing anytime soon. That said, my prediction for 2017 is that on-demand MPP databases like Google BigQuery and Snowflake will see a huge uptick in popularity. On-demand databases charge pennies for storage, letting companies store data without worrying about cost. When users want to run queries or pull data, the database spins up the hardware it needs and gets the job done in seconds. They're fast, they're scalable, and we expect to see a lot of companies using them in 2017.
SQL will have another extraordinary year.
SQL has been around for decades, but from the late '90s to the mid-2000s it fell out of style as people explored NoSQL and Hadoop alternatives. SQL, however, has come back with a vengeance. The renaissance of SQL has been beautiful to behold, and I don't think it's anywhere near its peak yet.
The innovations are blowing everyone's minds. With BigQuery, Google has created a product that is both essentially infinitely scalable (the original goal of Hadoop) and practical for analytics (the original goal of relational databases).
SQL engines for Hadoop have continued to gain traction. Products like SparkSQL and Presto are popping up in enterprises and as cloud services because they allow companies to leverage their existing Hadoop clusters and cloud storage for speedy analytics. What’s not to love?
To top it all off, companies like Snowflake, and now Amazon with Athena, are building giant SQL engines that query directly against S3 buckets, a source that was previously accessible only via the command line.
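As a rough sketch of what querying S3 with plain SQL looks like in a service like Athena (the table name, columns, and bucket path here are hypothetical):

```sql
-- Define a table over JSON event logs already sitting in S3
CREATE EXTERNAL TABLE events (
  user_id    string,
  event_type string,
  created_at timestamp
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-bucket/events/';

-- Then query it with ordinary SQL; no cluster to provision or manage
SELECT event_type, COUNT(*) AS event_count
FROM events
GROUP BY event_type
ORDER BY event_count DESC;
```

No data gets moved and no servers get sized; the engine reads the files in place and you pay only for the query.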
2016 was the best year SQL has ever had.
I’m betting 2017 will be even better.
Companies have been collecting data for a while, so the data lake is well-stocked with fish. But the people who need data most generally couldn't find the right fish.
I support the notion of a data lake: dumping all your raw data into one data warehouse. But it doesn't work if you can't make that data cohesive when you query it. There have been great innovations from companies like Segment, Fivetran, and Stitch that make moving data into the lake easier. Modeling the data is the final step that brings it all together and helps some of the best companies in the world see through data.
Companies like Docker, Amazon Prime Now, and BuzzFeed are using all their data to build comprehensive views of their customers and their businesses. When these final steps of moving and modeling data are in place, the data lake can finally be a powerful way to get all your data into the hands of every decision-maker and make companies more successful.
So there you have it… my predictions for 2017. Let's revisit these in December of 2017 and see how things shook out. Can't wait till then? Make Looker your 2017 bet.