CSV to Oprah in 5 (codeless) clicks
Jun 13, 2017
Flat-file formats like CSVs are a simple, universal way to store data. But as data gets bigger and bigger, getting a handle on what’s in them can get tricky. In fact, even loading them may be too much for your computer to handle.
Developers might use command-line tools like sed and awk to work with these files, but let’s be honest, UNIX commands aren’t going to cut it for normal folks.
That’s why we’re adding Instant Insight to Looker. While examining millions of rows is probably too much for your laptop to handle, today’s massive databases aren’t fazed even by rows in the billions. Looker’s Instant Insight lets you leverage all the power of those databases, no coding required.
To try it out, we grabbed all 3 million rows of President Obama’s White House Visitor Logs from the archived whitehouse.gov site (each year’s data is a few hundred MB--way more than Excel can comfortably handle). We uploaded the file into Google Cloud Storage, then loaded them into BigQuery (letting BigQuery detect the schema automatically.)
Because Looker is entirely web-based, we even built a Chrome extension that takes us to Instant Insight directly from BigQuery. So now, in just 5 clicks we can explore that CSV in Looker and actually get some insights out of it.
So, about those 5 clicks...
Now that we have this data available to explore in Looker, we can start to poke around. So, first, let’s see what we’re looking at. Counting things isn’t Excel’s strong suit, but Looker’s interface makes it simple.
Right off the bat, we know that this data is not complete. We can see here that 2009, 2013 and 2014 had significantly fewer visitors than 2010 and 2011. And the data from 2012 is missing entirely. Good to know!
Now that we know that, let’s have some fun. So, first things first, who was visiting POTUS and FLOTUS at the White House? (In case you don’t know, those acronyms stand for President of the United States and First Lady of the United States, and yes, they really use them in the White House visitor logs.)
Okay, that list looks pretty standard. But since we’ve got row-level detail available, let’s see if we can find any interesting individuals. How about Oprah?
Looks like Oprah got some major facetime with POTUS and FLOTUS. Who else might have been visiting POTUS?
Jon Stewart seems a likely candidate. But since that’s not his real name….
With this knowledge, we can search for Mr. Leibowitz….
So from this data, we see two things...
- Jonathan (no middle name) Leibowitz visited the White House a lot.
- Names are not unique (duh). Because of this, we would need to get more data to really see if this was Jon Stewart.
Because this data is in Looker, sharing our insights is as easy as copying and pasting the URL. So for your friends who are more into political celebrities (Henry Kissinger, anybody?) than celebrity celebrities, just send them the link and they can join in the fun.
At Looker, we believe deeply in the power of analysis. But that doesn’t just mean focusing on solving the big, sexy problems (though, we are focused on those). It also means removing the obstacles that slow analysts and business users down every day. We think Instant Insight does a lot in that regard, and we’re excited for you to check it out.