Looker Blog : Data Matters

Name Game

Lloyd Tabb, Founder & CTO

Jan 14, 2016

Name Game

Even very simple data sets can be complex to analyze. Starting with a really simple data set of names, we're going to see if we can learn a few things. The data includes names given in the United States since 1910, listed by state and gender. We'll go through the LookML model building and then build some data discovery widgets to learn about the popularity of names.

The Data

The data set is very simple, but with these 5 columns we can figure out a lot about names.

name | gender | state | year | number

Each row in the table is a roll up of the number of people with a given first name and gender born in a US state in a given year.

  • name - the first name, capitalized
  • gender - either 'M' or 'F'
  • state - two letter abbreviation of a US state
  • year - number, 1910 to 2013
  • number - the number of people born that year with that particular name.

The Looker View

In the Looker view, Looker generates a dimension for each column in the database table. Looker uses the column definitions to generate the proper dimension types. For example, Looker generates the default 'string' dimension type for the name, gender, and state columns. Looker generates number dimension types for the table's two columns with numeric values: year and number. We then renamed the dimension number to population, which corresponds to the number of people born that year with that particular name. Finally, we modified the format of year so that it does not add a comma in the number.

- view: names_step_0
  sql_table_name: names
  fields:
  - dimension: name
    sql: ${TABLE}.name

  - dimension: gender
    sql: ${TABLE}.gender

  - dimension: state
    sql: ${TABLE}.state

  - dimension: year
    type: number
    sql: ${TABLE}.year
    value_format: "0000"    # format without commas

  - dimension: population   # rename from 'number' to 'population'
    type: number
    sql: ${TABLE}.number

  - measure: count
    type: count
    drill_fields: [name, gender, state, year, number]

Note that Looker also generated a count measure automatically.

Explore!

All of the queries below can be created by Exploring From Here. Select dimensions and measures, add filters and bonk the 'Run Button'.

How Big is the Table?

By selecting count, we can see how many rows there are in the table.

Explore From Here

Let's look at the Raw Data

By selecting all of the dimensions, we can see the contents of the table.

Explore From Here

Adding Functionality

Over the following steps, we will gradually add more functionality to our model so we can ask increasingly complex questions. These steps are:

Step 1: Simple Measures

Measures in Looker's world are summary calculations. In this step we'll explore the name data by adding only two measures, total population and name count. These calculations will let us compare the use of names over time.

Step 2: Computed Dimensions

Sometimes data, as it is stored in tables, is not in its most useful form. We'll generalize year into decade and roll up states into regions.

Step 3: Filtered Measures

By building filtered measures, we can answer questions about particular aspects of the data. In this step, we'll look at the gender nature of some names, find regional names, and find particularly old and particularly new names.

Step 4: Normalizing Population with Cohorts

Population is not popularity. Older names may have been popular in their time, but as the total population grows, is the percentage with a specific name going up or down?

Step 5: Using Rank to Figure out Name Density and Popularity

Ranking is a useful tool for lots of things. We're going to use it to figure out if people are more or less likely to name their kids something common and how that trend has changed over time.

Go to the next Name Game page, Step 1: Adding simple measures

Next Previous

Subscribe for the Latest Posts