How to query spatial data in Redshift with Looker

Alec Roos, Sales Engineer at Looker

Jul 16, 2020

Amazon Redshift recently introduced support for native processing of spatial data with the addition of a new polymorphic data type, GEOMETRY. Accompanying the GEOMETRY data type release is the addition of over 55 new SQL functions, which will allow users to construct geometric shapes, import, export, access, and process the spatial data. Using Looker, users can take advantage of these new features to perform some practical (and fun) analyses.

To demonstrate what’s possible with these new features in Redshift with Looker, let’s explore a concept we’re all familiar with (I hope!) — finding vacation accommodations.

Example: querying AirBnB data

For this example, our mission is to help a friend find a place to stay during their travels to Berlin, Germany. To do this, we’ll take advantage of AirBnB listings — a public dataset that contains AirBnB rental listings along with their locations — to help aid us in our search. Within the AirBnB listings dataset, there are three tables that we’re going to make use of: accommodations, places, and geoname.

Accommodations

The accommodations table houses information about the rental including host name, room type, minimum reservation requirements, annual rental availability, price, and most importantly location represented in the well-known binary (WKB) format.

Places

The places table lists information about the neighborhoods, such as classification (suburb vs. locality), population, name, as well as the name of the neighborhood represented in the WKB format.

Geoname

The geoname table lists information that describes a number of different landmarks and attractions within Berlin. Fields contained in this table include the name of the attraction, timezone, latitude and longitude, population, and elevation.

Step 1: Familiarize yourself with the spatial data types

Let’s start by getting a better understanding of the geometry types that we’ll be working with in the accommodations and places tables. We’ll use the spatial function GeometryType() to define a new dimension in our LookML view file.


dimension: Geometry_Type_Places {
    type: string
    sql: geometrytype(${geom}) ;;
}

dimension: Geometry_Type_Accommodations {
    type: string
    sql: geometrytype(${shape}) ;;
}

From the output, we find that the rental locations are defined as points (coordinates) and the neighborhoods are defined as multi-polygons, or more intuitively, an enclosed area.

Step 2: mapping and visualizing in Looker

Our next step is to translate the shape column in the accommodations table into an X,Y coordinate pair so that we can map and visualize the results in Looker. To make use of Looker’s mapping functionality, we’ll need to define a new dimension of type: location.

A Looker dimension with type: location requires two parameters; sql_latitude and sql_longitude. In order to piece apart the X,Y coordinates from the point column in the accommodations table, we’ll leverage the spatial functions ST_X() and ST_Y(). Below you can see how this is defined in the LookML view file.


dimension: coordinates {
    type: location
    sql_latitude: ST_Y(${shape}) ;;
    sql_longitude: ST_X(${shape})   ;;
}

With the rental coordinates now defined in our Looker model, we can go ahead and map the data points.

Right off the bat, the data shows that there is a lot of availability for rentals that fit within the parameters of our query. To narrow down our search a bit, let’s incorporate some additional requirements from our friend. They’re looking for a rental that is under €200.00 per night and located in either the Mitte or Kreuzberg neighborhoods.

To filter our results by neighborhood, we’ll first need to define join logic between the accommodations and places tables. We can do this by making use of the spatial function ST_Within(A, B), which returns true if the first input geometry is within the second input geometry. For our example, we can determine if the rental location coordinates (defined as geometry type point) in the accommodations table are contained within the neighborhood (defined as geometry type multi-polygon) in the places table.


explore: accommodations {

  join: places {
    type: left_outer
    relationship: many_to_one
    sql_on:
      ST_Within(
        ${accommodations.shape},
        ${places.geom}
      )
    ;;
  }
}

Now that the join logic is defined, we can now filter on the price and neighborhood requirements provided by our friend. Plus, with these filters in place, we can also leverage Looker’s heatmap to get a better understanding of which locations within our neighborhoods are differentiated by average price.

The heatmap shows us what areas and neighborhoods of Berlin are more expensive on average than others. From the key in the lower left-hand corner, you’ll find that areas shaded in red are more expensive than those shaded in green. To narrow the pool of recommendations, let’s focus on an area that is right in the middle of average rental price and drill into what’s available there.

After drilling down, it looks like rental ID 2241958 is a great option for our friend. But before we make our final selection, let’s check to see how close it is to some landmarks and neighborhoods they want to visit on the trip.

Finalizing our recommendation

To get a better understanding of how far away our rental is to points of interest in the area, we’ll need to bring in the geonames table. A great way to do this is to use the ST_DWithin() function, which will allow us to specify a maximum distance our friend would be willing to travel within that specified distance. ST_DWithin() returns true if the euclidean distance between the two input geometry values is not larger than the threshold.

Let’s turn back to our model and define the join. With the addition of the ST_DWithin() function it should now look something like this:


 join: geoname {
    type: left_outer
    relationship: many_to_one
    sql_on:
    ST_DistanceSphere(
    ${accommodations.shape},
    ST_GeomFromText('POINT(' || ${geoname.longitude} || ' ' || ${geoname.latitude} || ')', 4326),
    10
    );;
  }

What if we didn’t want to hard-code the threshold input to 10 and instead want to give our user the ability to dynamically set that value? Not to worry — that’s where Looker parameters come in. With Looker parameters, we can create a parameter in LookML to allow the user to specify the maximum distance they are willing to travel.


parameter: distance_to_attraction_input {
    type: number
    default_value: "10"
}

With this parameter that our user to interact with, we’ll also need to assign the value to a dimension, which we can then place in lieu of the hard-coded 10 in our join, like so:


dimension: distance_to_attraction {
  type: number
  sql: {% parameter distance_to_attraction_input %} ;;
}

join: geoname {
  type: left_outer
  relationship: many_to_one
  sql_on:
  ST_DistanceSphere(
  ${accommodations.shape},
  ST_GeomFromText('POINT(' || ${geoname.longitude} || ' ' || ${geoname.latitude} || ')', 4326),
  ${accommodations.distance_to_attraction}
  );;
}

Moment of truth — let’s check to see if our rental selection is within walking distance to some areas our friend wants to visit during their travels.

Success! The areas of interest within walking distance check the box for our friend, so we’ll keep our first choice as the recommended rental for their trip — rental ID 2241958.


With Looker, you can visualize and map data points like this and be up and running quickly. By making use of AWS Redshift’s spatial functions, you can find unique ways of joining and relating tables together to answer interesting questions and do more with your data.

Next Previous

Subscribe for the latest posts