When you have your dataset loaded, you can start constructing some queries to discover what your data tells you. In this section, you learn how to combine the data in the NYC taxi dataset with geospatial data from PostGIS, to answer these questions:

To answer these geospatial questions, you need the ride count data from the NYC taxi dataset, but you also need some geospatial data to work out which trips originated where. Timescale is compatible with all other PostgreSQL extensions so you can use the PostGIS extension to slice the data by time and location.

With the extension loaded, you alter your hypertable so it's ready for geospatial queries. The rides table contains columns for pickup latitude and longitude, but it needs to be converted into geometry coordinates so that it works well with PostGIS.

  1. Connect to the Timescale database that contains the NYC taxi dataset.

  2. At the psql prompt, add the PostGIS extension:

    CREATE EXTENSION postgis;

    You can check that PostGIS is installed properly by checking that it appears in the extension list when you run the \dx command.

  3. Alter the hypertable to add geometry columns for ride pick up and drop off locations:

    ALTER TABLE rides ADD COLUMN pickup_geom geometry(POINT,2163);
    ALTER TABLE rides ADD COLUMN dropoff_geom geometry(POINT,2163);
  4. Convert the latitude and longitude points into geometry coordinates, so that they work well with PostGIS. This could take a while, as it needs to update all the data in both columns:

    UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163),
    dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);

When you have your database set up for PostGIS data, you can construct a query to return the number of rides on New Year's Day that originated in Times Square, in 30-minute buckets.

Note

Times Square is located at (40.7589,-73.9851).

  1. Connect to the Timescale database that contains the NYC taxi dataset.

  2. At the psql prompt, use this query to select all rides taken in the first day of January 2016 that picked up within 400m of Times Square, and return a count of rides for each 30 minute interval:

    SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min,
    COUNT(*) AS near_times_sq
    FROM rides
    WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
    AND pickup_datetime < '2016-01-01 14:00'
    GROUP BY thirty_min
    ORDER BY thirty_min;
  3. The data you get back looks a bit like this:

    thirty_min | near_times_sq
    ---------------------+---------------
    2016-01-01 00:00:00 | 74
    2016-01-01 00:30:00 | 102
    2016-01-01 01:00:00 | 120
    2016-01-01 01:30:00 | 98
    2016-01-01 02:00:00 | 112

This query is especially well suited to plot on a map. It looks at rides that were longer than 5 miles, within the city of Manhattan.

In this query, you want to return rides longer than 5 miles, but also include the distance, so that you can visualize longer distances with different visual treatments. The query also includes a WHERE clause to apply a geospatial boundary, looking for trips within 2 km of Times Square. Finally, in the GROUP BY clause, supply the trip_distance and location variables so that Grafana can plot the data properly.

  1. Connect to the Timescale database that contains the NYC taxi dataset.

  2. At the psql prompt, use this query to find rides longer than 5 miles in Manhattan:

    SELECT time_bucket('5m', rides.pickup_datetime) AS time,
    rides.trip_distance AS value,
    rides.pickup_latitude AS latitude,
    rides.pickup_longitude AS longitude
    FROM rides
    WHERE rides.pickup_datetime BETWEEN '2016-01-01T01:41:55.986Z' AND '2016-01-01T07:41:55.986Z' AND
    ST_Distance(pickup_geom,
    ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)
    ) < 2000
    GROUP BY time,
    rides.trip_distance,
    rides.pickup_latitude,
    rides.pickup_longitude
    ORDER BY time
    LIMIT 500;
  3. The data you get back looks a bit like this:

    time | value | latitude | longitude
    ---------------------+-------+--------------------+---------------------
    2016-01-01 01:40:00 | 0.00 | 40.752281188964844 | -73.975021362304688
    2016-01-01 01:40:00 | 0.09 | 40.755722045898437 | -73.967872619628906
    2016-01-01 01:40:00 | 0.15 | 40.752742767333984 | -73.977737426757813
    2016-01-01 01:40:00 | 0.15 | 40.756877899169922 | -73.969779968261719
    2016-01-01 01:40:00 | 0.18 | 40.756717681884766 | -73.967330932617188
    ...
  4. Optional To visualize this in Grafana, create a new panel, and select the Geomap visualization. Select the NYC taxis dataset as your data source, and type the query from the previous step. In the Format as section, select Table. Your world map now shows a dot over New York, zoom in to see the visualization.

  5. Optional To make this visualization more useful, change the way that the rides are displayed. In the options panel, under Data layer, add a layer called Distance traveled and select the markers option. In the Color section, select value. You can also adjust the symbol and size here.

  6. Optional Select a color scheme so that different ride lengths are shown in different colors. In the options panel, under Standard options, change the Color scheme to a useful by value range. This example uses the Blue-Yellow-Red (by value) option.

    Visualizing taxi journeys by distance in Grafana

Keywords

Found an issue on this page?

Report an issue!