When you have your dataset loaded, you can start constructing some queries to discover what your data tells you. In this section, you learn how to combine the data in the NYC taxi dataset with geospatial data from PostGIS, to answer these questions:
- How many rides on New Year's Day 2016 originated from Times Square?
- Which rides traveled more than 5 miles in Manhattan?.
To answer these geospatial questions, you need the ride count data from the NYC taxi dataset, but you also need some geospatial data to work out which trips originated where. Timescale is compatible with all other PostgreSQL extensions so you can use the PostGIS extension to slice the data by time and location.
With the extension loaded, you alter your hypertable so it's ready for geospatial
queries. The rides
table contains columns for pickup latitude and longitude,
but it needs to be converted into geometry coordinates so that it works well
with PostGIS.
Connect to the Timescale database that contains the NYC taxi dataset.
At the psql prompt, add the PostGIS extension:
CREATE EXTENSION postgis;You can check that PostGIS is installed properly by checking that it appears in the extension list when you run the
\dx
command.Alter the hypertable to add geometry columns for ride pick up and drop off locations:
ALTER TABLE rides ADD COLUMN pickup_geom geometry(POINT,2163);ALTER TABLE rides ADD COLUMN dropoff_geom geometry(POINT,2163);Convert the latitude and longitude points into geometry coordinates, so that they work well with PostGIS. This could take a while, as it needs to update all the data in both columns:
UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163),dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);
When you have your database set up for PostGIS data, you can construct a query to return the number of rides on New Year's Day that originated in Times Square, in 30-minute buckets.
Note
Times Square is located at (40.7589,-73.9851).
Connect to the Timescale database that contains the NYC taxi dataset.
At the psql prompt, use this query to select all rides taken in the first day of January 2016 that picked up within 400m of Times Square, and return a count of rides for each 30 minute interval:
SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min,COUNT(*) AS near_times_sqFROM ridesWHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400AND pickup_datetime < '2016-01-01 14:00'GROUP BY thirty_minORDER BY thirty_min;The data you get back looks a bit like this:
thirty_min | near_times_sq---------------------+---------------2016-01-01 00:00:00 | 742016-01-01 00:30:00 | 1022016-01-01 01:00:00 | 1202016-01-01 01:30:00 | 982016-01-01 02:00:00 | 112
This query is especially well suited to plot on a map. It looks at rides that were longer than 5 miles, within the city of Manhattan.
In this query, you want to return rides longer than 5 miles, but also include
the distance, so that you can visualize longer distances with different visual
treatments. The query also includes a WHERE
clause to apply a geospatial
boundary, looking for trips within 2 km of Times Square. Finally, in the
GROUP BY
clause, supply the trip_distance
and location variables so that
Grafana can plot the data properly.
Connect to the Timescale database that contains the NYC taxi dataset.
At the psql prompt, use this query to find rides longer than 5 miles in Manhattan:
SELECT time_bucket('5m', rides.pickup_datetime) AS time,rides.trip_distance AS value,rides.pickup_latitude AS latitude,rides.pickup_longitude AS longitudeFROM ridesWHERE rides.pickup_datetime BETWEEN '2016-01-01T01:41:55.986Z' AND '2016-01-01T07:41:55.986Z' ANDST_Distance(pickup_geom,ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 2000GROUP BY time,rides.trip_distance,rides.pickup_latitude,rides.pickup_longitudeORDER BY timeLIMIT 500;The data you get back looks a bit like this:
time | value | latitude | longitude---------------------+-------+--------------------+---------------------2016-01-01 01:40:00 | 0.00 | 40.752281188964844 | -73.9750213623046882016-01-01 01:40:00 | 0.09 | 40.755722045898437 | -73.9678726196289062016-01-01 01:40:00 | 0.15 | 40.752742767333984 | -73.9777374267578132016-01-01 01:40:00 | 0.15 | 40.756877899169922 | -73.9697799682617192016-01-01 01:40:00 | 0.18 | 40.756717681884766 | -73.967330932617188...Optional To visualize this in Grafana, create a new panel, and select the
Geomap
visualization. Select the NYC taxis dataset as your data source, and type the query from the previous step. In theFormat as
section, selectTable
. Your world map now shows a dot over New York, zoom in to see the visualization.Optional To make this visualization more useful, change the way that the rides are displayed. In the options panel, under
Data layer
, add a layer calledDistance traveled
and select themarkers
option. In theColor
section, selectvalue
. You can also adjust the symbol and size here.Optional Select a color scheme so that different ride lengths are shown in different colors. In the options panel, under
Standard options
, change theColor scheme
to a usefulby value
range. This example uses theBlue-Yellow-Red (by value)
option.
Keywords
Found an issue on this page?Report an issue or Edit this page in GitHub.