New York City is home to about 9 million people. This tutorial uses historical data from New York's yellow taxi network, provided by the New York City Taxi and Limousine Commission (NYC TLC). The NYC TLC tracks over 200,000 vehicles making about 1 million trips each day. Because nearly all of this data is time-series data, proper analysis requires a purpose-built time-series database, like Timescale.
In the beginner NYC taxis tutorial, you constructed queries that examined how many rides were taken, and when. The NYC taxi cab dataset also contains information about where each ride was picked up. This is geospatial data, and you can use a PostgreSQL extension called PostGIS to examine where rides originate. Additionally, you can visualize the data in Grafana by overlaying it on a map.
Before you begin, make sure you have:
- Signed up for a free Timescale account.
- (Optional) Signed up for a Grafana account, if you want to graph your queries.
This tutorial covers:
- Setting up your dataset: Set up and connect to a Timescale service, and load data into your database using psql.
- Querying your dataset: Analyze a dataset containing NYC taxi trip data using Timescale and PostgreSQL, and plot the results in Grafana.
This tutorial uses the NYC taxi data to show you how to construct queries for geospatial time-series data. The analysis you do in this tutorial is similar to the kind of analysis civic organizations do to plan new roads and public services.
It starts by teaching you how to set up and connect to a Timescale database, create tables, and load data into the tables using psql. If you have already completed the first NYC taxis tutorial, then you already have the dataset loaded, and you can skip straight to the queries.
You then learn how to conduct analysis and monitoring on your dataset. The tutorial walks you through using PostgreSQL queries with the PostGIS extension to obtain location information, and plotting the results in Grafana.
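To make the idea of a geospatial time-series query concrete, here is a minimal sketch in plain Python. It is not the tutorial's method: the tutorial itself uses SQL with PostGIS, while this sketch swaps in a haversine distance calculation so it can run without a database. The sample rides, the Times Square coordinates, and the function names `haversine_m`, `bucket_start`, and `pickups_near` are all illustrative assumptions, not part of the dataset or of any Timescale API.

```python
from collections import Counter
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def bucket_start(ts, minutes=30):
    """Round a timestamp down to the start of its bucket.

    Assumes `minutes` divides 60 evenly (for example 5, 15, or 30).
    """
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def pickups_near(rides, lat, lon, radius_m, minutes=30):
    """Count pickups within radius_m of (lat, lon), grouped by time bucket."""
    counts = Counter()
    for pickup_time, pickup_lat, pickup_lon in rides:
        if haversine_m(pickup_lat, pickup_lon, lat, lon) <= radius_m:
            counts[bucket_start(pickup_time, minutes)] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data: (pickup time, latitude, longitude).
TIMES_SQUARE = (40.7589, -73.9851)  # approximate coordinates, for illustration
SAMPLE_RIDES = [
    (datetime(2016, 1, 1, 9, 5), 40.7590, -73.9850),
    (datetime(2016, 1, 1, 9, 20), 40.7600, -73.9860),
    (datetime(2016, 1, 1, 9, 40), 40.7580, -73.9855),
    (datetime(2016, 1, 1, 9, 45), 40.6413, -73.7781),  # far from Times Square
]

if __name__ == "__main__":
    for bucket, count in pickups_near(SAMPLE_RIDES, *TIMES_SQUARE, radius_m=400).items():
        print(bucket, count)
```

The sketch combines a spatial filter (distance from a point) with a time grouping (30-minute buckets), which is the general shape of the queries this tutorial builds. In a database, the distance test and the bucketing run inside the query, so they scale to the full dataset rather than a handful of in-memory rows.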