Analyze NYC taxi cab data

New York City is home to about 9 million people. This tutorial uses historical data from New York's yellow taxi network, provided by the New York City Taxi and Limousine Commission (NYC TLC). The NYC TLC tracks over 200,000 vehicles making about 1 million trips each day. Because nearly all of this data is time-series data, proper analysis requires a purpose-built time-series database, like Timescale.

Prerequisites

Before you begin, make sure you have:

Signed up for a free Timescale account.

Steps in this tutorial

This tutorial covers:

Setting up your dataset: Set up and connect to a Timescale service, and load data into your database using psql.
Querying your dataset: Analyze a dataset containing NYC taxi trip data using Timescale and PostgreSQL.
Bonus: Store data efficiently: Learn how to store and query your NYC taxi trip data more efficiently using the compression feature of Timescale.

About querying data with Timescale

This tutorial uses the NYC taxi data to show you how to construct queries for time-series data. The analysis you do in this tutorial is similar to the kind of analysis data science organizations use to do things like plan upgrades, set budgets, and allocate resources.

The tutorial starts by teaching you how to set up and connect to a Timescale database, create tables, and load data into those tables using psql.

You then learn how to conduct analysis and monitoring on your dataset. The tutorial walks you through using PostgreSQL queries to obtain information, including how to use JOINs to combine your time-series data with relational or business data.
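To give a rough idea of what combining time-series data with relational data looks like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a local stand-in for a Timescale service, and the table names (rides, rates), column names, and sample rows are all hypothetical, chosen only for illustration. They are not the tutorial's actual schema or data.

```python
import sqlite3

# In-memory SQLite stands in for a Timescale service in this sketch.
# All table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rides (pickup_datetime TEXT, rate_code INTEGER, fare_amount REAL);
    CREATE TABLE rates (rate_code INTEGER PRIMARY KEY, description TEXT);
""")

# Relational (lookup) data: what each rate code means.
conn.executemany(
    "INSERT INTO rates VALUES (?, ?)",
    [(1, "standard rate"), (2, "JFK")],
)

# Time-series data: one row per trip, keyed by pickup time.
conn.executemany(
    "INSERT INTO rides VALUES (?, ?, ?)",
    [
        ("2016-01-01 00:05:00", 1, 10.0),
        ("2016-01-01 08:30:00", 2, 52.0),
        ("2016-01-02 09:15:00", 1, 8.5),
        ("2016-01-02 17:45:00", 1, 12.5),
    ],
)

# JOIN the trips to the lookup table, then aggregate per day and rate type.
query = """
    SELECT date(rides.pickup_datetime) AS ride_day,
           rates.description,
           COUNT(*) AS num_trips,
           SUM(rides.fare_amount) AS total_fare
    FROM rides
    JOIN rates ON rides.rate_code = rates.rate_code
    GROUP BY ride_day, rates.description
    ORDER BY ride_day, rates.description
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On a Timescale service, the same JOIN would be written in PostgreSQL, and the per-day grouping would more likely use a function such as time_bucket rather than SQLite's date().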

Note

If you have been provided with a pre-loaded dataset on your Timescale service, go directly to the queries section.