This tutorial uses a dataset that contains historical data from New York's
yellow taxi network, in a hypertable named rides
. It also includes a separate
tables of payment types and rates, in a regular PostgreSQL table named
payment_types
, and rates
.
Note
If you have been provided with a pre-loaded dataset on your Timescale service, go directly to the queries section.
A service in Timescale is a cloud instance which contains your database.
Each service contains a single database, named tsdb
.
You can connect to a service from your local system using the psql
command-line utility. If you've used PostgreSQL before, you might already have
psql
installed. If not, check out the installing psql section.
In the Timescale portal, click
Create service
.Click
Download the cheatsheet
to download an SQL file that contains the login details for your new service. You can also copy the details directly from this page. When you have copied your password, clickI stored my password, go to service overview
at the bottom of the page.When your service is ready to use, is shows a green
Running
label in theService Overview
. You also receive an email confirming that your service is ready to use.On your local system, at the command prompt, connect to the service using the
Service URL
from the SQL file that you downloaded. When you are prompted, enter the password:psql -x "<SERVICE_URL>"Password for user tsdbadmin:If your connection is successful, you'll see a message like this, followed by the
psql
prompt:psql (13.3, server 12.8 (Ubuntu 12.8-1.pgdg21.04+1))SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)Type "help" for help.tsdb=>
This tutorial uses historical data from New York's yellow taxi network, provided by the New York City Taxi and Limousine Commission NYC TLC.
Hypertables are the core of Timescale. Hypertables enable Timescale to work efficiently with time-series data. Because Timescale is PostgreSQL, all the standard PostgreSQL tables, indexes, stored procedures and other objects can be created alongside your Timescale hypertables. This makes creating and working with Timescale tables similar to standard PostgreSQL.
Create a standard PostgreSQL table to store the taxi trip data using
CREATE TABLE
:CREATE TABLE "rides"(vendor_id TEXT,pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,passenger_count NUMERIC,trip_distance NUMERIC,pickup_longitude NUMERIC,pickup_latitude NUMERIC,rate_code INTEGER,dropoff_longitude NUMERIC,dropoff_latitude NUMERIC,payment_type INTEGER,fare_amount NUMERIC,extra NUMERIC,mta_tax NUMERIC,tip_amount NUMERIC,tolls_amount NUMERIC,improvement_surcharge NUMERIC,total_amount NUMERIC);Convert the standard table into a hypertable partitioned on the
time
column using thecreate_hypertable()
function provided by Timescale. You must provide the name of the table and the column in that table that holds the timestamp data to use for partitioning:SELECT create_hypertable('rides', by_range('pickup_datetime'), create_default_indexes=>FALSE);SELECT add_dimension('rides', by_hash('payment_type', 2));Note
The
by_range
andby_hash
dimension builder is an addition to TimescaleDB 2.13.Create an index to support efficient queries by vendor, rate code, and passenger count:
CREATE INDEX ON rides (vendor_id, pickup_datetime DESC);CREATE INDEX ON rides (rate_code, pickup_datetime DESC);CREATE INDEX ON rides (passenger_count, pickup_datetime DESC);
When you have other relational data that enhances your time-series data, you can
create standard PostgreSQL tables just as you would normally. For this dataset,
there are two other tables of data, called payment_types
and rates
.
Add a table to store the payment types data:
CREATE TABLE IF NOT EXISTS "payment_types"(payment_type INTEGER,description TEXT);INSERT INTO payment_types(payment_type, description) VALUES(1, 'credit card'),(2, 'cash'),(3, 'no charge'),(4, 'dispute'),(5, 'unknown'),(6, 'voided trip');Add a table to store the rates data:
CREATE TABLE IF NOT EXISTS "rates"(rate_code INTEGER,description TEXT);INSERT INTO rates(rate_code, description) VALUES(1, 'standard rate'),(2, 'JFK'),(3, 'Newark'),(4, 'Nassau or Westchester'),(5, 'negotiated fare'),(6, 'group ride');
You can confirm that the scripts were successful by running the \dt
command in
the psql
command line. You should see this:
List of relationsSchema | Name | Type | Owner--------+---------------+-------+----------public | payment_types | table | tsdbadminpublic | rates | table | tsdbadminpublic | rides | table | tsdbadmin(3 rows)
When you have your database set up, you can load the taxi trip data into the
rides
hypertable.
Important
This is a large dataset, so it might take a long time, depending on your network connection.
Download the dataset:
Use your file manager to decompress the downloaded dataset, and take a note of the path to the
nyc_data_rides.csv
file.At the psql prompt, copy the data from the
nyc_data_rides.csv
file into your hypertable. Make sure you point to the correct path, if it is not in your current working directory:\COPY rides FROM nyc_data_rides.csv CSV;
You can check that the data has been copied successfully with this command:
SELECT * FROM rides LIMIT 5;
You should get five records that look like this:
-[ RECORD 1 ]---------+--------------------vendor_id | 1pickup_datetime | 2016-01-01 00:00:01dropoff_datetime | 2016-01-01 00:11:55passenger_count | 1trip_distance | 1.20pickup_longitude | -73.979423522949219pickup_latitude | 40.744613647460938rate_code | 1dropoff_longitude | -73.992034912109375dropoff_latitude | 40.753944396972656payment_type | 2fare_amount | 9extra | 0.5mta_tax | 0.5tip_amount | 0tolls_amount | 0improvement_surcharge | 0.3total_amount | 10.3
Keywords
Found an issue on this page?Report an issue or Edit this page in GitHub.