This tutorial is a step-by-step guide on how to analyze a time-series cryptocurrency dataset using TimescaleDB. The instructions in this tutorial were used to create this analysis of 4100+ cryptocurrencies.
The tutorial covers these steps:
- Design our database schema
- Create a dataset using publicly available cryptocurrency pricing data
- Load the dataset into TimescaleDB
- Query the data in TimescaleDB
You can skip ahead to the TimescaleDB portion if you would prefer not to run through the scripts to create your database schema or your dataset.
You can also download the resources for this tutorial:
- Schema creation script:
- Dataset creation script:
- Dataset: (Note that this data is from September 2019. Follow the steps in Section 2 of this tutorial if you require fresh data)
To complete this tutorial, you need a cursory knowledge of the Structured Query Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before.
To start, install TimescaleDB. When your installation is complete, you can start ingesting or creating sample data.
This tutorial leads directly into a second tutorial that covers how Timescale can be used with Tableau to visualize time-series data.
When you have a new database up and running, you need some data to insert into it. Before you get data for analysis, you need to define what kind of data you want to perform queries on.
In this analysis, we have two main goals:
- Explore the price of Bitcoin and Ethereum, expressed in different fiat currencies, over time.
- Explore the price of different cryptocurrencies, expressed in Bitcoin, over time.
Some questions you might want to ask:
- How has Bitcoin's price in USD varied over time?
- How has Ethereum's price in ZAR varied over time?
- How has Bitcoin's trading volume in KRW increased or decreased over time?
- Which crypto-currency has the greatest trading volume in the last two weeks?
- Which day was Bitcoin most profitable?
- Which are the most profitable new coins from the past three months?
Understanding the questions you want to ask of the data helps to inform your schema definition.
These requirements lead us to four tables. We need three TimescaleDB
hypertables, called btc_prices
, crypto_prices
, and eth_prices
, and one
relational table, called currency_info
.
The btc_prices
and eth_prices
hypertables contain data about Bitcoin prices
in 17 different fiat currencies since 2010. This is the Bitcoin table, but the
Ethereum table is very similar:
Field | Description |
---|---|
time | The day-specific timestamp of the price records, with time given as the default 00:00:00+00 |
opening_price | The first price at which the coin was exchanged that day |
highest_price | The highest price at which the coin was exchanged that day |
lowest_price | The lowest price at which the coin was exchanged that day |
closing_price | The last price at which the coin was exchanged that day |
volume_btc | The volume exchanged in the cryptocurrency value that day, in BTC |
volume_currency | The volume exchanged in its converted value for that day, quoted in the corresponding fiat currency |
currency_code | Corresponds to the fiat currency used for non-BTC prices/volumes |
Finally, the currency_info
table maps the currency's code to its
English-language name:
Field | Description |
---|---|
currency_code | 2-7 character abbreviation for currency. Used in other hypertables |
currency | English name of currency |
When you have established the schema for the tables in the database, you can
formulate create_table
SQL statements to actually create the tables you need:
--Schema for cryptocurrency analysisDROP TABLE IF EXISTS "currency_info";CREATE TABLE "currency_info"(currency_code VARCHAR (10),currency TEXT);--Schema for btc_prices tableDROP TABLE IF EXISTS "btc_prices";CREATE TABLE "btc_prices"(time TIMESTAMP WITH TIME ZONE NOT NULL,opening_price DOUBLE PRECISION,highest_price DOUBLE PRECISION,lowest_price DOUBLE PRECISION,closing_price DOUBLE PRECISION,volume_btc DOUBLE PRECISION,volume_currency DOUBLE PRECISION,currency_code VARCHAR (10));--Schema for crypto_prices tableDROP TABLE IF EXISTS "crypto_prices";CREATE TABLE "crypto_prices"(time TIMESTAMP WITH TIME ZONE NOT NULL,opening_price DOUBLE PRECISION,highest_price DOUBLE PRECISION,lowest_price DOUBLE PRECISION,closing_price DOUBLE PRECISION,volume_crypto DOUBLE PRECISION,volume_btc DOUBLE PRECISION,currency_code VARCHAR (10));--Schema for eth_prices tableDROP TABLE IF EXISTS "eth_prices";CREATE TABLE "eth_prices"(time TIMESTAMP WITH TIME ZONE NOT NULL,opening_price DOUBLE PRECISION,highest_price DOUBLE PRECISION,lowest_price DOUBLE PRECISION,closing_price DOUBLE PRECISION,volume_eth DOUBLE PRECISION,volume_currency DOUBLE PRECISION,currency_code VARCHAR (10));--Timescale specific statements to create hypertables for better performanceSELECT create_hypertable('btc_prices', 'time');SELECT create_hypertable('eth_prices', 'time');SELECT create_hypertable('crypto_prices', 'time');
Note that there are three create_hypertable
statements which are
TimescaleDB-specific statements. A hypertable is an abstraction of a single
continuous table across time intervals, so that you can query it using standard
SQL. For more on hypertables, see the Timescale docs and this
blog post.
Now that you've defined the data you want, you can construct a dataset
containing that data. You can write a small Python script for extracting data
from CryptoCompare into four CSV files, called coin_names.csv
,
crypto_prices.csv
, btc_prices.csv
, and eth_prices.csv
.
To get data from CryptoCompare, you'll need to obtain an API key. For this analysis, the free key is sufficient.
The script consists of five parts:
- Import the necessary Python libraries in order to complete the data extraction
- Populate the
currency_info
table with a list of coin names - Get the historical Bitcoin (BTC) prices in 4198 other cryptocurrencies and
populate the
crypto_prices
table - Get historical Bitcoin prices in different fiat currencies to populate
btc_prices
- Get historical Ethereum prices in different fiat currencies to populate
eth_prices
Here's the full Python script, which you can also
######################################################################1. Import library and setup API key#####################################################################import requestsimport jsonimport csvfrom datetime import datetimeapikey = 'YOUR_CRYPTO_COMPARE_API_KEY'#attach to end of URLstringurl_api_part = '&api_key=' + apikey######################################################################2. Populate list of all coin names######################################################################URL to get a list of coins from cryptocompare APIURLcoinslist = 'https://min-api.cryptocompare.com/data/all/coinlist'#Get list of cryptos with their symbolsres1 = requests.get(URLcoinslist)res1_json = res1.json()data1 = res1_json['Data']symbol_array = []cryptoDict = dict(data1)#write to CSVwith open('coin_names.csv', mode = 'w') as test_file:test_file_writer = csv.writer(test_file,delimiter = ',',quotechar = '"',quoting=csv.QUOTE_MINIMAL)for coin in cryptoDict.values():if day.get('time') == None:continue # skip this itemname = coin['Name']symbol = coin['Symbol']symbol_array.append(symbol)coin_name = coin['CoinName']full_name = coin['FullName']entry = [symbol, coin_name]test_file_writer.writerow(entry)print('Done getting crypto names and symbols. See coin_names.csv for result')######################################################################3. Populate historical price for each crypto in BTC######################################################################Note: this part might take a while to run since we're populating data for 4k+ coins#counter variable for progress madeprogress = 0num_cryptos = str(len(symbol_array))for symbol in symbol_array:# get data for that currencyURL = 'https://min-api.cryptocompare.com/data/histoday?fsym=' +symbol +'&tsym=BTC&allData=true' +url_api_partres = requests.get(URL)res_json = res.json()data = res_json['Data']# write required fields into csvwith open('crypto_prices.csv', mode = 'a') as test_file:test_file_writer = csv.writer(test_file,delimiter = ',',quotechar = '"',quoting=csv.QUOTE_MINIMAL)for day in data:rawts = day['time']ts = datetime.utcfromtimestamp(rawts).strftime('%Y-%m-%d %H:%M:%S')o = day['open']h = day['high']l = day['low']c = day['close']vfrom = day['volumefrom']vto = day['volumeto']entry = [ts, o, h, l, c, vfrom, vto, symbol]test_file_writer.writerow(entry)progress = progress + 1print('Processed ' + str(symbol))print(str(progress) + ' currencies out of ' + num_cryptos + ' written to csv')print('Done getting price data for all coins. See crypto_prices.csv for result')######################################################################4. Populate BTC prices in different fiat currencies###################################################################### List of fiat currencies we want to query# You can expand this list, but CryptoCompare does not have# a comprehensive fiat list on their sitefiatList = ['AUD', 'CAD', 'CNY', 'EUR', 'GBP', 'GOLD', 'HKD','ILS', 'INR', 'JPY', 'KRW', 'PLN', 'RUB', 'SGD', 'UAH', 'USD', 'ZAR']#counter variable for progress madeprogress2 = 0for fiat in fiatList:# get data for bitcoin price in that fiatURL = 'https://min-api.cryptocompare.com/data/histoday?fsym=BTC&tsym=' +fiat +'&allData=true' +url_api_partres = requests.get(URL)res_json = res.json()data = res_json['Data']# write required fields into csvwith open('btc_prices.csv', mode = 'a') as test_file:test_file_writer = csv.writer(test_file,delimiter = ',',quotechar = '"',quoting=csv.QUOTE_MINIMAL)for day in data:rawts = day['time']ts = datetime.utcfromtimestamp(rawts).strftime('%Y-%m-%d %H:%M:%S')o = day['open']h = day['high']l = day['low']c = day['close']vfrom = day['volumefrom']vto = day['volumeto']entry = [ts, o, h, l, c, vfrom, vto, fiat]test_file_writer.writerow(entry)progress2 = progress2 + 1print('processed ' + str(fiat))print(str(progress2) + ' currencies out of 17 written')print('Done getting price data for btc. See btc_prices.csv for result')######################################################################5. Populate ETH prices in different fiat currencies######################################################################counter variable for progress madeprogress3 = 0for fiat in fiatList:# get data for bitcoin price in that fiatURL = 'https://min-api.cryptocompare.com/data/histoday?fsym=ETH&tsym=' +fiat +'&allData=true' +url_api_partres = requests.get(URL)res_json = res.json()data = res_json['Data']# write required fields into csvwith open('eth_prices.csv', mode = 'a') as test_file:test_file_writer = csv.writer(test_file,delimiter = ',',quotechar = '"',quoting=csv.QUOTE_MINIMAL)for day in data:rawts = day['time']ts = datetime.utcfromtimestamp(rawts).strftime('%Y-%m-%d %H:%M:%S')o = day['open']h = day['high']l = day['low']c = day['close']vfrom = day['volumefrom']vto = day['volumeto']entry = [ts, o, h, l, c, vfrom, vto, fiat]test_file_writer.writerow(entry)progress3 = progress3 + 1print('processed ' + str(fiat))print(str(progress3) + ' currencies out of 17 written')print('Done getting price data for eth. See eth_prices.csv for result')
After running the script, you should have four .csv files:
python crypto_data_extraction.py
Before you start, you need a working installation of TimescaleDB.
Now all your hard work at the beginning comes in handy, and you can use the SQL script you created to set up the TimescaleDB unstance. If you don't want to enter the SQL script by yourself, you can download
instead.Log in to the TimescaleDB instance. Locate your host
, port
, and password
and then connect to the database:
psql -x "postgres://tsdbadmin:{YOUR_PASSWORD_HERE}@{YOUR_HOSTNAME_HERE}:{YOUR_PORT_HERE}/defaultdb?sslmode=require"
From the psql
command line, create a database. Let's call it crypto_data
:
CREATE DATABASE crypto_data;\c crypto_dataCREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
From the command prompt, you can apply the schema creation script to the database like this:
psql -x "postgres://tsdbadmin:{YOUR_PASSWORD_HERE}@{|YOUR_HOSTNAME_HERE}:{YOUR_PORT_HERE}/crypto_data?sslmode=require" < schema.sql
The output should look something like this:
NOTICE: 00000: table "currency_info" does not exist, skippingLOCATION: DropErrorMsgNonExistent, tablecmds.c:1057DROP TABLETime: 78.384 msCREATE TABLETime: 87.011 msNOTICE: 00000: table "btc_prices" does not exist, skippingLOCATION: DropErrorMsgNonExistent, tablecmds.c:1057DROP TABLETime: 77.094 msCREATE TABLETime: 79.815 msNOTICE: 00000: table "crypto_prices" does not exist, skippingLOCATION: DropErrorMsgNonExistent, tablecmds.c:1057DROP TABLETime: 78.430 msCREATE TABLETime: 78.430 msNOTICE: 00000: table "eth_prices" does not exist, skippingLOCATION: DropErrorMsgNonExistent, tablecmds.c:1057DROP TABLETime: 77.410 msCREATE TABLETime: 80.883 mscreate_hypertable-------------------------(1,public,btc_prices,t)(1 row)Time: 83.154 mscreate_hypertable-------------------------(2,public,eth_prices,t)(1 row)Time: 84.650 mscreate_hypertable----------------------------(3,public,crypto_prices,t)(1 row)Time: 81.864 ms
Now when you log back in to the TimescaleDB instance using psql
, you can run
the \dt
command and see that the tables have been created properly:
List of relationsSchema | Name | Type | Owner--------+---------------+-------+-----------public | btc_prices | table | tsdbadminpublic | crypto_prices | table | tsdbadminpublic | currency_info | table | tsdbadminpublic | eth_prices | table | tsdbadmin(4 rows)
Now that you've created the tables with the desired schema, all that's left is to insert the data from the .csv files you created into the tables.
Make sure you are logged into TimescaleDB using psql
so that you can run each
of these commands in turn:
\COPY btc_prices FROM btc_prices.csv CSV;\COPY eth_prices FROM eth_prices.csv CSV;\COPY crypto_prices FROM crypto_prices.csv CSV;\COPY currency_info FROM coin_names.csv CSV;
Important
Data ingestion could take a while, depending on the speed of your Internet connection.
You can verify that the ingestion worked by running a simple SQL command, such as:
SELECT * FROM btc_prices LIMIT 5;
You should get something like this output:
-[ RECORD 1 ]---+-----------------------time | 2013-03-11 00:00:00+00opening_price | 60.56highest_price | 60.56lowest_price | 60.56closing_price | 60.56volume_btc | 0.1981volume_currency | 12currency_code | AUD-[ RECORD 2 ]---+-----------------------time | 2013-03-12 00:00:00+00opening_price | 60.56highest_price | 60.56lowest_price | 41.38closing_price | 47.78volume_btc | 47.11volume_currency | 2297.5currency_code | AUD-[ RECORD 3 ]---+-----------------------time | 2013-03-07 00:00:00+00opening_price | 181.15highest_price | 273.5lowest_price | 237.4closing_price | 262.87volume_btc | 33.04volume_currency | 8974.45currency_code | CNY-[ RECORD 4 ]---+-----------------------time | 2013-03-07 00:00:00+00opening_price | 32.31highest_price | 35.03lowest_price | 26closing_price | 31.57volume_btc | 13321.61volume_currency | 425824.38currency_code | EUR-[ RECORD 5 ]---+-----------------------time | 2013-03-11 00:00:00+00opening_price | 35.7highest_price | 37.35lowest_price | 35.4closing_price | 37.15volume_btc | 3316.09volume_currency | 121750.98currency_code | EURTime: 224.741 ms
At the beginning of the tutorial, we defined some questions to answer. Naturally, each of those questions has an answer in the form of a SQL query. Now that you database is set up properly, and the data is captured and ingested, you can find some answers:
For example, How did Bitcoin price in USD vary over time?
SELECT time_bucket('7 days', time) AS period,last(closing_price, time) AS last_closing_priceFROM btc_pricesWHERE currency_code = 'USD'GROUP BY periodORDER BY period
How did BTC daily returns vary over time? Which days had the worst and best returns?
SELECT time,closing_price / lead(closing_price) over prices AS daily_factorFROM (SELECT time,closing_priceFROM btc_pricesWHERE currency_code = 'USD'GROUP BY 1,2) sub window prices AS (ORDER BY time DESC)
How did the trading volume of Bitcoin vary over time in different fiat currencies?
SELECT time_bucket('7 days', time) AS period,currency_code,sum(volume_btc)FROM btc_pricesGROUP BY currency_code, periodORDER BY period
How did Ethereum (ETH) price in BTC vary over time?
SELECTtime_bucket('7 days', time) AS time_period,last(closing_price, time) AS closing_price_btcFROM crypto_pricesWHERE currency_code='ETH'GROUP BY time_periodORDER BY time_period
How did ETH prices, in different fiat currencies, vary over time?
SELECT time_bucket('7 days', c.time) AS time_period,last(c.closing_price, c.time) AS last_closing_price_in_btc,last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'USD') AS last_closing_price_in_usd,last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'EUR') AS last_closing_price_in_eur,last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'CNY') AS last_closing_price_in_cny,last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'JPY') AS last_closing_price_in_jpy,last(c.closing_price, c.time) * last(b.closing_price, c.time) FILTER (WHERE b.currency_code = 'KRW') AS last_closing_price_in_krwFROM crypto_prices cJOIN btc_prices bON time_bucket('1 day', c.time) = time_bucket('1 day', b.time)WHERE c.currency_code = 'ETH'GROUP BY time_periodORDER BY time_period
Which cryptocurrencies had the most transaction volume in the past 14 days?
SELECT 'BTC' AS currency_code,sum(b.volume_currency) AS total_volume_in_usdFROM btc_prices bWHERE b.currency_code = 'USD'AND now() - date(b.time) < INTERVAL '14 day'GROUP BY b.currency_codeUNIONSELECT c.currency_code AS currency_code,sum(c.volume_btc) * avg(b.closing_price) AS total_volume_in_usdFROM crypto_prices c JOIN btc_prices b ON date(c.time) = date(b.time)WHERE c.volume_btc > 0AND b.currency_code = 'USD'AND now() - date(b.time) < INTERVAL '14 day'AND now() - date(c.time) < INTERVAL '14 day'GROUP BY c.currency_codeORDER BY total_volume_in_usd DESC
Which cryptocurrencies had the top daily return?
WITHprev_day_closing AS (SELECTcurrency_code,time,closing_price,LEAD(closing_price) OVER (PARTITION BY currency_code ORDER BY TIME DESC) AS prev_day_closing_priceFROMcrypto_prices), daily_factor AS (SELECTcurrency_code,time,CASE WHEN prev_day_closing_price = 0 THEN 0 ELSE closing_price/prev_day_closing_price END AS daily_factorFROMprev_day_closing)SELECTtime,LAST(currency_code, daily_factor) AS currency_code,MAX(daily_factor) AS max_daily_factorFROMdaily_factorGROUP BYtime
Keywords
Found an issue on this page?
Report an issue!