NFT schema design and ingestion
A properly designed database schema is essential to efficiently store and analyze data. This tutorial uses NFT time-series data with multiple supporting relational tables.
To help you get familiar with NFT data, here are some of the questions that could be answered with this dataset:
- Which collections have the highest trading volume?
- What's the number of daily transactions of a given collection or asset?
- Which collections have the most trading volume in Ether (ETH)?
- Which account made the most NFT trades?
- How are the mean and median sale prices correlated?
One theme across all these questions is that most of the insights are about the sale itself, or the aggregation of sales. So you need to create a schema which focuses on the time-series aspect of the data. It's also important to make sure that you can JOIN supporting tables, so you can more easily make queries that touch both the time-series and the relational tables. TimescaleDB's PostgreSQL foundation and full-SQL support allows you to easily combine time-series and relational tables during your analysis.
You need these tables:
- nft_sales: successful NFT transactions
Relational tables (regular PostgreSQL tables):
- assets: unique NFT items
- collections: NFT collections
- accounts: NFT trading accounts/users
The nft_sales table
nft_sales table contains information about successful sale transactions
in time-series form. One row represents one successful sale event on the
idfield is a unique field provided by the OpenSea API.
total_pricefield is the price paid for the NFTs in ETH (or other cryptocurrency payment symbol available on OpenSea).
quantityfield indicates how many NFTs were sold in the transaction (can be more than 1).
auction_typefield is NULL by default, unless the transaction happened as part of an auction.
collection_idfields can be used to JOIN the supporting relational tables.
|id||OpenSea ID (unique)|
|time||Time of the sale|
|asset_id||ID of the NFT, FK: assets(id)|
|collection_id||ID of the collection this NFT belongs to, FK: collections(id))|
|auction_type||Auction type ('dutch', 'english', 'min_price')|
|contract_address||Address of the smart contract|
|quantity||NFT quantity sold|
|payment_symbol||Payment symbol (usually ETH, depends on the blockchain where the NFT is minted)|
|total_price||Total price paid for the NFT|
|seller_account||Seller's account, FK: accounts(id)|
|from_account||Account used to transfer from, FK: accounts(id)|
|to_account||Account used to transfer to, FK: accounts(id)|
|winner_account||Buyer's account, FK: accounts(id)|
The assets table
assets table contains information about the assets (NFTs) that are in the
transactions. One row represents a unique NFT asset on the OpenSea platform.
namefield is the name of the NFT, and is not unique.
idfield is the primary key, provided by the OpenSea API.
- One asset can be referenced from multiple transactions (traded multiple times).
|id||OpenSea ID (PK)|
|name||Name of the NFT|
|description||Description of the NFT|
|contract_date||Creation date of the smart contract|
|url||OpenSea URL of the NFT|
|owner_id||ID of the NFT owner account, FK: accounts(id)|
|details||Other extra data fields (JSONB)|
The collections table
collections table holds information about the NFT collections. One row
represents a unique NFT collection.
One collection includes multiple unique NFTs (that are in the
slugfield is a unique identifier of the collection.
|slug||Slug of the collection (unique)|
|name||Name of the collection|
|url||OpenSea url of the collection|
|details||Other extra data fields (JSONB)|
The accounts table
accounts table includes the accounts that have participated in at least
one transaction from the nft_sales table.
One row represents one unique account on the OpenSea platform.
addressis never NULL and it's unique
user_nameis NULL unless it's been submitted on the OpenSea profile by the user
|user_name||OpenSea user name|
|address||Account address, unique|
|details||Other extra data fields (JSONB)|
The data types used in the schema for this tutorial have been determined based
on our research and hands-on experience working with the OpenSea API and the
data pulled from OpenSea. Start by running these SQL commands to create the schema.
Alternatively, you can download and run the
file from our NFT Starter Kit GitHub repository.
CREATE TABLE collections ( id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, slug TEXT UNIQUE, name TEXT, url TEXT, details JSONB ); CREATE TABLE accounts ( id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, user_name TEXT, address TEXT UNIQUE NOT NULL, details JSONB ); CREATE TABLE assets ( id BIGINT PRIMARY KEY, name TEXT, collection_id BIGINT REFERENCES collections (id), -- collection description TEXT, contract_date TIMESTAMP WITH TIME ZONE, url TEXT UNIQUE, img_url TEXT, owner_id BIGINT REFERENCES accounts (id), -- account details JSONB ); CREATE TYPE auction AS ENUM ('dutch', 'english', 'min_price'); CREATE TABLE nft_sales ( id BIGINT, "time" TIMESTAMP WITH TIME ZONE, asset_id BIGINT REFERENCES assets (id), -- asset collection_id BIGINT REFERENCES collections (id), -- collection auction_type auction, contract_address TEXT, quantity NUMERIC, payment_symbol TEXT, total_price DOUBLE PRECISION, seller_account BIGINT REFERENCES accounts (id), -- account from_account BIGINT REFERENCES accounts (id), -- account to_account BIGINT REFERENCES accounts (id), -- account winner_account BIGINT REFERENCES accounts (id), -- account CONSTRAINT id_time_unique UNIQUE (id, time) ); SELECT create_hypertable('nft_sales', 'time'); CREATE INDEX idx_asset_id ON nft_sales (asset_id); CREATE INDEX idx_collection_id ON nft_sales (collection_id); CREATE INDEX idx_payment_symbol ON nft_sales (payment_symbol);
id field in each table is
BIGINT because its storage size is 8 bytes in
PostgreSQL (as opposed to
INT's 4 bytes) which is needed to make sure this
value doesn't overflow.
quantity field we suggest using numeric or decimal (which works the
same way in PostgreSQL) as the data type, because in some edge cases we
experience transactions where the quantity was too big even for BIGINT.
total_price needs to be
double precision because NFT prices often include
many decimals, especially in the case of Ether (ETH) and similar cryptocurrencies
which are, functionally, infinitely divisible.
We created an
auction_type as this value can only be 'dutch',
'english', or 'min_price', representing the different types of auctions used
to sell an NFT.
We decided to not store all the data fields that are available from the
OpenSea API, only those that we deem interesting or useful for future analysis.
But we still wanted to keep all of the unused data fields somewhere close,
so we added a
details JSONB column to each relational table. This column
contains additional information about the record. For example, it includes a
background_color as a field for the assets.
Note: In our sample dataset, we chose not to include the JSONB data to keep the size of the dataset easily managable. If you want a dataset with the full JSON data included, you need to fetch the data directly from the OpenSea API (see below for steps).
When you have your database and schema created, you can ingest some data to play with! You have two options to ingest NFT data for this tutorial:
- Fetch data directly from the OpenSea API
- Download sample data and import it
Fetch data directly from the OpenSea API
To ingest data from the OpenSea API, you can use the
opensea_ingest.py script included
in the starter kit repository on GitHub. The script connects to the OpenSea
/events endpoint, and fetches data from the specified time period.
opensea_ingest.pyscript, try following the procedure to download a historical data file and import it. You can use this data file to complete the tutorial.
Fetching data directly from the OpenSea API
- Clone the nft-starter-kit repository on Github:
git clone https://github.com/timescale/nft-starter-kit.git cd nft-starter-kit
- Create a new Python virtual environment and install the requirements:
virtualenv env && source env/bin/activate pip install -r requirements.txt
- Replace the parameters in the
DB_NAME="tsdb" HOST="YOUR_HOST_URL" USER="tsdbadmin" PASS="YOUR_PASSWORD_HERE" PORT="PORT_NUMBER" OPENSEA_START_DATE="2021-10-01T00:00:00" # example start date (UTC) OPENSEA_END_DATE="2021-10-06T23:59:59" # example end date (UTC) OPENSEA_APIKEY="YOUR_OPENSEA_APIKEY" # need to request from OpenSea's docs
- Run the Python script:This starts ingesting data in batches, 300 rows at a time:
python opensea_ingest.pyYou can stop the ingesting process anytime (Ctrl+C), otherwise the script runs until all the transactions have been ingested from the given time period.
Start ingesting data between 2021-10-01 00:00:00+00:00 and 2021-10-06 23:59:59+00:00 --- Fetching transactions from OpenSea... Data loaded into temp table! Data ingested! Data has been backfilled until this time: 2021-10-06 23:51:31.140126+00:00 ---
Download sample NFT data
You can download and insert sample CSV files that contain NFT sales data from 1 October 2021 to 7 October 2021.
Downloading sample NFT data
- Download sample CSV files containing one week of sample data.
- Uncompress the ZIP file:
- Connect to your database:If you're using Timescale Cloud, the instructions under
psql -x "postgres://host:port/tsdb?sslmode=require"
How to Connectprovide a customized command to run to connect directly to your database.
- Import the CSV files in this order (it can take a few minutes in total):
\copy accounts FROM 001_accounts.csv CSV HEADER; \copy collections FROM 002_collections.csv CSV HEADER; \copy assets FROM 003_assets.csv CSV HEADER; \copy nft_sales FROM 004_nft_sales.csv CSV HEADER;
After ingesting NFT data, you can try running some queries on your database:
SELECT count(*), MIN(time) AS min_date, MAX(time) AS max_date FROM nft_sales
Found an issue on this page?Report an issue!