When you have successfully collected and ingested the data, it's time to analyze it. For this analysis, we use data collected with our ingestion script, containing only successful sale transactions that happened between 1 January 2021 and 12 October 2021 on the OpenSea marketplace, as reported by the OpenSea API.
For simplicity, this tutorial analyzes only those transactions that used `ETH` as their payment symbol, but you can modify the script to include more payment symbols in your analysis if you want to.
We divide our analysis into two parts: simple queries and complex queries. But first we create something to speed up our queries: TimescaleDB continuous aggregates.
TimescaleDB continuous aggregates speed up workloads that need to process large amounts of data. They look like PostgreSQL materialized views, but have a built-in refresh policy that keeps the data up to date as new data comes in. Additionally, the refresh procedure only refreshes the data in the materialized view that actually needs to change, avoiding recomputation of data that did not change. This smart refresh procedure massively improves the refresh performance of the materialized view, while the refresh policy ensures that the data is always up to date.
Continuous aggregates are often used to speed up dashboards and visualizations, summarizing data sampled at high frequency, and querying downsampled data over long time periods.
This tutorial creates two continuous aggregates to speed up queries on assets and on collections.
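Beyond the automatic refresh policies used in this tutorial, TimescaleDB also lets you refresh a specific time window of a continuous aggregate on demand, which is useful after back-filling historical data. A minimal sketch, assuming the `assets_daily` aggregate created below and an illustrative date range:

```sql
-- Manually refresh one window of a continuous aggregate,
-- e.g. after ingesting historical rows into the hypertable.
CALL refresh_continuous_aggregate(
    'assets_daily',   -- continuous aggregate to refresh
    '2021-01-01',     -- window start (example value)
    '2021-02-01'      -- window end (example value)
);
```

This is optional: the refresh policies added in this tutorial keep the aggregates current without manual intervention.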
Create a new continuous aggregate called `assets_daily` that computes and stores the following information about all assets for each day: the `asset_id`, the collection it belongs to, the daily average and median prices, the sales volume, the ETH volume, and the daily open, high, low, and close prices:
```sql
/* Asset continuous aggregates */
CREATE MATERIALIZED VIEW assets_daily
WITH (timescaledb.continuous)
AS
SELECT time_bucket('1 day', time) AS bucket,
       asset_id,
       collection_id,
       mean(percentile_agg(total_price)) AS mean_price,
       approx_percentile(0.5, percentile_agg(total_price)) AS median_price,
       COUNT(*) AS volume,
       SUM(total_price) AS volume_eth,
       FIRST(total_price, time) AS open_price,
       MAX(total_price) AS high_price,
       MIN(total_price) AS low_price,
       LAST(total_price, time) AS close_price
FROM nft_sales
WHERE payment_symbol = 'ETH'
GROUP BY bucket, asset_id, collection_id;
```
Add a refresh policy to update the continuous aggregate daily with the latest data, so that you can save computation at query time:
```sql
SELECT add_continuous_aggregate_policy('assets_daily',
    start_offset => INTERVAL '3 days',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 day');
```
Create another continuous aggregate called `collections_daily` that computes and stores the following information about all collections for each day: the daily average and median prices, the sales volume, the ETH volume, the most expensive NFT, and the highest price:
```sql
/* Collection continuous aggregates */
CREATE MATERIALIZED VIEW collections_daily
WITH (timescaledb.continuous)
AS
SELECT collection_id,
       time_bucket('1 day', time) AS bucket,
       mean(percentile_agg(total_price)) AS mean_price,
       approx_percentile(0.5, percentile_agg(total_price)) AS median_price,
       COUNT(*) AS volume,
       SUM(total_price) AS volume_eth,
       LAST(asset_id, total_price) AS most_expensive_nft_id,
       MAX(total_price) AS max_price
FROM nft_sales
GROUP BY bucket, collection_id;

/* Refresh policy */
SELECT add_continuous_aggregate_policy('collections_daily',
    start_offset => INTERVAL '3 days',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 day');
```
When you ask questions where daily aggregations can help with the answer, you can query the continuous aggregate rather than the raw data in the hypertable. This speeds up your queries.
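For example, the daily sales volume per collection could be computed directly from the raw `nft_sales` hypertable, but that query would re-aggregate every raw row each time it runs, while the continuous aggregate answers the same question from pre-computed daily buckets. A sketch of the two equivalent forms, based on the `collections_daily` definition above:

```sql
/* Against the raw hypertable: re-aggregates every row on each run */
SELECT time_bucket('1 day', time) AS bucket,
       collection_id,
       COUNT(*) AS volume
FROM nft_sales
GROUP BY bucket, collection_id;

/* Against the continuous aggregate: reads pre-computed daily rows */
SELECT bucket, collection_id, volume
FROM collections_daily;
```

Both return the same daily volumes, but the second form does far less work at query time.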
You can start your analysis by asking simple questions about NFT sales that happened in 2021 and answering them using SQL queries. Use these queries as a starting point for your own further analysis. You can modify each query to analyze the time-period, asset, collection, or account that you are curious about!
Where possible, we include dashboard examples from Superset to serve as inspiration for creating your own dashboards that monitor and analyze NFT sales using free, open-source tools. You can find the code used to create each graph in the NFT Starter Kit GitHub repo.
Which collections have the highest volume of sales? Answering this is a great starting point for finding collections with assets that are frequently traded, which is important for buyers thinking about the resale value of their NFTs. If you buy an NFT in one of the collections below, there is a good chance you'll be able to find a buyer. In this query, you order the collections by total volume of sales, but you could also order them by ETH volume instead:
```sql
/* Collections with the highest volume? */
SELECT slug,
       SUM(volume) AS total_volume,
       SUM(volume_eth) AS total_volume_eth
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
GROUP BY cagg.collection_id, slug
ORDER BY total_volume DESC;
```
For this query, you take advantage of the pre-calculated data about collections stored in the `collections_daily` continuous aggregate. You also perform an `INNER JOIN` on the `collections` relational table to find the collection name in human-readable form, represented by the `slug` column.
Querying from continuous aggregates is faster and allows you to write shorter, more readable queries. It is a pattern that you'll use again in this tutorial, so look out for it!
How many sales took place each day for a certain collection? This query looks at the daily volume of sales for NFTs in the `cryptokitties` collection. This can help you find the days when NFT traders have been more active, and help you spot patterns about which days of the week or month have higher or lower volume and why. You can modify this query to look at your favorite NFT collection by changing the slug in the `WHERE` clause:
```sql
SELECT bucket, slug, volume
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug = 'cryptokitties'
ORDER BY bucket DESC;
```
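To investigate the day-of-week patterns mentioned above, you can group the daily buckets by weekday using standard PostgreSQL date formatting. A sketch, reusing the `collections_daily` aggregate from this query:

```sql
/* Average daily sales volume per day of week for a collection */
SELECT to_char(bucket, 'Day') AS day_of_week,
       AVG(volume) AS avg_daily_volume
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug = 'cryptokitties'
GROUP BY day_of_week
ORDER BY avg_daily_volume DESC;
```

This ranks weekdays by their average sales volume, making recurring weekly patterns easy to spot.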
Here's what this query would look like as a time-series chart in Apache Superset:
As a reminder, charts like this are pre-built and ready for you to use and modify as part of the pre-built dashboards in our NFT Starter Kit.
How do the daily sales of NFTs in one collection compare to those of another collection? This query compares the daily sales of two popular NFT collections, CryptoKitties and Ape Gang, over the past three months:
```sql
/* Daily number of NFT transactions, CryptoKitties vs Ape Gang, from past 3 months? */
SELECT bucket, slug, volume
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug IN ('cryptokitties', 'ape-gang')
    AND bucket > NOW() - INTERVAL '3 month'
ORDER BY bucket DESC, slug;
```
This sort of query is useful to track sales activity in collections you're interested in, or own assets in, so you can see the activity of other NFT holders. You can also modify the time period under consideration to look at larger (such as 9 months) or smaller (such as 14 days) periods of time.
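For instance, narrowing the comparison to the past 14 days only requires changing the interval in the `WHERE` clause of the query above:

```sql
/* Same comparison over the past 14 days */
SELECT bucket, slug, volume
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug IN ('cryptokitties', 'ape-gang')
    AND bucket > NOW() - INTERVAL '14 days'
ORDER BY bucket DESC, slug;
```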
How many NFTs did a particular person buy in a certain period of time? This sort of query is useful to monitor the activity of popular NFT collectors, like American rapper Snoop Dogg (also known as Cozomo_de_Medici) or African NFT evangelist Daliso Ngoma, or even to compare the trading patterns of multiple collectors. Since NFT transactions are public on the Ethereum blockchain, and our database contains seller (`seller_account`) and buyer (`winner_account`) columns, you can analyze the purchase activity of a specific account.
This query uses Snoop Dogg's address to analyze his trades, but you can edit the address in the `WHERE` clause of the CTE to see the specified account's transactions:
```sql
/* Snoop Dogg's transactions in the past 3 months, aggregated */
WITH snoop_dogg AS (
    SELECT id FROM accounts
    WHERE address = '0xce90a7949bb78892f159f428d0dc23a8e3584d75'
)
SELECT
    COUNT(*) AS trade_count,
    COUNT(DISTINCT asset_id) AS nft_count,
    COUNT(DISTINCT collection_id) AS collection_count,
    COUNT(*) FILTER (WHERE seller_account = (SELECT id FROM snoop_dogg)) AS sale_count,
    COUNT(*) FILTER (WHERE winner_account = (SELECT id FROM snoop_dogg)) AS buy_count,
    SUM(total_price) AS total_volume_eth,
    AVG(total_price) AS avg_price,
    MIN(total_price) AS min_price,
    MAX(total_price) AS max_price
FROM nft_sales
WHERE payment_symbol = 'ETH'
    AND (seller_account = (SELECT id FROM snoop_dogg)
         OR winner_account = (SELECT id FROM snoop_dogg))
    AND time > NOW() - INTERVAL '3 months';
```
From the result of the query, we can see that Snoop Dogg made 59 trades overall in the past 3 months (bought 58 times, and sold only once). His trades included 57 individual NFTs and 23 collections, totaling 1835.504 ETH spent, with a minimum paid price of 0 and a maximum of 1300 ETH.
What's the most expensive NFT in a certain collection? This query looks at a specific collection (CryptoKitties) and finds the five most expensive NFTs sold from it. This can help you find the rarest items in a collection, and look at the properties that make them rare, to help you buy items with similar properties from that collection:
```sql
/* Top 5 most expensive NFTs in the CryptoKitties collection */
SELECT a.name AS nft, total_price, time, a.url
FROM nft_sales s
INNER JOIN collections c ON c.id = s.collection_id
INNER JOIN assets a ON a.id = s.asset_id
WHERE slug = 'cryptokitties'
    AND payment_symbol = 'ETH'
ORDER BY total_price DESC
LIMIT 5;
```
|nft|total_price|time|url|
|---|---|---|---|
|Founder Cat #40|225.0|2021-09-03 14:59:16|https://opensea.io/assets/0x06012c8cf97bead5deae237070f9587f8e7a266d/40|
|Founder Cat #17|177.0|2021-09-03 01:58:13|https://opensea.io/assets/0x06012c8cf97bead5deae237070f9587f8e7a266d/17|
|Founder Cat #38|148.0|2021-09-03 01:58:13|https://opensea.io/assets/0x06012c8cf97bead5deae237070f9587f8e7a266d/38|
What is the daily volume of Ether (ETH) for a specific collection? Using the example of CryptoKitties, this query calculates the daily total ETH spent in sales of NFTs in a certain collection:
```sql
/* Daily ETH volume of CryptoKitties NFT transactions? */
SELECT bucket, slug, volume_eth
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug = 'cryptokitties'
ORDER BY bucket DESC;
```
How does the daily volume of ETH spent on assets in one collection compare to others? This query uses CryptoKitties and Ape Gang as examples, to find the daily ETH spent on buying assets in those collections in the past three months. You can extend this query to monitor and compare the daily volume spent on your favorite NFT collections and find patterns in sales:
```sql
/* Daily ETH volume of NFT transactions: CryptoKitties vs Ape Gang? */
SELECT bucket, slug, volume_eth
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug IN ('cryptokitties', 'ape-gang')
    AND bucket > NOW() - INTERVAL '3 month'
ORDER BY bucket, slug DESC;
```
When you are analyzing the daily price of assets in a specific collection, two useful statistics to use are the mean price and the median price. This query finds the daily mean and median sale prices of assets in the CryptoKitties collection:
```sql
/* Mean vs median sale price of CryptoKitties? */
SELECT bucket, slug, mean_price, median_price
FROM collections_daily cagg
INNER JOIN collections c ON cagg.collection_id = c.id
WHERE slug = 'cryptokitties'
ORDER BY bucket DESC;
```
Since calculating the mean and median is computationally expensive for large datasets, we use the `percentile_agg` hyperfunction, a SQL function that is part of the Timescale Toolkit extension. It accurately approximates both statistics, as shown in the definitions of `mean_price` and `median_price` in the continuous aggregate we created earlier in the tutorial:
```sql
CREATE MATERIALIZED VIEW collections_daily
WITH (timescaledb.continuous)
AS
SELECT collection_id,
       time_bucket('1 day', time) AS bucket,
       mean(percentile_agg(total_price)) AS mean_price,
       approx_percentile(0.5, percentile_agg(total_price)) AS median_price,
       COUNT(*) AS volume,
       SUM(total_price) AS volume_eth,
       LAST(asset_id, total_price) AS most_expensive_nft_id,
       MAX(total_price) AS max_price
FROM nft_sales
GROUP BY bucket, collection_id;
```
What days do the most prolific accounts buy on? To answer that question, you can analyze the top five NFT buyer accounts based on the number of NFT purchases, and their total daily volume of NFTs bought over time. This is a good starting point to dig deeper into the analysis, as it can help you find days when something happened that made these users buy a lot of NFTs: for example, a dip in ETH prices leading to lower gas fees, or drops of highly anticipated collections:
```sql
/* Daily total volume of the 5 top buyers */
WITH top_five_buyers AS (
    SELECT winner_account
    FROM nft_sales
    GROUP BY winner_account
    ORDER BY count(*) DESC
    LIMIT 5
)
SELECT time_bucket('1 day', time) AS bucket,
       count(*) AS total_volume
FROM nft_sales
WHERE winner_account IN (SELECT winner_account FROM top_five_buyers)
GROUP BY bucket
ORDER BY bucket DESC;
```
Let's take a look at some more complex questions you can ask about the NFT dataset, as well as the more complex queries that answer them.
What are the mean and median sales prices of the highest traded NFT from the past day, in 30-minute intervals?
```sql
/* Calculating 30-min mean and median sale prices of the highest trade count NFT on 2021-10-17 */
WITH one_day AS (
    SELECT time, asset_id, total_price
    FROM nft_sales
    WHERE time >= '2021-10-17' AND time < '2021-10-18'
        AND payment_symbol = 'ETH'
)
SELECT time_bucket('30 min', time) AS bucket,
       assets.name AS nft,
       mean(percentile_agg(total_price)) AS mean_price,
       approx_percentile(0.5, percentile_agg(total_price)) AS median_price
FROM one_day
INNER JOIN assets ON assets.id = one_day.asset_id
WHERE asset_id = (
    SELECT asset_id
    FROM one_day
    GROUP BY asset_id
    ORDER BY count(*) DESC
    LIMIT 1
)
GROUP BY bucket, nft
ORDER BY bucket DESC;
```
|bucket|nft|mean_price|median_price|
|---|---|---|---|
|2021-10-17 23:30:00|Zero [Genesis]|0.06|0.06002456177152414|
|2021-10-17 23:00:00|Zero [Genesis]|0.118|0.1180081944620535|
|2021-10-17 22:30:00|Zero [Genesis]|0.0785333333|0.06002456177152414|
|2021-10-17 22:00:00|Zero [Genesis]|0.0775|0.09995839119153871|
|2021-10-17 21:30:00|Zero [Genesis]|0.0555|0.05801803032917102|
This is a more complex query that uses a PostgreSQL Common Table Expression (CTE) to first create a sub-table of the data from the past day, called `one_day`. Then you use the `time_bucket` hyperfunction to create 30-minute buckets of the data, and use the `percentile_agg` hyperfunction to find the mean and median prices for each interval. Finally, you `JOIN` on the `assets` table to get the name of the specific NFT, so you can return it along with the mean and median price for each time interval.
Open-high-low-close-volume (OHLCV) charts are most often used to illustrate the price of a financial instrument, most commonly stocks, over time. You can create OHLCV charts for a single NFT, or get the OHLCV values for a set of NFTs.
This query finds the OHLCV for NFTs with more than 100 sales in a day, as well as the day on which the trades occurred:
```sql
/* Daily OHLCV per asset */
SELECT time_bucket('1 day', time) AS bucket,
       asset_id,
       FIRST(total_price, time) AS open_price,
       LAST(total_price, time) AS close_price,
       MIN(total_price) AS low_price,
       MAX(total_price) AS high_price,
       count(*) AS volume
FROM nft_sales
WHERE payment_symbol = 'ETH'
GROUP BY bucket, asset_id
HAVING count(*) > 100
ORDER BY bucket
LIMIT 5;
```
In this query, you used the TimescaleDB hyperfunctions `first()` and `last()` to find the open and close prices respectively. These hyperfunctions allow you to find the value of one column as ordered by another, by performing a sequential scan through their groups. In this case, you get the first and last values of the `total_price` column, as ordered by the `time` column. See the docs for more information.
If you want to run this query regularly, you can create a continuous aggregate for it, which greatly improves query performance. Moreover, you can remove the `LIMIT 5` and replace it with an additional `WHERE` clause that filters for a specific time period, to make the query more useful.
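In fact, the `assets_daily` continuous aggregate created at the start of this tutorial already stores these open, high, low, close, and volume columns, so the same question can be answered from pre-computed daily data. A sketch that filters for a specific month (an example range) instead of using `LIMIT`:

```sql
/* Daily OHLCV per asset from the continuous aggregate, for September 2021 */
SELECT bucket,
       asset_id,
       open_price, high_price, low_price, close_price,
       volume
FROM assets_daily
WHERE volume > 100
    AND bucket >= '2021-09-01' AND bucket < '2021-10-01'
ORDER BY bucket;
```

Because the aggregation work was already done at refresh time, this version avoids scanning the raw `nft_sales` rows entirely.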
Which assets had the biggest intraday sale price change? You can identify interesting behavior, such as an asset being bought and then sold again for a much higher (or lower) amount within the same day. This can help you identify good flips of NFTs, or perhaps owners whose brand elevated the NFT's price thanks to it being part of their collection.
This query finds the assets with the biggest intraday sale price change in the last six months:
```sql
/* Daily assets sorted by biggest intraday price change in the last 6 months */
WITH top_assets AS (
    SELECT time_bucket('1 day', time) AS bucket,
           asset_id,
           FIRST(total_price, time) AS open_price,
           LAST(total_price, time) AS close_price,
           MAX(total_price) - MIN(total_price) AS intraday_max_change
    FROM nft_sales s
    WHERE payment_symbol = 'ETH'
        AND time > NOW() - INTERVAL '6 month'
    GROUP BY bucket, asset_id
    ORDER BY intraday_max_change DESC
    LIMIT 5
)
SELECT bucket, nft, url, open_price, close_price, intraday_max_change
FROM top_assets ta
INNER JOIN LATERAL (
    SELECT name AS nft, url
    FROM assets a
    WHERE a.id = ta.asset_id
) assets ON TRUE;
```
|bucket|nft|url|open_price|close_price|intraday_max_change|
|---|---|---|---|---|---|
|2021-09-27 02:00:00|Skulptuur #647|https://opensea.io/assets/0xa7d8d9ef8d8ce8992df33d8b8cf4aebabd5bd270/173000647|25.0|90.0|65.0|
This section contains information about what to do when you've completed the tutorial, and some links to more resources.
The first 20 people to complete this tutorial can earn a limited edition NFT from the Time Travel Tigers collection, for free!
Now that you’ve completed the tutorial, all you need to do is answer the questions in this form (including the challenge question), and we’ll send one of the limited-edition Eon NFTs to your ETH address (at no cost to you!).
You can see all NFTs in the Time Travel Tigers collection live on OpenSea.
Congratulations! You’re now up and running with NFT data and TimescaleDB. Check out our NFT Starter Kit to use as your starting point to build your own, more complex NFT analysis projects.
The Starter Kit contains:
Check out these resources for more about using TimescaleDB with crypto data: