Advanced data management
The final part of this tutorial shows you some more advanced techniques to efficiently manage your tick and candlestick data long-term. TimescaleDB is equipped with multiple features that help you manage your data lifecycle and reduce your disk storage needs as your data grows.
This section contains four examples of how you can set up automation policies on your tick data hypertable and your candlestick continuous aggregates. This can help you save on disk storage and improve the performance of long-range analytical queries by automatically:
- Deleting older tick data
- Deleting older candlestick data
- Compressing tick data
- Compressing candlestick data
Before you implement any of these automation policies, it's important to have a high-level understanding of chunk time intervals in TimescaleDB hypertables and continuous aggregates. The chunk time interval you set for your tick data table directly affects how these automation policies work. For more information, see the hypertables and chunks section.
TimescaleDB uses hypertables to provide a high-level and familiar abstraction layer to interact with PostgreSQL tables. You just need to access one hypertable to access all of your time-series data.
Under the hood, TimescaleDB creates chunks based on the timestamp column.
Each chunk size is determined by the
parameter. You can provide this parameter when creating the hypertable, or you can change
it afterwards. If you don't provide this optional parameter, the
chunk time interval defaults to 7 days. This means that each of the
chunks in the hypertable contains 7 days' worth of data.
Knowing your chunk time interval is important. All of the TimescaleDB automation policies described in this section depend on this information, and the chunk time interval fundamentally affects how these policies impact your data.
In this section, learn about these automation policies and how they work in the context of financial tick data.
Usually, the older your time-series data, the less relevant and useful it is. This is often the case with tick data as well. As time passes, you might not need the raw tick data any more, because you only want to query the candlestick aggregations. In this scenario, you can decide to remove tick data automatically from your hypertable after it gets older than a certain time interval.
TimescaleDB has a built-in way to automatically remove raw data after a specific time. You can set up this automation using a data retention policy:
SELECT add_retention_policy('crypto_ticks', INTERVAL '7 days');
When you run this, it adds a data retention policy to the
hypertable that removes a chunk after all the data in the chunk becomes
older than 7 days. All records in the chunk need to be
older than 7 days before the chunk is dropped.
Knowledge of your hypertable's chunk time interval
is crucial here. If you were to set a data retention policy with
INTERVAL '3 days', the policy would not remove any data after three days, because your chunk time interval is seven days. Even after three
days have passed, the most recent chunk still contains data that is newer than three
days, and so cannot be removed by the data retention policy.
If you want to change this behavior, and drop chunks more often and sooner, experiment with different chunk time intervals. For example, if you set the chunk time interval to be two days only, you could create a retention policy with a 2-day interval that would drop a chunk every other day (assuming you're ingesting data in the meantime).
For more information, see the data retention section.
Deleting older raw tick data from your hypertable while retaining aggregate views for longer periods is a common way of minimizing disk utilization. However, deleting older candlestick data from the continuous aggregates can provide another method for further control over long-term disk use. TimescaleDB allows you to create data retention policies on continuous aggregates as well.
You can set up a data retention policy to remove old data from
one_min_candle continuous aggregate:
SELECT add_retention_policy('one_min_candle', INTERVAL '70 days');
This data retention policy removes chunks from the continuous aggregate
that are older than 70 days. In TimescaleDB, this is determined by the
range_end property of a hypertable, or in the case of a continuous
aggregate, the materialized hypertable. In practice, this means that if
you were to
define a data retention policy of 30 days for a continuous aggregate that has
chunk_time_interval of 70 days, data would not be removed from the
continuous aggregates until the
range_end of a chunk is at least 70
days older than the current time, due to the chunk time interval of the
TimescaleDB allows you to keep your tick data in the hypertable but still save on storage costs with TimescaleDB's native compression. You need to enable compression on the hypertable and set up a compression policy to automatically compress old data.
Enable compression on
ALTER TABLE crypto_ticks SET ( timescaledb.compress, timescaledb.compress_segmentby = 'symbol' );
Set up compression policy to compress data that's older than 7 days:
SELECT add_compression_policy('crypto_ticks', INTERVAL '7 days');
Executing these two SQL scripts compresses chunks that are older than 7 days.
For more information, see the compression section.
Beginning with TimescaleDB 2.6, you can also set up a compression policy on your continuous aggregates. This is a useful feature if you store a lot of historical candlestick data that consumes significant disk space, but you still want to retain it for longer periods.
Enable compression on the
ALTER MATERIALIZED VIEW one_min_candle set (timescaledb.compress = true);
Add a compression policy to compress data after 70 days:
SELECT add_compression_policy('one_min_candle', compress_after=> INTERVAL '70 days');
Found an issue on this page?Report an issue!