About storage tiers

Timescale Cloud: Scale, Enterprise

Timescale's tiered storage architecture includes a high-performance storage tier and a low-cost object storage tier. You use the high-performance tier for data that requires quick access, and the object tier for rarely used historical data. Tiering policies move older data asynchronously and periodically from high-performance to low-cost storage, sparing you the need to do it manually. Chunks from a single hypertable, including compressed chunks, can stretch across these two storage tiers.
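As a minimal sketch of such a policy, assuming the `device_readings` hypertable queried later on this page and an illustrative three-week threshold, `add_tiering_policy` schedules chunks for tiering once they age past the interval:

```sql
-- Assumption: device_readings is an existing hypertable and tiered storage
-- is already enabled for the service. The 3-week interval is illustrative.
SELECT add_tiering_policy('device_readings', INTERVAL '3 weeks');
```

Choose the interval so that the data you still query often stays in the high-performance tier.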

High-performance storage

High-performance storage is where your data is stored by default, until you enable tiered storage and move older data to the low-cost tier. In high-performance storage, your data is stored in block format and optimized for frequent querying. The hypercore row-columnar storage engine available in this tier is designed specifically for real-time analytics. It enables you to compress the data in high-performance storage by up to 90%, while improving performance. Coupled with other optimizations, Timescale Cloud high-performance storage keeps your data accessible and your queries fast.

Timescale Cloud high-performance storage comes in the following types:

Standard (default): based on AWS EBS gp3 and designed for general workloads. Provides up to 16 TB of storage and 16,000 IOPS.
Enhanced: based on EBS io2 and designed for high-scale, high-throughput workloads. Provides up to 64 TB of storage and 32,000 IOPS.

For details, see the differences between the underlying AWS storage types. You enable enhanced storage as needed in Timescale Console.

Low-cost storage

Once you enable tiered storage, you can start moving rarely used data to the object tier. The object tier is based on AWS S3 and stores your data in the Apache Parquet format. Within a Parquet file, a set of rows is grouped together to form a row group. Within a row group, values for a single column across multiple rows are stored together. The original size of the data in your service, compressed or uncompressed, does not correspond directly to its size in S3. A compressed hypertable may even take more space in S3 than it does in Timescale Cloud.
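Moving a chunk by hand might look like the following sketch. It assumes the `device_readings` hypertable used in the query example on this page; the three-week cutoff and the chunk name are illustrative only:

```sql
-- Assumption: device_readings is an existing hypertable; the 3-week cutoff
-- and the chunk name are illustrative.

-- List chunks older than three weeks as candidates for tiering.
SELECT show_chunks('device_readings', older_than => INTERVAL '3 weeks');

-- Schedule one chunk for tiering. The move happens asynchronously.
SELECT tier_chunk('_timescaledb_internal._hyper_1_42_chunk');

-- Check which chunks have already been moved to the object tier.
SELECT * FROM timescaledb_osm.tiered_chunks;
```

Because tiering is asynchronous, a chunk may not appear in `timescaledb_osm.tiered_chunks` immediately after `tier_chunk` returns.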

Apache Parquet allows for more efficient scans across longer time periods, and Timescale Cloud uses other metadata and query optimizations to reduce the amount of data that needs to be fetched to satisfy a query, such as:

Chunk skipping: exclude the chunks that fall outside the query time window.
Row group skipping: identify the row groups within the Parquet object that satisfy the query.
Column skipping: fetch only columns that are requested by the query.

The following query is against a tiered dataset and illustrates the optimizations:


EXPLAIN ANALYZE 
SELECT count(*) FROM
( SELECT device_uuid,  sensor_id FROM public.device_readings 
  WHERE observed_at > '2023-08-28 00:00+00' and observed_at < '2023-08-29 00:00+00' 
  GROUP BY device_uuid,  sensor_id ) q;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Aggregate  (cost=7277226.78..7277226.79 rows=1 width=8) (actual time=234993.749..234993.750 rows=1 loops=1)
   ->  HashAggregate  (cost=4929031.23..7177226.78 rows=8000000 width=68) (actual time=184256.546..234913.067 rows=1651523 loops=1)
         Group Key: osm_chunk_1.device_uuid, osm_chunk_1.sensor_id
         Planned Partitions: 128  Batches: 129  Memory Usage: 20497kB  Disk Usage: 4429832kB
         ->  Foreign Scan on osm_chunk_1  (cost=0.00..0.00 rows=92509677 width=68) (actual time=345.890..128688.459 rows=92505457 loops=1)
               Filter: ((observed_at > '2023-08-28 00:00:00+00'::timestamp with time zone) AND (observed_at < '2023-08-29 00:00:00+00'::timestamp with time zone))
               Rows Removed by Filter: 4220
               Match tiered objects: 3
               Row Groups:
                 _timescaledb_internal._hyper_1_42_chunk: 0-74
                 _timescaledb_internal._hyper_1_43_chunk: 0-29
                 _timescaledb_internal._hyper_1_44_chunk: 0-71
               S3 requests: 177
               S3 data: 224423195 bytes
 Planning Time: 6.216 ms
 Execution Time: 235372.223 ms
(16 rows)

EXPLAIN illustrates which chunks are being pulled in from the object storage tier:

Fetch data from chunks 42, 43, and 44 from the object storage tier.
Skip row groups and limit the fetch to a subset of the offsets in the Parquet object that potentially match the query filter.
Only fetch the data for device_uuid, sensor_id, and observed_at as the query needs only these 3 columns.

The object storage tier is more than an archiving solution. It is also:

Cost-effective: store high volumes of data at a lower cost. You pay only for what you store, with no extra cost for queries.
Scalable: scale past the restrictions of even the enhanced high-performance storage tier.
Online: your data is always there and can be queried when needed.

By default, tiered data is not included when you query from a Timescale Cloud service. To access tiered data, you enable tiered reads for a query, a session, or even for all sessions. After you enable tiered reads, when you run regular SQL queries, a behind-the-scenes process transparently pulls data from wherever it's located: the standard high-performance storage tier, the object storage tier, or both. You can JOIN against tiered data, build views, and even define continuous aggregates on it. In fact, because the implementation of continuous aggregates also uses hypertables, they can be tiered to low-cost storage as well.
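The three scopes can be sketched as follows. The `timescaledb.enable_tiered_reads` setting controls tiered reads; the database name `tsdb` and the sample query are assumptions for illustration:

```sql
-- Enable tiered reads for the current session only.
SET timescaledb.enable_tiered_reads = true;

-- Scope tiered reads to a single query by switching the setting
-- on before the query and off again afterwards.
SET timescaledb.enable_tiered_reads = true;
SELECT count(*) FROM device_readings WHERE observed_at < '2023-08-29 00:00+00';
SET timescaledb.enable_tiered_reads = false;

-- Enable tiered reads for all future sessions.
-- Assumption: the database is named tsdb.
ALTER DATABASE tsdb SET timescaledb.enable_tiered_reads TO true;
```

The database-level setting applies to sessions opened after the change, so reconnect to pick it up.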

Timescale charges only for the storage that your data occupies in S3 in the Apache Parquet format, regardless of whether it was compressed in Timescale Cloud before tiering. There are no additional expenses, such as data transfer or compute.

The low-cost storage tier comes with the following limitations:

Limited schema modifications: some schema modifications are not allowed on hypertables with tiered chunks.
Allowed modifications include: renaming the hypertable, adding columns with NULL defaults, adding indexes, changing or renaming the hypertable schema, and adding CHECK constraints. For CHECK constraints, only untiered data is verified. Columns can also be deleted, but you cannot subsequently add a new column to a tiered hypertable with the same name as the now-deleted column.
Disallowed modifications include: adding a column with a non-NULL default, renaming a column, changing the data type of a column, and adding a NOT NULL constraint to a column.
Limited data changes: you cannot insert data into, update, or delete a tiered chunk. These limitations take effect as soon as the chunk is scheduled for tiering.
Inefficient query planner filtering for non-native data types: the query planner speeds up reads from the object storage tier by using metadata to filter out columns and row groups that don't satisfy the query. This works for all native data types, but not for non-native types, such as JSON, JSONB, and GIS.
Latency: S3 has higher access latency than local storage. This can affect the execution time of queries in latency-sensitive environments, especially lighter queries.
Number of dimensions: you cannot use tiered storage with hypertables partitioned on more than one dimension. Make sure your hypertables are partitioned on time only, before you enable tiered storage.
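Some of these limitations can be checked before you enable tiering. The sketch below assumes the `device_readings` hypertable from the query example; the column names `firmware_version` and `status` are hypothetical:

```sql
-- Confirm the hypertable is partitioned on time only (num_dimensions = 1).
SELECT hypertable_name, num_dimensions
FROM timescaledb_information.hypertables
WHERE hypertable_name = 'device_readings';

-- Allowed on a hypertable with tiered chunks: a new column with a NULL default.
-- Assumption: firmware_version is a hypothetical column name.
ALTER TABLE device_readings ADD COLUMN firmware_version TEXT;

-- Not allowed once chunks are tiered, per the list above:
-- ALTER TABLE device_readings ADD COLUMN status TEXT DEFAULT 'ok';
-- ALTER TABLE device_readings RENAME COLUMN sensor_id TO sensor;
```

Making disallowed schema changes before the first chunk is tiered avoids running into these restrictions later.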
