Users of TimescaleDB often have two common questions:
The default time interval is 7 days. You can explicitly configure time
intervals when you create a hypertable, using the
After the hypertable is created, you can change the interval for new chunks
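As a hedged illustration of both steps, the sketch below assumes a hypothetical table named conditions with a time column named time; neither name comes from this page, and the interval values are arbitrary. It relies on the chunk_time_interval argument of create_hypertable and on the set_chunk_time_interval function.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    location    TEXT             NOT NULL,
    temperature DOUBLE PRECISION
);

-- Set the chunk time interval explicitly at creation time (here: 1 day).
SELECT create_hypertable('conditions', 'time',
                         chunk_time_interval => INTERVAL '1 day');

-- Later, change the interval. Only chunks created after this call use it;
-- existing chunks keep the interval they were created with.
SELECT set_chunk_time_interval('conditions', INTERVAL '12 hours');
```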
The key consideration when choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks, if using space partitions) fits into memory. As such, we typically recommend setting the interval so that these chunks comprise no more than 25% of main memory.
If you want to see the current interval length for your hypertables, you can
query the _timescaledb_catalog schema as follows. Note that time-based interval
lengths are reported in microseconds.
    SELECT h.table_name, c.interval_length
      FROM _timescaledb_catalog.dimension c
      JOIN _timescaledb_catalog.hypertable h ON h.id = c.hypertable_id;

     table_name | interval_length
    ------------+-----------------
     metrics    |    604800000000
    (1 row)
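Because microsecond values are hard to read, one option is to convert them in the query itself. The variant below is a sketch, not part of the original query: it multiplies interval_length by a one-microsecond interval, which only makes sense for dimensions on timestamp or date columns (integer time columns store the interval in the column's own units). The chunk_interval alias is an illustrative name.

```sql
SELECT h.table_name,
       c.interval_length,
       -- Convert microseconds to an interval and fold whole 24-hour
       -- periods into days for readability.
       justify_hours(c.interval_length * INTERVAL '1 microsecond') AS chunk_interval
  FROM _timescaledb_catalog.dimension c
  JOIN _timescaledb_catalog.hypertable h ON h.id = c.hypertable_id
 WHERE c.interval_length IS NOT NULL;  -- skip hash (space) dimensions
```

For the metrics row shown above, 604800000000 microseconds should be reported as 7 days, matching the default interval.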
To determine this, you need a general idea of your data rate. If you are writing roughly 2GB of data per day and have 64GB of memory, setting the time interval to a week would be good. If you are writing 10GB per day on the same machine, setting the time interval to a day would be appropriate. The same one-day interval also applies if data is loaded in batches, e.g., you bulk load 70GB of data per week, with data corresponding to records from throughout the week.
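The arithmetic behind these examples can be sketched as a small query. This is only a rough estimate under stated assumptions: the 25% memory budget from above, daily volumes that already include index overhead, and hypothetical column names (chunk_budget_gb, max_interval_days).

```sql
-- Rough sizing sketch using the numbers from the text.
WITH params(label, memory_gb, gb_per_day) AS (
    VALUES ('2GB per day',               64.0, 2.0),
           ('10GB per day',              64.0, 10.0),
           ('70GB bulk-loaded per week', 64.0, 70.0 / 7)
)
SELECT label,
       memory_gb * 0.25                     AS chunk_budget_gb,    -- 25% of main memory
       floor(memory_gb * 0.25 / gb_per_day) AS max_interval_days   -- whole days that fit the budget
  FROM params;
```

With these inputs the budget is 16GB, which allows up to 8 days at 2GB per day (so a one-week interval fits) and 1 day at 10GB per day, including the bulk-load case.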
While it's generally safer to make chunks too small rather than too large, setting intervals too small can lead to many chunks, which increases planning latency for some types of queries.
Space partitioning is optional but can make sense for certain types of data and is recommended when using distributed hypertables.
Space partitions use hashing: Every distinct item is hashed to one of N buckets. In a distributed hypertable, each bucket of the primary space dimension corresponds to a specific data node (although two or more buckets could map to the same node). In non-distributed hypertables, each bucket can map to a distinct disk (using, e.g., a tablespace).
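As a minimal sketch of hash-based space partitioning on a single node, the example below creates a hypertable with a space dimension of 4 buckets. The table name readings, the column names, and the bucket count are all hypothetical; the partitioning_column and number_partitions arguments belong to create_hypertable.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE readings (
    time      TIMESTAMPTZ      NOT NULL,
    device_id TEXT             NOT NULL,
    value     DOUBLE PRECISION
);

-- Partition by time, and additionally hash device_id into 4 buckets.
-- Every distinct device_id always lands in the same bucket.
SELECT create_hypertable('readings', 'time',
                         partitioning_column => 'device_id',
                         number_partitions   => 4);
```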
Spreading chunks across disks and nodes in the space dimension allows for increased I/O parallelization, either by (a) having multiple concurrent client processes, or by (b) splitting the work of a single client, either across multiple worker processes on a single node or across multiple concurrent requests to several data nodes.
In summary, to benefit from parallel I/O, one can do one of the following:
For each physical disk on a single instance, add a separate tablespace to the database. TimescaleDB actually allows you to add multiple tablespaces to a single hypertable (although under the covers, each underlying chunk is mapped by TimescaleDB to a single tablespace / physical disk).
Configure a distributed hypertable that spreads inserts and queries across multiple data nodes.
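The two options above can be sketched as follows. Everything named here is an assumption for illustration: the tablespace names and paths, the data node names and hosts, and the tables conditions (an existing hypertable) and conditions_dist. The distributed part applies only to TimescaleDB versions that ship multi-node support.

```sql
-- Option 1: one tablespace per physical disk on a single instance.
-- The directories must already exist and be owned by the PostgreSQL
-- system user.
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- Attach both tablespaces to an existing hypertable (assumed: conditions).
-- Each chunk is still stored in exactly one of them.
SELECT attach_tablespace('disk1', 'conditions');
SELECT attach_tablespace('disk2', 'conditions');

-- Option 2: a distributed hypertable. Run these on the access node.
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');

CREATE TABLE conditions_dist (
    time        TIMESTAMPTZ      NOT NULL,
    location    TEXT             NOT NULL,
    temperature DOUBLE PRECISION
);

-- Partition by time, with location as the primary space dimension.
SELECT create_distributed_hypertable('conditions_dist', 'time', 'location');
```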
Apart from the built-in parallel I/O support in the database, a more transparent way to increase I/O performance is to use a RAID setup across multiple physical disks, and expose a single logical disk to the hypertable (i.e., via a single tablespace). With a RAID setup, no space partitioning is required on a single node.