In the TimescaleDB extras GitHub repository, we provide explicit functions for backfilling batch data to compressed chunks, which is useful for inserting a batch of backfilled data (as opposed to individual row inserts). By "backfill", we mean inserting data whose timestamp is well in the past and which therefore falls within a chunk that has already been compressed.
In the example below, we backfill data into a temporary table; such temporary
tables are short-lived and only exist for the duration of the database
session. Alternatively, if backfill is common, one might use a normal table for
this instead, which would allow multiple writers to insert into the table at
the same time before the backfill procedure is run.
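As a sketch of that alternative, a durable staging table could be created like so (cpu_staging is a hypothetical name):

```sql
-- Hypothetical durable staging table for recurring backfills;
-- unlike a temporary table, it survives the session and can
-- accept inserts from multiple writers.
CREATE TABLE cpu_staging AS SELECT * FROM cpu WITH NO DATA;
```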
To use this procedure:
Create a table with the same schema as the hypertable that we are backfilling into (in this example, cpu):
CREATE TEMPORARY TABLE cpu_temp AS SELECT * FROM cpu WITH NO DATA;
Insert data into the backfill table.
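As a sketch, assuming the cpu hypertable has columns such as time, cpu_id, and usage (hypothetical column names; adjust to your schema), the insert might look like:

```sql
-- Hypothetical columns and values for illustration only.
INSERT INTO cpu_temp (time, cpu_id, usage)
VALUES
  ('2023-01-15 00:00:00+00', 1, 0.45),
  ('2023-01-15 00:00:10+00', 1, 0.47);
```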
Use the supplied backfill procedure to perform the above steps: halt the compression policy, identify the compressed chunks to which the backfilled data corresponds, decompress those chunks, insert data from the backfill table into the main hypertable, and then re-enable the compression policy:
CALL decompress_backfill(staging_table=>'cpu_temp', destination_hypertable=>'cpu');
If using a temporary table, the table is automatically dropped at the end of your database session. If using a normal table, after you have successfully backfilled the data, you will likely want to truncate the table in preparation for the next backfill (or drop it completely).
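If you opted for a normal table, clearing it between backfill runs might look like this (cpu_staging is a hypothetical table name):

```sql
-- Reset a durable staging table between backfill runs.
TRUNCATE cpu_staging;
-- ...or remove it entirely once backfills are no longer needed:
-- DROP TABLE cpu_staging;
```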
To perform these steps manually, we first identify and turn off our compression policy before decompressing chunks. To do this, find the job_id of the policy using:
SELECT s.job_id
FROM timescaledb_information.jobs j
  INNER JOIN timescaledb_information.job_stats s ON j.job_id = s.job_id
WHERE j.proc_name = 'policy_compression' AND s.hypertable_name = '<target table>';
Next, pause the job with:
SELECT alter_job(<job_id>, scheduled => false);
We have now paused the compression policy on the hypertable, which leaves us free to decompress the chunks we need to modify via backfill or update. To decompress the chunk(s) that we will be modifying, run the following for each chunk:
SELECT decompress_chunk('<chunk_name>');
Similar to above, you can also decompress a set of chunks based on a
time range by looking up that set of chunks via show_chunks:
SELECT decompress_chunk(i) FROM show_chunks('cpu', newer_than, older_than) i;
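For example, to decompress every chunk of the cpu hypertable overlapping January 2023 (the time bounds here are illustrative):

```sql
-- Decompress all chunks of cpu that overlap the given time range.
SELECT decompress_chunk(i)
FROM show_chunks('cpu',
                 newer_than => '2023-01-01'::timestamptz,
                 older_than => '2023-02-01'::timestamptz) i;
```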
Once your backfill and update operations are complete, we can re-enable our compression policy job:
SELECT alter_job(<job_id>, scheduled => true);
This job will re-compress any chunks that were decompressed during your backfilling
operation the next time it runs. To have it run immediately, you can explicitly execute
the command via run_job:
CALL run_job(<job_id>);
One of the current limitations of TimescaleDB is that once chunks are converted into compressed columnar form, we do not allow updates and deletes of the data, or changes to the schema, without manual decompression, except as noted above. In other words, chunks are partially immutable in compressed form. Attempts to modify the chunks' data in those cases will either error or fail silently (as preferred by users). We plan to remove this limitation in future releases.
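To illustrate the limitation, a statement like the following (column names are hypothetical) would be rejected, or silently skipped depending on configuration, for rows residing in compressed chunks until those chunks are decompressed:

```sql
-- On versions with this limitation, updating rows that live in a
-- compressed chunk fails unless the chunk is decompressed first.
UPDATE cpu SET usage = 0
WHERE time < '2023-01-01';
```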