Aggregate queries which touch large swathes of time-series data can take a long time to compute because the system needs to scan large amounts of data on every query execution. To make such queries faster, a continuous aggregate allows materializing the computed aggregates, while also providing means to continuously, and with low overhead, keep them up-to-date as the underlying source data changes.
Continuous aggregates are somewhat similar to PostgreSQL's materialized views, but, unlike a materialized view, a continuous aggregate can be continuously and incrementally refreshed. The refreshing can be done either manually or via a policy that runs in the background, and can cover the entire continuous aggregate or just a specific time range. In either case, the refresh only recomputes the aggregate buckets that have changed since the last refresh.
As a quick introductory example, let's create a hypertable
conditions containing temperature data for devices and a continuous
aggregate to compute the daily average, minimum, and maximum
temperature. Start off by creating the hypertable and populate it with
CREATE TABLE conditions ( time TIMESTAMPTZ NOT NULL, device INTEGER NOT NULL, temperature FLOAT NOT NULL, PRIMARY KEY(time, device) ); SELECT * FROM create_hypertable('conditions', 'time', 'device', 3); INSERT INTO conditions SELECT time, (random()*30)::int, random()*80 - 40 FROM generate_series(TIMESTAMP '2020-01-01 00:00:00', TIMESTAMP '2020-06-01 00:00:00', INTERVAL '10 min') AS time;
You can then create a continuous aggregate view to compute the hourly average, minimum, and maximum temperature:
CREATE MATERIALIZED VIEW conditions_summary_hourly WITH (timescaledb.continuous) AS SELECT device, time_bucket(INTERVAL '1 hour', time) AS bucket, AVG(temperature), MAX(temperature), MIN(temperature) FROM conditions GROUP BY device, bucket;
Lastly, you should add a policy to ensure that the continuous aggregate is refreshed on a regular basis.
SELECT add_continuous_aggregate_policy('conditions_summary_hourly', start_offset => INTERVAL '1 month', end_offset => INTERVAL '1 h', schedule_interval => INTERVAL '1 h');
In this case, the continuous aggregate will be refreshed every hour and refresh the last month's data.
You can now run a normal
SELECT on the continuous aggregate and it
will give you the aggregated data, for example, to select the daily
averages for device 1 during the first three months:
SELECT bucket, avg FROM conditions_summary_hourly WHERE device = 1 AND bucket BETWEEN '2020-01-01' AND '2020-03-31' ORDER BY bucket;
Continuous aggregates are supported for most aggregate functions that
can be parallelized by PostgreSQL, which
includes the normal aggregates like
ORDER BY and
DISTINCT cannot be used with
continuous aggregates since they are not possible to parallelize by
PostgreSQL. In addition, TimescaleDB continuous aggregates do not
currently support the
FILTER clause (not to be confused with
WHERE) even though it is possible to parallelize but we might add
support for this in a future version.
A query on a continuous aggregate will, by default, use real-time aggregation (first introduced in TimescaleDB 1.7) to combine materialized aggregates with recent data from the source hypertable. By combining raw and materialized data in this way, real-time aggregation produces accurate and up-to-date results while still benefiting from pre-computed aggregates for a large portion of the result.
Real-time aggregation is the default behavior for any new continuous
aggregates. To disable real-time aggregation and show only
materialized data, add the parameter
timescaledb.materialized_only=true when creating the continuous
aggregate view or set it on an existing continuous aggregate using
ALTER MATERIALIZED VIEW.
Continuous aggregates can be kept up-to-date through the last bucket width of time by using continuous aggregate policies. Policies allow you to keep a specified window of time within the continuous aggregate updated on a schedule. This provides the ability to do things like:
It is also possible, starting with TimescaleDB 2.0, to manually refresh
a specific window of time in a continuous aggregate using
refresh_continuous_aggregate. Using this TimescaleDB function gives users the
ability to have the best of both worlds: automatic refresh of recent data and
targeted updates to time ranges that may occur further in history through
a backfill process. Using both tools to keep continuous aggregates up-to-date
provides great control and flexibility!