Aggregate queries (
which touch large swathes of time-series data can
take a long time to compute because the system needs to scan large
amounts of data on every query execution. To make such queries faster,
a continuous aggregate allows materializing the computed aggregates,
while also providing means to continuously, and with low overhead,
keep them up-to-date as the underlying source data changes.
Continuous aggregates are somewhat similar to PostgreSQL's materialized views, but, unlike a materialized view, a continuous aggregate can be continuously and incrementally refreshed. The refreshing can be done either manually or via a policy that runs in the background, and can cover the entire continuous aggregate or just a specific time range. In either case, the refresh only recomputes the aggregate buckets that have changed since the last refresh.
As a quick introductory example, let's create a hypertable
conditions containing temperature data for devices and a continuous
aggregate to compute the hourly average, minimum, and maximum
temperature. Start off by creating the hypertable and populate it with
CREATE TABLE conditions ( time TIMESTAMPTZ NOT NULL, device INTEGER NOT NULL, temperature FLOAT NOT NULL, PRIMARY KEY(time, device) ); SELECT * FROM create_hypertable('conditions', 'time', 'device', 3); INSERT INTO conditions SELECT time, (random()*30)::int, random()*80 - 40 FROM generate_series(TIMESTAMP '2020-01-01 00:00:00', TIMESTAMP '2020-06-01 00:00:00', INTERVAL '10 min') AS time;
You can then create a continuous aggregate view to compute the hourly average, minimum, and maximum temperature:
CREATE MATERIALIZED VIEW conditions_summary_hourly WITH (timescaledb.continuous) AS SELECT device, time_bucket(INTERVAL '1 hour', time) AS bucket, AVG(temperature), MAX(temperature), MIN(temperature) FROM conditions GROUP BY device, bucket;
Lastly, you should add a policy to ensure that the continuous aggregate is refreshed on a regular basis.
SELECT add_continuous_aggregate_policy('conditions_summary_hourly', start_offset => INTERVAL '1 month', end_offset => INTERVAL '1 h', schedule_interval => INTERVAL '1 h');
In this case, the continuous aggregate is refreshed every hour and refresh the last month's data.
You can now run a normal
SELECT on the continuous aggregate and it
gives you the aggregated data, for example, to select the hourly
averages for device 1 during the first three months:
SELECT bucket, avg FROM conditions_summary_hourly WHERE device = 1 AND bucket BETWEEN '2020-01-01' AND '2020-03-31' ORDER BY bucket;
In TimescaleDB 2.7 and above, continuous aggregates support all PostgreSQL aggregate functions. They also support aggregates with keywords such as
ORDER BY, and
FILTER. Older versions of TimescaleDB only support aggregate functions that can be parallelized by PostgreSQL. They don't support
ORDER BY, and
FILTER. For more information, see the how-to guide on continuous aggregates.
A query on a continuous aggregate, by default, uses real-time aggregation (first introduced in TimescaleDB 1.7) to combine materialized aggregates with recent data from the source hypertable. By combining raw and materialized data in this way, real-time aggregation produces accurate and up-to-date results while still benefiting from pre-computed aggregates for a large portion of the result.
Real-time aggregation is the default behavior for any new continuous
aggregates. To disable real-time aggregation and show only
materialized data, add the parameter
timescaledb.materialized_only=true when creating the continuous
aggregate view or set it on an existing continuous aggregate using
ALTER MATERIALIZED VIEW.
To use real-time aggregation on a continuous aggregate created in a version earlier than TimescaleDB 1.7, alter the view to set
Automatic refresh with a continuous aggregate policy
Continuous aggregates can be kept up-to-date through the last bucket width of time by using continuous aggregate policies. Policies allow you to keep a specified window of time within the continuous aggregate updated on a schedule. This provides the ability to do things like:
- have the continuous aggregate and the hypertable be in sync, even when data is removed from the hypertable, or
- keep the aggregate data in the continuous aggregate when removing source data from the hypertable.
It is also possible, starting with TimescaleDB 2.0, to manually refresh
a specific window of time in a continuous aggregate using
refresh_continuous_aggregate. Using this TimescaleDB function gives users the
ability to have the best of both worlds: automatic refresh of recent data and
targeted updates to time ranges that may occur further in history through
a backfill process. Using both tools to keep continuous aggregates up-to-date
provides great control and flexibility!
Found an issue on this page?Report an issue!