Aggregate queries (min()
, max()
, avg()
...)
which touch large swathes of time-series data can
take a long time to compute because the system needs to scan large
amounts of data on every query execution. To make such queries faster,
a continuous aggregate allows materializing the computed aggregates,
while also providing means to continuously, and with low overhead,
keep them up-to-date as the underlying source data changes.
Continuous aggregates are somewhat similar to PostgreSQL's materialized views, but, unlike a materialized view, a continuous aggregate can be continuously and incrementally refreshed. The refreshing can be done either manually or via a policy that runs in the background, and can cover the entire continuous aggregate or just a specific time range. In either case, the refresh only recomputes the aggregate buckets that have changed since the last refresh.
As a quick introductory example, let's create a hypertable
conditions
containing temperature data for devices and a continuous
aggregate to compute the hourly average, minimum, and maximum
temperature. Start off by creating the hypertable and populate it with
some data:
CREATE TABLE conditions (time TIMESTAMPTZ NOT NULL,device INTEGER NOT NULL,temperature FLOAT NOT NULL,PRIMARY KEY(time, device));SELECT * FROM create_hypertable('conditions', 'time', 'device', 3);INSERT INTO conditionsSELECT time, (random()*30)::int, random()*80 - 40FROM generate_series(TIMESTAMP '2020-01-01 00:00:00',TIMESTAMP '2020-06-01 00:00:00',INTERVAL '10 min') AS time;
You can then create a continuous aggregate view to compute the hourly average, minimum, and maximum temperature:
CREATE MATERIALIZED VIEW conditions_summary_hourlyWITH (timescaledb.continuous) ASSELECT device,time_bucket(INTERVAL '1 hour', time) AS bucket,AVG(temperature),MAX(temperature),MIN(temperature)FROM conditionsGROUP BY device, bucket;
Lastly, you should add a policy to ensure that the continuous aggregate is refreshed on a regular basis.
SELECT add_continuous_aggregate_policy('conditions_summary_hourly',start_offset => INTERVAL '1 month',end_offset => INTERVAL '1 h',schedule_interval => INTERVAL '1 h');
In this case, the continuous aggregate is refreshed every hour and refresh the last month's data.
You can now run a normal SELECT
on the continuous aggregate and it
gives you the aggregated data, for example, to select the hourly
averages for device 1 during the first three months:
SELECT bucket, avgFROM conditions_summary_hourlyWHERE device = 1 AND bucket BETWEEN '2020-01-01' AND '2020-03-31'ORDER BY bucket;
important
Continuous aggregates support many PostgreSQL aggregate functions and features. Support depends on your TimescaleDB version. For a table of supported features, see the how-to guide on continuous aggregates.
A query on a continuous aggregate, by default, uses real-time aggregation (first introduced in TimescaleDB 1.7) to combine materialized aggregates with recent data from the source hypertable. By combining raw and materialized data in this way, real-time aggregation produces accurate and up-to-date results while still benefiting from pre-computed aggregates for a large portion of the result.
Real-time aggregation is the default behavior for any new continuous
aggregates. To disable real-time aggregation and show only
materialized data, add the parameter
timescaledb.materialized_only=true
when creating the continuous
aggregate view or set it on an existing continuous aggregate using
ALTER MATERIALIZED VIEW
.
tip
To use real-time aggregation on a continuous aggregate created in a version earlier than TimescaleDB 1.7, alter the view to set timescaledb.materialized_only=false
.
Continuous aggregates can be kept up-to-date through the last bucket width of time by using continuous aggregate policies. Policies allow you to keep a specified window of time within the continuous aggregate updated on a schedule. This provides the ability to do things like:
- have the continuous aggregate and the hypertable be in sync, even when data is removed from the hypertable, or
- keep the aggregate data in the continuous aggregate when removing source data from the hypertable.
It is also possible, starting with TimescaleDB 2.0, to manually refresh
a specific window of time in a continuous aggregate using
refresh_continuous_aggregate
. Using this TimescaleDB function gives users the
ability to have the best of both worlds: automatic refresh of recent data and
targeted updates to time ranges that may occur further in history through
a backfill process. Using both tools to keep continuous aggregates up-to-date
provides great control and flexibility!
Found an issue on this page?
Report an issue!Keywords