Time-series data usually grows very quickly. And that means that aggregating the data into useful summaries can become very slow. Continuous aggregates makes aggregating data lightning fast.
If you are collecting data very frequently, you might want to aggregate your data into minutes or hours instead. For example, if you have a table of temperature readings taken every second, you can find the average temperature for each hour. Every time you run this query, the database needs to scan the entire table and recalculate the average every time.
Continuous aggregates are a kind of hypertable that is refreshed automatically in the background as new data is added, or old data is modified. Changes to your dataset are tracked, and the hypertable behind the continuous aggregate is automatically updated in the background.
You don't need to manually refresh your continuous aggregates, they are continuously and incrementally updated in the background. Continuous aggregates also have a much lower maintenance burden than regular PostgreSQL materialized views, because the whole view is not created from scratch on each refresh. This means that you can get on with working your data instead of maintaining your database.
Because continuous aggregates are based on hypertables, you can query them in exactly the same way as your other tables, and enable compression or tiered storage on your continuous aggregates. You can even create continuous aggregates on top of your continuous aggregates.
By default, querying continuous aggregates provides you with real-time data. Pre-aggregated data from the materialized view is combined with recent data that hasn't been aggregated yet. This gives you up-to-date results on every query.
There are three main ways to make aggregation easier: materialized views, continuous aggregates, and real time aggregates.
Materialized views are a standard PostgreSQL function. They are used to cache the result of a complex query so that you can reuse it later on. Materialized views do not update regularly, although you can manually refresh them as required.
Continuous aggregates are a Timescale only feature. They work in a similar way to a materialized view, but they are updated automatically in the background, as new data is added to your database. Continuous aggregates are updated continuously and incrementally, which means they are less resource intensive to maintain than materialized views. Continuous aggregates are based on hypertables, and you can query them in the same way as you do your other tables.
Real time aggregates are a Timescale only feature. They are the same as continuous aggregates, but they add the most recent raw data to the previously aggregated data to provide accurate and up to date results, without needing to aggregate data as it is being written.
You can create a continuous aggregate on top of another continuous aggregate. This allows you to summarize data at different granularities. For example, you might have a raw hypertable that contains second-by-second data. Create a continuous aggregate on the hypertable to calculate hourly data. To calculate daily data, create a continuous aggregate on top of your hourly continuous aggregate.
For more information, see the documentation about continuous aggregates on continuous aggregates.
Continuous aggregates supports the following JOIN features:
Feature | TimescaleDB < 2.10.x | TimescaleDB <= 2.15.x | TimescaleDB >= 2.16.x |
---|---|---|---|
INNER JOIN | ❌ | ✅ | ✅ |
LEFT JOIN | ❌ | ❌ | ✅ |
LATERAL JOIN | ❌ | ❌ | ✅ |
Joins between ONE hypertable and ONE standard PostgreSQL table | ❌ | ✅ | ✅ |
Joins between ONE hypertable and MANY standard PostgreSQL tables | ❌ | ❌ | ✅ |
Join conditions must be equality conditions, and there can only be ONE JOIN condition | ❌ | ✅ | ✅ |
Any join conditions | ❌ | ❌ | ✅ |
JOINS in TimescaleDB must that meet the following conditions:
- Only changes to the hypertable are tracked, they are updated in the continuous aggregate when it is refreshed. Changes to standard PostgreSQL table are not tracked.
- You can use an
INNER
,LEFT
andLATERAL
joins, no other join type is supported. - Joins on the materialized hypertable of a continuous aggregate are not supported.
- Hierarchical continuous aggregates can be created on top of a continuous
aggregate with a
JOIN
clause, but cannot themselves have aJOIN
clauses.
Given the following schema:
CREATE TABLE locations (id TEXT PRIMARY KEY,name TEXT);CREATE TABLE devices (id SERIAL PRIMARY KEY,location_id TEXT,name TEXT);CREATE TABLE conditions ("time" TIMESTAMPTZ,device_id INTEGER,temperature FLOAT8);SELECT create_hypertable('conditions', by_range('time'));
See the following JOIN
examples on Continuous Aggregates:
INNER JOIN
on a single equality condition, using theON
clause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditionsJOIN devices ON devices.id = conditions.device_idGROUP BY bucket, devices.nameWITH NO DATA;INNER JOIN
on a single equality condition, using theON
clause, with a further condition added in theWHERE
clause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditionsJOIN devices ON devices.id = conditions.device_idWHERE devices.location_id = 'location123'GROUP BY bucket, devices.nameWITH NO DATA;INNER JOIN
on a single equality condition specified inWHERE
clause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditions, devicesWHERE devices.id = conditions.device_idGROUP BY bucket, devices.nameWITH NO DATA;INNER JOIN
on multiple equality conditions:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditionsJOIN devices ON devices.id = conditions.device_id AND devices.location_id = 'location123'GROUP BY bucket, devices.nameWITH NO DATA;TimescaleDB v2.16.x and higher.
INNER JOIN
with a single equality condition specified inWHERE
clause can be combined with further conditions in theWHERE
clause:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditions, devicesWHERE devices.id = conditions.device_idAND devices.location_id = 'location123'GROUP BY bucket, devices.nameWITH NO DATA;TimescaleDB v2.16.x and higher.
INNER JOIN
between an hypertable and multiple Postgres tables:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name AS device, locations.name AS location, MIN(temperature), MAX(temperature)FROM conditionsJOIN devices ON devices.id = conditions.device_idJOIN locations ON locations.id = devices.location_idGROUP BY bucket, devices.name, locations.nameWITH NO DATA;TimescaleDB v2.16.x and higher.
LEFT JOIN
between an hypertable and a Postgres table:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditionsLEFT JOIN devices ON devices.id = conditions.device_idGROUP BY bucket, devices.nameWITH NO DATA;TimescaleDB v2.16.x and higher.
LATERAL JOIN
between an hypertable and a sub-query:CREATE MATERIALIZED VIEW conditions_by_day WITH (timescaledb.continuous) ASSELECT time_bucket('1 day', time) AS bucket, devices.name, MIN(temperature), MAX(temperature)FROM conditions,LATERAL (SELECT * FROM devices WHERE devices.id = conditions.device_id) AS devicesGROUP BY bucket, devices.nameWITH NO DATA;TimescaleDB v2.16.x and higher.
In TimescaleDB 2.7 and later, continuous aggregates support all PostgreSQL
aggregate functions. This includes both parallelizable aggregates, such as SUM
and AVG
, and non-parallelizable aggregates, such as RANK
.
In TimescaleDB 2.10.0 and later, the FROM
clause supports JOINS
, with
some restrictions. For more information, see the JOIN
support section.
In older versions of Timescale, continuous aggregates only support aggregate functions that can be parallelized by PostgreSQL. You can work around this by aggregating the other parts of your query in the continuous aggregate, then using the window function to query the aggregate.
The following table summarizes the aggregate functions supported in continuous aggregates:
Function, clause, or feature | TimescaleDB 2.6 and earlier | TimescaleDB 2.7, 2.8, and 2.9 | TimescaleDB 2.10 and later |
---|---|---|---|
Parallelizable aggregate functions | ✅ | ✅ | ✅ |
non-parallelizable SQL aggregates | ❌ | ✅ | ✅ |
ORDER BY | ❌ | ✅ | ✅ |
Ordered-set aggregates | ❌ | ✅ | ✅ |
Hypothetical-set aggregates | ❌ | ✅ | ✅ |
DISTINCT in aggregate functions | ❌ | ✅ | ✅ |
FILTER in aggregate functions | ❌ | ✅ | ✅ |
FROM clause supports JOINS | ❌ | ❌ | ✅ |
DISTINCT works in aggregate functions not in the query definition. For example, for the table:
CREATE TABLE public.candle(symbol_id uuid NOT NULL,symbol text NOT NULL,"time" timestamp with time zone NOT NULL,open double precision NOT NULL,high double precision NOT NULL,low double precision NOT NULL,close double precision NOT NULL,volume double precision NOT NULL);
- The following works:CREATE MATERIALIZED VIEW candles_start_endWITH (timescaledb.continuous) ASSELECT time_bucket('1 hour', "time"), COUNT(DISTINCT symbol), first(time, time) as first_candle, last(time, time) as last_candleFROM candleGROUP BY 1;
- This does not:CREATE MATERIALIZED VIEW candles_start_endWITH (timescaledb.continuous) ASSELECT DISTINCT ON (symbol)symbol,symbol_id, first(time, time) as first_candle, last(time, time) as last_candleFROM candleGROUP BY symbol_id;
If you want the old behavior in later versions of TimescaleDB, set the
timescaledb.finalized
parameter to false
when you create your continuous
aggregate.
Continuous aggregates consist of:
- Materialization hypertable to store the aggregated data in
- Materialization engine to aggregate data from the raw, underlying, table to the materialization hypertable
- Invalidation engine to determine when data needs to be re-materialized, due to changes in the data
- Query engine to access the aggregated data
Continuous aggregates take raw data from the original hypertable, aggregate it, and store the intermediate state in a materialization hypertable. When you query the continuous aggregate view, the state is returned to you as needed.
Using the same temperature example, the materialization table looks like this:
day | location | chunk | avg temperature partial |
---|---|---|---|
2021/01/01 | New York | 1 | {3, 219} |
2021/01/01 | Stockholm | 1 | {4, 280} |
2021/01/02 | New York | 2 | |
2021/01/02 | Stockholm | 2 | {5, 345} |
The materialization table is stored as a Timescale hypertable, to take
advantage of the scaling and query optimizations that hypertables offer.
Materialization tables contain a column for each group-by clause in the query,
a chunk
column identifying which chunk in the raw data this entry came from,
and a partial aggregate
column for each aggregate in the query.
The partial column is used internally to calculate the output. In this example, because the query looks for an average, the partial column contains the number of rows seen, and the sum of all their values. The most important thing to know about partials is that they can be combined to create new partials spanning all of the old partials' rows. This is important if you combine groups that span multiple chunks.
For more information, see materialization hypertables.
The materialization engine performs two transactions. The first transaction blocks all INSERTs, UPDATEs, and DELETEs, determines the time range to materialize, and updates the invalidation threshold. The second transaction unblocks other transactions, and materializes the aggregates. The first transaction is very quick, and most of the work happens during the second transaction, to ensure that the work does not interfere with other operations.
When you query the continuous aggregate view, the materialization engine combines the aggregate partials into a single partial for each time range, and calculates the value that is returned. For example, to compute an average, each partial sum is added up to a total sum, and each partial count is added up to a total count, then the average is computed as the total sum divided by the total count.
Any change to the data in a hypertable could potentially invalidate some materialized rows. The invalidation engine checks to ensure that the system does not become swamped with invalidations.
Fortunately, time-series data means that nearly all INSERTs and UPDATEs have a recent timestamp, so the invalidation engine does not materialize all the data, but to a set point in time called the materialization threshold. This threshold is set so that the vast majority of INSERTs contain more recent timestamps. These data points have never been materialized by the continuous aggregate, so there is no additional work needed to notify the continuous aggregate that they have been added. When the materializer next runs, it is responsible for determining how much new data can be materialized without invalidating the continuous aggregate. It then materializes the more recent data and moves the materialization threshold forward in time. This ensures that the threshold lags behind the point-in-time where data changes are common, and that most INSERTs do not require any extra writes.
When data older than the invalidation threshold is changed, the maximum and minimum timestamps of the changed rows is logged, and the values are used to determine which rows in the aggregation table need to be recalculated. This logging does cause some write load, but because the threshold lags behind the area of data that is currently changing, the writes are small and rare.
Keywords
Found an issue on this page?Report an issue or Edit this page in GitHub.