This section contains some ideas for troubleshooting common problems experienced with continuous aggregates.
If your hypertables use a different retention policy from your continuous aggregates, the two policies are applied separately. The retention policy on a hypertable determines how long the raw data is kept. The retention policy on a continuous aggregate determines how long the materialized data is kept. For example, if a hypertable has a retention policy of one week, but its continuous aggregate has a retention policy of one month, the raw data is kept for a week, and the aggregated data is kept for a month.
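For example, to keep raw data for a week while retaining the aggregated data for a month, you can add a separate retention policy to each object. This sketch assumes a hypertable named `conditions` and a continuous aggregate named `conditions_summary`, as in the worked example later in this section:

```sql
-- Drop raw chunks from the hypertable after one week
SELECT add_retention_policy('conditions', INTERVAL '1 week');

-- Drop materialized chunks from the continuous aggregate after one month
SELECT add_retention_policy('conditions_summary', INTERVAL '1 month');
```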
Materialized views are generally used with ordered data. If you insert historic data, or data that is otherwise unrelated to the current time, you need to refresh the affected ranges so that the policies reevaluate the values being carried forward from the past to the present.
You can set up an after-insert rule or trigger on your hypertable, or build it into your upsert logic, to record which ranges need to be refreshed as out-of-order data is merged.
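As a sketch, a trigger like the following could record out-of-order inserts so you know which ranges need refreshing later. The table and function names here are hypothetical, and the trigger assumes the `conditions` hypertable used in the worked example later in this section:

```sql
-- Hypothetical bookkeeping table for days that arrive out of order
CREATE TABLE stale_days(day DATE PRIMARY KEY);

CREATE OR REPLACE FUNCTION track_backfill() RETURNS trigger AS $$
BEGIN
    -- Record inserts that land in the past, so the corresponding
    -- buckets can be refreshed later
    IF NEW.day < current_date THEN
        INSERT INTO stale_days VALUES (NEW.day) ON CONFLICT DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER conditions_backfill
    AFTER INSERT ON conditions
    FOR EACH ROW EXECUTE FUNCTION track_backfill();
```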
Let's say you inserted ordered timeframes named A, B, D, and F, and you already have a continuous aggregate over this data. If you now insert E, you need to refresh E and F. However, if you insert C, you need to refresh C, D, E, and F. A and B are consistent, and the materialized data is too; you only need to reuse it. Everything from C onward is not: you need to rematerialize D, E, and F, and run the refresh policies again after C.
This can use a lot of resources, especially if you have important data in the past that also needs to be brought up to the present.
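In practice, you can refresh everything from the earliest out-of-order bucket onward with a single call; passing `NULL` as the end of the window refreshes through the most recent bucket. This sketch uses the `conditions_summary` aggregate from the worked example below, with an illustrative start date standing in for the beginning of bucket C:

```sql
-- Refresh from the start of the earliest invalidated bucket (here, C)
-- through the most recent data
CALL refresh_continuous_aggregate('conditions_summary', '2021-06-01', NULL);
```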
Consider an example where you have 300 columns on a single hypertable and use, say, five of them in a continuous aggregate. In this case, refreshing can be expensive, and it would make more sense to isolate these columns in a separate hypertable. Alternatively, you might create one hypertable per metric and refresh them independently.
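A sketch of the one-hypertable-per-metric alternative, using hypothetical names: each metric gets its own narrow hypertable and aggregate, so a refresh touches only that metric's columns.

```sql
-- One narrow hypertable per metric
CREATE TABLE temperature_readings(
    day DATE NOT NULL,
    city TEXT NOT NULL,
    temperature INT NOT NULL
);
SELECT create_hypertable('temperature_readings', 'day');

-- Its continuous aggregate can then be refreshed independently
-- of aggregates over other metrics
CREATE MATERIALIZED VIEW temperature_weekly
WITH (timescaledb.continuous) AS
SELECT city, time_bucket('7 days', day) AS bucket, AVG(temperature)
FROM temperature_readings
GROUP BY city, bucket;
```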
If you have a time bucket that has already been materialized, the real-time aggregate won't show data that has since been inserted, updated, or deleted. In this worked example, `refresh_continuous_aggregate()` is first called for data that is not going to change. When you then need to change data that has already been materialized, you call `refresh_continuous_aggregate()` again for the corresponding time range.
Create and fill the hypertable:
```sql
CREATE TABLE conditions(
    day DATE NOT NULL,
    city TEXT NOT NULL,
    temperature INT NOT NULL
);

SELECT create_hypertable(
    'conditions', 'day',
    chunk_time_interval => INTERVAL '1 day'
);

INSERT INTO conditions (day, city, temperature) VALUES
    ('2021-06-14', 'Moscow', 26),
    ('2021-06-15', 'Moscow', 22),
    ('2021-06-16', 'Moscow', 24),
    ('2021-06-17', 'Moscow', 24),
    ('2021-06-18', 'Moscow', 27),
    ('2021-06-19', 'Moscow', 28),
    ('2021-06-20', 'Moscow', 30),
    ('2021-06-21', 'Moscow', 31),
    ('2021-06-22', 'Moscow', 34),
    ('2021-06-23', 'Moscow', 34),
    ('2021-06-24', 'Moscow', 34),
    ('2021-06-25', 'Moscow', 32),
    ('2021-06-26', 'Moscow', 32),
    ('2021-06-27', 'Moscow', 31);
```
Create a real-time aggregate, but don't refresh the data:
```sql
CREATE MATERIALIZED VIEW conditions_summary
WITH (timescaledb.continuous) AS
SELECT city,
       time_bucket('7 days', day) AS bucket,
       MIN(temperature),
       MAX(temperature)
FROM conditions
GROUP BY city, bucket
WITH NO DATA;

SELECT * FROM conditions_summary ORDER BY bucket;
```

```
  city  |   bucket   | min | max
--------+------------+-----+-----
 Moscow | 2021-06-14 |  22 |  30
 Moscow | 2021-06-21 |  31 |  34
```
Refresh the data:
```sql
CALL refresh_continuous_aggregate('conditions_summary', '2021-06-14', '2021-06-21');

-- The CAGG didn't change, that's expected
SELECT * FROM conditions_summary ORDER BY bucket;
```

```
  city  |   bucket   | min | max
--------+------------+-----+-----
 Moscow | 2021-06-14 |  22 |  30
 Moscow | 2021-06-21 |  31 |  34
```
Update the data in the previously materialized bucket:
```sql
UPDATE conditions
SET temperature = 35
WHERE day = '2021-06-14' AND city = 'Moscow';
```
The updated data is not yet visible in the continuous aggregate. Similarly, inserted and deleted rows in already-materialized buckets are not visible:
```sql
SELECT * FROM conditions_summary ORDER BY bucket;
```

```
  city  |   bucket   | min | max
--------+------------+-----+-----
 Moscow | 2021-06-14 |  22 |  30
 Moscow | 2021-06-21 |  31 |  34
```
Refresh the data again to see the updates:
```sql
CALL refresh_continuous_aggregate('conditions_summary', '2021-06-14', '2021-06-21');

SELECT * FROM conditions_summary ORDER BY bucket;
```

```
  city  |   bucket   | min | max
--------+------------+-----+-----
 Moscow | 2021-06-14 |  22 |  35
 Moscow | 2021-06-21 |  31 |  34
```
Continuous aggregates don't work on all queries. If you are using a function that continuous aggregates do not support, you see an error like this:
```
ERROR: invalid continuous aggregate view
SQL state: 0A000
```
Continuous aggregates are supported for most aggregate functions that can be parallelized by PostgreSQL, including the standard aggregates like `SUM` and `AVG`. You can also use more complex expressions on top of the aggregate functions, for example `max(temperature) - min(temperature)`. However, aggregates using `ORDER BY` and `DISTINCT` cannot be used with continuous aggregates, since they are not possible to parallelize with PostgreSQL. TimescaleDB also does not currently support the `FILTER` clause or window functions in continuous aggregates.
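For instance, a view definition like the following fails with that error, because `COUNT(DISTINCT ...)` cannot be parallelized. This is an illustrative sketch, with a hypothetical view name, over the `conditions` table from the worked example above:

```sql
-- Fails with "invalid continuous aggregate view": DISTINCT is not
-- supported in continuous aggregates
CREATE MATERIALIZED VIEW cities_per_week
WITH (timescaledb.continuous) AS
SELECT time_bucket('7 days', day) AS bucket,
       COUNT(DISTINCT city)
FROM conditions
GROUP BY bucket;
```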
When you have a query that uses a last observation carried forward (locf)
function, the query carries forward NULL values by default. If you want the
function to ignore NULL values instead, you can set `treat_null_as_missing => true`
as the second parameter in the query. For example:
```sql
SELECT * FROM (
    SELECT time_bucket_gapfill(4, time, -5, 13),
           locf(avg(v)::int, treat_null_as_missing := true)
    FROM (VALUES (0, 0), (8, NULL)) v(time, v)
    WHERE time BETWEEN 0 AND 10
    GROUP BY 1
) i
ORDER BY 1 DESC;
```

```
 time_bucket_gapfill | locf
---------------------+------
                  12 |    0
                   8 |    0
                   4 |    0
                   0 |    0
                  -4 |
                  -8 |
(6 rows)
```