Creating a continuous aggregate is a two-step process. You need to create the view first, then enable a policy to keep the view refreshed. You can create the view on a hypertable, or on top of another continuous aggregate. You can have more than one continuous aggregate on each source table or view.
Continuous aggregates require a time_bucket
on the time partitioning column of
the hypertable.
By default, views are automatically refreshed. You can adjust this by setting the WITH NO DATA option. Additionally, the view can not be a security barrier view.
Continuous aggregates use hypertables in the background, which means that they also use chunk time intervals. By default, the continuous aggregate's chunk time interval is 10 times what the original hypertable's chunk time interval is. For example, if the original hypertable's chunk time interval is 7 days, the continuous aggregates that are on top of it have a 70 day chunk time interval.
In this example, we are using a hypertable called conditions
, and creating a
continuous aggregate view for daily weather data. The GROUP BY
clause must
include a time_bucket
expression which uses time dimension column of the
hypertable. Additionally, all functions and their arguments included in
SELECT
, GROUP BY
, and HAVING
clauses must be
immutable.
At the
psql
prompt, create the materialized view:CREATE MATERIALIZED VIEW conditions_summary_dailyWITH (timescaledb.continuous) ASSELECT device,time_bucket(INTERVAL '1 day', time) AS bucket,AVG(temperature),MAX(temperature),MIN(temperature)FROM conditionsGROUP BY device, bucket;Create a policy to refresh the view every hour:
SELECT add_continuous_aggregate_policy('conditions_summary_daily',start_offset => INTERVAL '1 month',end_offset => INTERVAL '1 day',schedule_interval => INTERVAL '1 hour');
You can use most PostgreSQL aggregate functions in continuous aggregations. To see what PostgreSQL features are supported, check the function support table.
Continuous aggregates require a time_bucket
on the time partitioning column of
the hypertable. The time bucket allows you to define a time interval, instead of
having to use specific timestamps. For example, you can define a time bucket as
five minutes, or one day.
When the continuous aggregate is materialized, the materialization table stores partials, which are then used to calculate the result of the query. This means a certain amount of processing capacity is required for any query, and the amount required becomes greater as the interval gets smaller. Because of this, if you have very small intervals, it can be more efficient to run the aggregate query on the raw data in the hypertable. You should test both methods to determine what is best for your dataset and desired bucket interval.
You can't use time_bucket_gapfill directly in a
continuous aggregate. This is because you need access to previous data to
determine the gapfill content, which isn't yet available when you create the
continuous aggregate. You can work around this by creating the continuous
aggregate using time_bucket
, then querying the continuous
aggregate using time_bucket_gapfill
.
By default, when you create a view for the first time, it is populated with
data. This is so that the aggregates can be computed across the entire
hypertable. If you don't want this to happen, for example if the table is very
large, or if new data is being continuously added, you can control the order in
which the data is refreshed. You can do this by adding a manual refresh with
your continuous aggregate policy using the WITH NO DATA
option.
The WITH NO DATA
option allows the continuous aggregate to be created
instantly, so you don't have to wait for the data to be aggregated. Data begins
to populate only when the policy begins to run. This means that only data newer
than the start_offset
time begins to populate the continuous aggregate. If you
have historical data that is older than the start_offset
interval, you need to
manually refresh the history up to the current start_offset
to allow real-time
queries to run efficiently.
At the
psql
prompt, create the view:CREATE MATERIALIZED VIEW cagg_rides_viewWITH (timescaledb.continuous) ASSELECT vendor_id,time_bucket('1h', pickup_datetime) AS hour,count(*) total_rides,avg(fare_amount) avg_fare,max(trip_distance) as max_trip_distance,min(trip_distance) as min_trip_distanceFROM ridesGROUP BY vendor_id, time_bucket('1h', pickup_datetime)WITH NO DATA;Manually refresh the view:
CALL refresh_continuous_aggregate('cagg_rides_view', NULL, localtimestamp - INTERVAL '1 week');Add the policy:
SELECT add_continuous_aggregate_policy('cagg_rides_view',start_offset => INTERVAL '1 week',end_offset => INTERVAL '1 hour',schedule_interval => INTERVAL '30 minutes');
In Timescale 2.10 and later, with PostgreSQL 12 or later, you can
create a continuous aggregate with a query that also includes a JOIN
. For
example:
CREATE MATERIALIZED VIEW conditions_summary_daily_3WITH (timescaledb.continuous) ASSELECT time_bucket(INTERVAL '1 day', day) AS bucket,AVG(temperature),MAX(temperature),MIN(temperature),nameFROM devices JOIN conditions USING (device_id)GROUP BY name, bucket;
Note
For more information about creating a continuous aggregate with a JOIN
, including some additional restrictions, see the about continuous aggregates section.
When you have created a continuous aggregate and set a refresh policy, you can
query the view with a SELECT
query. You can only specify a single hypertable
in the FROM
clause. Including more hypertables, tables, views, or subqueries
in your SELECT
query is not supported. Additionally, make sure that the
hypertable you are querying does not have
row-level-security policies
enabled.
At the
psql
prompt, query the continuous aggregate view calledconditions_summary_hourly
for the average, minimum, and maximum temperatures for the first quarter of 2021 recorded by device 5:SELECT *FROM conditions_summary_hourlyWHERE device = 5AND bucket >= '2020-01-01'AND bucket < '2020-04-01';Alternatively, query the continuous aggregate view called
conditions_summary_hourly
for the top 20 largest metric spreads in that quarter:SELECT *FROM conditions_summary_hourlyWHERE max - min > 1800AND bucket >= '2020-01-01' AND bucket < '2020-04-01'ORDER BY bucket DESC, device DESC LIMIT 20;
Continuous aggregates don't currently support window functions. You can work around this by:
- Creating a continuous aggregate for the other parts of your query, then
- Using the window function on your continuous aggregate at query time
For example, say you have a hypertable named example
with a time
column and
a value
column. You bucket your data by time
and calculate the delta between
time buckets using the lag
window function:
WITH t AS (SELECTtime_bucket('10 minutes', time) as bucket,first(value, time) as valueFROM example GROUP BY bucket)SELECTbucket,value - lag(value, 1) OVER (ORDER BY bucket) deltaFROM t;
You can't create a continuous aggregate using this query, because it contains
the lag
function. But you can create a continuous aggregate by excluding the
lag
function:
CREATE MATERIALIZED VIEW example_aggregateWITH (timescaledb.continuous) ASSELECTtime_bucket('10 minutes', time) AS bucket,first(value, time) AS valueFROM example GROUP BY bucket;
Then, at query time, calculate the delta by using lag
on your continuous
aggregate:
SELECTbucket,value - lag(value, 1) OVER (ORDER BY bucket) AS deltaFROM example_aggregate;
This speeds up your query by calculating the aggregation ahead of time. The delta still needs to be calculated at query time.
Keywords
Found an issue on this page?Report an issue or Edit this page in GitHub.