Continuous aggregation

Aggregation is a way of combing data to get insights from it. At its simplest, aggregation is something like looking for an average. For example, if you have data showing temperature changes over time, you can calculate an average of those temperatures, or a count of how many readings have been taken. Average, sum, and count are all example of simple aggregates.

However, aggregation calculations can get big and slow, quickly. If you want to find the average open and closing values of a range of stocks for each day, then you need something a little more sophisticated. That's where Timescale continuous aggregates come in. Continuous aggregates can minimize the number of records that you need to look up to perform your query.

Continuous aggregates

Time-series data usually grows very quickly. And that means that aggregating the data into useful summaries can become very slow. Continuous aggregates makes aggregating data lightning fast.

If you are collecting data very frequently, you might want to aggregate your data into minutes or hours instead. For example, if you have a table of temperature readings taken every second, you can find the average temperature for each hour. Every time you run this query, the database needs to scan the entire table and recalculate the average every time.

Continuous aggregates are a kind of hypertable that is refreshed automatically in the background as new data is added, or old data is modified. Changes to your dataset are tracked, and the hypertable behind the continuous aggregate is automatically updated in the background.

You don't need to manually refresh your continuous aggregates, they are continuously and incrementally updated in the background. Continuous aggregates also have a much lower maintenance burden than regular PostgreSQL materialized views, because the whole view is not created from scratch on each refresh. This means that you can get on with working your data instead of maintaining your database.

Because continuous aggregates are based on hypertables, you can query them in exactly the same way as your other tables, and enable compression or tiered storage on your continuous aggregates. You can even create continuous aggregates on top of your continuous aggregates.

By default, querying continuous aggregates provides you with real-time data. Pre-aggregated data from the materialized view is combined with recent data that hasn't been aggregated yet. This gives you up-to-date results on every query.

There are three main ways to make aggregation easier: materialized views, continuous aggregates, and real time aggregates.

Materialized views are a standard PostgreSQL function. They are used to cache the result of a complex query so that you can reuse it later on. Materialized views do not update regularly, although you can manually refresh them as required.

Continuous aggregates are a Timescale only feature. They work in a similar way to a materialized view, but they are updated automatically in the background, as new data is added to your database. Continuous aggregates are updated continuously and incrementally, which means they are less resource intensive to maintain than materialized views. Continuous aggregates are based on hypertables, and you can query them in the same way as you do your other tables.

Real time aggregates are a Timescale only feature. They are the same as continuous aggregates, but they add the most recent raw data to the previously aggregated data to provide accurate and up to date results, without needing to aggregate data as it is being written.

In this section, you create a continuous aggregate, and query it for more information about the trading data.

Create an aggregate query

The financial sector regularly uses candlestick charts to visualize the price change of an asset. Each candlestick represents a time period, such as one minute or one hour, and shows how the asset's price changed during that time.

Candlestick charts are generated from the open, high, low, close, and volume data for each financial asset during the time period. This is often abbreviated as OHLCV:

Open: opening price
High: highest price
Low: lowest price
Close: closing price
Volume: volume of transactions

In this section, you use a SELECT statement to find the high and low values with min and max functions, and the open and close values with first and last functions. You then aggregate the data into 1 day buckets, like this:

SELECT
  time_bucket('1 day', "time") AS day,
  symbol,
  max(price) AS high,
  first(price, time) AS open,
  last(price, time) AS close,
  min(price) AS low
FROM stocks_real_time srt

Then, you organize the results by day and symbol:

GROUP BY day, symbol
ORDER BY day DESC, symbol

Creating an aggregate query

At the command prompt, use the psql connection string from the cheat sheet you downloaded to connect to your database.

At the psql prompt, type this query:


SELECT
  time_bucket('1 day', time) AS bucket,
  symbol,
  max(price) AS high,
  first(price, time) AS open,
  last(price, time) AS close,
  min(price) AS low
FROM stocks_real_time srt
WHERE time > now() - INTERVAL '1 week'
GROUP BY bucket, symbol
ORDER BY bucket, symbol
LIMIT 10;

-- Output

Type q to return to the psql prompt.

Create a continuous aggregate

Now that you have an aggregation query, you can use it to create a continuous aggregate.

In this section, your query starts by creating a materialized view called stock_candlestick_daily, then converting it into a Timescale continuous aggregate:

CREATE MATERIALIZED VIEW stock_candlestick_daily
WITH (timescaledb.continuous) AS

Then, you give the aggregate query you created earlier as the contents for the continuous aggregate:

SELECT
  time_bucket('1 day', "time") AS day,
  symbol,
  max(price) AS high,
  first(price, time) AS open,
  last(price, time) AS close,
  min(price) AS low
FROM stocks_real_time srt
GROUP BY day, symbol

When you run this query, you create the view, and populate the view with the aggregated calculation. This can take a few minutes to run, because it needs to perform these calculations across all of your stock trade data the first time.

When you continuous aggregate has been created and the data aggregated for the first time, you can query your continuous aggregate. For example, you can look at all the aggregated data, like this:

SELECT * FROM stock_candlestick_daily
  ORDER BY day DESC, symbol;

Or you can look at a single stock, like this:

SELECT * FROM stock_candlestick_daily
WHERE symbol='TSLA';

Creating a continuous aggregate

At the command prompt, use the psql connection string from the cheat sheet you downloaded to connect to your database.

At the psql prompt, type this query:


CREATE MATERIALIZED VIEW stock_candlestick_daily
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 day', "time") AS day,
  symbol,
  max(price) AS high,
  first(price, time) AS open,
  last(price, time) AS close,
  min(price) AS low
FROM stocks_real_time srt
GROUP BY day, symbol;

Query your continuous aggregate for all stocks:


SELECT * FROM stock_candlestick_daily
ORDER BY day DESC, symbol
LIMIT 10;

-- Output

Query your continuous aggregate for Tesla stock:


SELECT * FROM stock_candlestick_daily
WHERE symbol='TSLA'
ORDER BY day DESC
LIMIT 10;

-- Output

For more information about how continuous aggregates work, see the continuous aggregates section.