Introduction

Aggregate data by time interval, while filling in gaps of missing data.

time_bucket_gapfill works similarly to time_bucket, but adds gapfilling capabilities. The other functions in this group must be used in the same query as time_bucket_gapfill. They control how missing values are treated.

Important

time_bucket_gapfill must be used as a top-level expression in a query or subquery. You cannot, for example, nest time_bucket_gapfill in another function (such as round(time_bucket_gapfill(...))), or cast the result of the gapfilling call. If you need to cast, you can use time_bucket_gapfill in a subquery, and let the outer query do the type cast.

Bucket

time_bucket_gapfill
Bucket rows by time interval while filling gaps in data

Interpolator

interpolate
Fill in missing values by linear interpolation
locf
Fill in missing values by carrying the last observed value forward
time_bucket_gapfill(
bucket_width INTERVAL | INTEGER,
time TIMESTAMPTZ | INTEGER,
[, timezone TEXT]
[, start TIMESTAMPTZ | INTEGER]
[, finish TIMESTAMPTZ | INTEGER]
) RETURNS TIMESTAMPTZ

Group data into buckets based on time interval, while filling in gaps of missing data. If you don't provide a gapfilling algorithm, such as locf or interpolate, gaps are left as NULL in the returned data.

Required arguments
NameTypeDescription
bucket_widthINTERVAL, INTEGERA PostgreSQL time interval to specify the length of each bucket. For example, use 1 day to get daily buckets. Use INTEGER only if your time column is integer-based.
timeTIMESTAMPTZ, INTEGERThe timestamp on which to base the bucket
Optional arguments
NameTypeDescription
timezoneTEXTThe timezone to use for bucketing. For example, Europe/Berlin. Available in TimescaleDB 2.9 or later. Does not work for integer-based time. If you have an untyped start or finish argument and a timezone argument, you might run into a problem where you are not passing your arguments for the parameter that you expect. To solve this, either name your arguments or explicitly type cast them.
startTIMESTAMPTZ, INTEGERThe start of the period to gapfill. Values before start are passed through, but no gapfilling is performed. Use INTEGER only if your time column is integer-based. Best practice is to use the WHERE clause. Specifying start is legacy. The WHERE is more performant, because the query planner can filter out chunks by constraint exclusion.
finishTIMESTAMPTZ, INTEGERThe end of the period to gapfill. Values after finish are passed through, but no gapfilling is performed. Use INTEGER only if your time column is integer-based. Best practice is to use the WHERE clause. Specifying finish is legacy. The WHERE is more performant, because the query planner can filter out chunks by constraint exclusion.
Returns
ColumnTypeDescription
time_bucket_gapfillTIMESTAMPTZThe start time of the time bucket.
interpolate(
value SMALLINT | INTEGER | BIGINT | REAL | DOUBLE PRECISION
[, prev EXPRESSION]
[, next EXPRESSION]
) RETURNS SMALLINT | INTEGER | BIGINT | REAL | DOUBLE PRECISION

Fill in missing values by linear interpolation. Use in the same query as time_bucket_gapfill. interpolate cannot be nested inside another function call.

Required arguments
NameTypeDescription
valueSMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISIONThe value to interpolate
Optional arguments
NameTypeDescription
prevEXPRESSIONIf no previous value is available for gapfilling, use the prev lookup expression to get a previous value. For example, you can use prev to fill in the first bucket in a queried time range. The expression must return a (time, value) tuple with types corresponding to the bucketed times and values.
nextEXPRESSIONIf no next value is available for gapfilling, use the next lookup expression to get a next value. For example, you can use next to fill in the last bucket in a queried time range. The expression must return a (time, value) tuple with types corresponding to the bucketed times and values.
Returns
ColumnTypeDescription
interpolateSMALLINT, INTEGER, BIGINT, REAL, DOUBLE PRECISIONThe gapfilled value. The return type is the type of value.
locf(
value ANY ELEMENT
[, prev EXPRESSION]
[, treat_null_as_missing BOOLEAN]
) RETURNS ANY ELEMENT

Fill in missing values by carrying the last observed value forward. Use in the same query as time_bucket_gapfill. locf cannot be nested inside another function call.

Required arguments
NameTypeDescription
valueANY ELEMENTThe value to carry forward
Optional arguments
NameTypeDescription
prevEXPRESSIONIf no previous value is available for gapfilling, use the prev lookup expression to get a previous value. For example, you can use prev to fill in the first bucket in a queried time range. The expression must return just a value (not a tuple as expected by the interpolate function) with the same type as the value parameter.
treat_null_as_missingBOOLEANWhen true, NULL values are ignored, and only non-NULL values are carried forward.
Returns
ColumnTypeDescription
locfANY ELEMENTThe gapfilled value. The return type is the type of value.

Get the daily average metric value. Use time_bucket_gapfill without specifying a gapfilling algorithm. This leaves the missing values as NULL:

SELECT time_bucket_gapfill('1 day', time) AS day,
avg(value) as value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
AND time < '2022-01-10 00:00:00-00'::timestamptz
GROUP BY day
ORDER BY day desc;
day | value
-----------------------+--------------------
2022-01-09 00:00:00+00 |
2022-01-08 00:00:00+00 | 48.61293155993108
2022-01-07 00:00:00+00 | 54.388267525986485
2022-01-06 00:00:00+00 |
2022-01-05 00:00:00+00 | 58.257520634785266
2022-01-04 00:00:00+00 | 46.09172424261765
2022-01-03 00:00:00+00 | 42.53498707820027
2022-01-02 00:00:00+00 |
2022-01-01 00:00:00+00 | 47.84420001415975
2021-12-31 00:00:00+00 |
(10 rows)

Get the daily average metric value. Use locf to carry the last value forward if a value is missing. Note that avg is nested inside locf, and not the other way around.

SELECT time_bucket_gapfill('1 day', time) AS day,
locf(avg(value)) as value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
AND time < '2022-01-10 00:00:00-00'::timestamptz
GROUP BY day
ORDER BY day desc;
day | value
-----------------------+--------------------
2022-01-09 00:00:00+00 | 48.61293155993108
2022-01-08 00:00:00+00 | 48.61293155993108
2022-01-07 00:00:00+00 | 54.388267525986485
2022-01-06 00:00:00+00 | 58.257520634785266
2022-01-05 00:00:00+00 | 58.257520634785266
2022-01-04 00:00:00+00 | 46.09172424261765
2022-01-03 00:00:00+00 | 42.53498707820027
2022-01-02 00:00:00+00 | 47.84420001415975
2022-01-01 00:00:00+00 | 47.84420001415975
2021-12-31 00:00:00+00 |
(10 rows)

Get the daily average metric value. Use the optional prev argument to locf to fill gaps at the beginning of the queried time range. Note that the prev expression returns just a value to fill the gap with. This is sufficient since the value is just carried forward and not further processed.

SELECT time_bucket_gapfill('1 day', time) AS day,
locf(
avg(value),
(
SELECT value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
ORDER BY time ASC
LIMIT 1
)
) as value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
AND time < '2022-01-10 00:00:00-00'::timestamptz
GROUP BY day
ORDER BY day desc;
day | value
-----------------------+--------------------
2022-01-09 00:00:00+00 | 48.61293155993108
2022-01-08 00:00:00+00 | 48.61293155993108
2022-01-07 00:00:00+00 | 54.388267525986485
2022-01-06 00:00:00+00 | 58.257520634785266
2022-01-05 00:00:00+00 | 58.257520634785266
2022-01-04 00:00:00+00 | 46.09172424261765
2022-01-03 00:00:00+00 | 42.53498707820027
2022-01-02 00:00:00+00 | 47.84420001415975
2022-01-01 00:00:00+00 | 47.84420001415975
2021-12-31 00:00:00+00 | 47.84420001415975
(10 rows)

Get the daily average metric value. Use interpolate to linearly interpolate the value if it is missing. Note that avg is nested inside interpolate.

SELECT time_bucket_gapfill('1 day', time) AS day,
interpolate(avg(value)) as value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
AND time < '2022-01-10 00:00:00-00'::timestamptz
GROUP BY day
ORDER BY day desc;
day | value
-----------------------+--------------------
2022-01-09 00:00:00+00 |
2022-01-08 00:00:00+00 | 48.61293155993108
2022-01-07 00:00:00+00 | 54.388267525986485
2022-01-06 00:00:00+00 | 56.32289408038588
2022-01-05 00:00:00+00 | 58.257520634785266
2022-01-04 00:00:00+00 | 46.09172424261765
2022-01-03 00:00:00+00 | 42.53498707820027
2022-01-02 00:00:00+00 | 45.189593546180014
2022-01-01 00:00:00+00 | 47.84420001415975
2021-12-31 00:00:00+00 |
(10 rows)

Get the daily average metric value. Use the optional prev and next arguments to interpolate to extrapolate the missing values starting and ending the queried time range. Note that the prev and next expressions each return a tuple with time and value. The time is necessary to compute the missing values correctly.

SELECT time_bucket_gapfill('1 day', time) AS day,
interpolate(
avg(value),
(
SELECT (time, value)
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
ORDER BY time ASC
LIMIT 1
),
(
SELECT (time, value)
FROM metrics
WHERE time < '2021-12-10 00:00:00-00'::timestamptz
ORDER BY time DESC
LIMIT 1
)
) as value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
AND time < '2022-01-10 00:00:00-00'::timestamptz
GROUP BY day
ORDER BY day desc;
day | value
-----------------------+--------------------
2022-01-09 00:00:00+00 | 48.61293155993108
2022-01-08 00:00:00+00 | 48.61293155993108
2022-01-07 00:00:00+00 | 54.388267525986485
2022-01-06 00:00:00+00 | 56.32289408038588
2022-01-05 00:00:00+00 | 58.257520634785266
2022-01-04 00:00:00+00 | 46.09172424261765
2022-01-03 00:00:00+00 | 42.53498707820027
2022-01-02 00:00:00+00 | 45.189593546180014
2022-01-01 00:00:00+00 | 47.84420001415975
2021-12-31 00:00:00+00 | 47.84420001415975
(10 rows)

Get the daily average metric value, using Europe/Berlin as the timezone. Note that daily time buckets now start at 23:00 UTC, which is equivalent to midnight in Berlin for the selected dates:

SELECT time_bucket_gapfill('1 day', time, 'Europe/Berlin') AS day,
interpolate(avg(value)) as value
FROM metrics
WHERE time > '2021-12-31 00:00:00+00'::timestamptz
AND time < '2022-01-10 00:00:00-00'::timestamptz
GROUP BY day
ORDER BY day desc;
day | value
-----------------------+--------------------
2022-01-09 23:00:00+00 |
2022-01-08 23:00:00+00 | 48.65079127913703
2022-01-07 23:00:00+00 | 47.31847777099154
2022-01-06 23:00:00+00 | 55.98845740343859
2022-01-05 23:00:00+00 | 55.61667401777108
2022-01-04 23:00:00+00 | 58.74115574522012
2022-01-03 23:00:00+00 | 45.77993635988273
2022-01-02 23:00:00+00 | 41.78689923453202
2022-01-01 23:00:00+00 | 24.324313477743974
2021-12-31 23:00:00+00 | 48.86680377661261
2021-12-30 23:00:00+00 |
(11 rows)

Keywords

Found an issue on this page?Report an issue or Edit this page in GitHub.