time_bucket_gapfill() functions
CommunityCommunity functions are available under Timescale Community Edition. Click to learn more.Introduction
Aggregate data by time interval, while filling in gaps of missing data.
time_bucket_gapfill
works similarly to time_bucket
, but adds
gapfilling capabilities. The other functions in this group must be used in the
same query as time_bucket_gapfill
. They control how missing values are treated.
Important
time_bucket_gapfill
must be used as a top-level expression in a query or subquery. You cannot, for example, nest time_bucket_gapfill
in another function (such as round(time_bucket_gapfill(...))
), or cast the result of the gapfilling call. If you need to cast, you can use time_bucket_gapfill
in a subquery, and let the outer query do the type cast.
Bucket
- time_bucket_gapfill
- Bucket rows by time interval while filling gaps in data
Interpolator
- interpolate
- Fill in missing values by linear interpolation
- locf
- Fill in missing values by carrying the last observed value forward
time_bucket_gapfill(bucket_width INTERVAL | INTEGER,time TIMESTAMPTZ | INTEGER,[, timezone TEXT][, start TIMESTAMPTZ | INTEGER][, finish TIMESTAMPTZ | INTEGER]) RETURNS TIMESTAMPTZ
Group data into buckets based on time interval, while filling in gaps of missing data. If you don't provide a gapfilling algorithm, such as locf
or interpolate
, gaps are left as NULL
in the returned data.
Required arguments
Name | Type | Description |
---|---|---|
bucket_width | INTERVAL , INTEGER | A PostgreSQL time interval to specify the length of each bucket. For example, use 1 day to get daily buckets. Use INTEGER only if your time column is integer-based. |
time | TIMESTAMPTZ , INTEGER | The timestamp on which to base the bucket |
Optional arguments
Name | Type | Description |
---|---|---|
timezone | TEXT | The timezone to use for bucketing. For example, Europe/Berlin . Available in TimescaleDB 2.9 or later. Does not work for integer-based time. If you have an untyped start or finish argument and a timezone argument, you might run into a problem where you are not passing your arguments for the parameter that you expect. To solve this, either name your arguments or explicitly type cast them. |
start | TIMESTAMPTZ , INTEGER | The start of the period to gapfill. Values before start are passed through, but no gapfilling is performed. Use INTEGER only if your time column is integer-based. Best practice is to use the WHERE clause. Specifying start is legacy. The WHERE is more performant, because the query planner can filter out chunks by constraint exclusion. |
finish | TIMESTAMPTZ , INTEGER | The end of the period to gapfill. Values after finish are passed through, but no gapfilling is performed. Use INTEGER only if your time column is integer-based. Best practice is to use the WHERE clause. Specifying finish is legacy. The WHERE is more performant, because the query planner can filter out chunks by constraint exclusion. |
Returns
Column | Type | Description |
---|---|---|
time_bucket_gapfill | TIMESTAMPTZ | The start time of the time bucket. |
interpolate(value SMALLINT | INTEGER | BIGINT | REAL | DOUBLE PRECISION[, prev EXPRESSION][, next EXPRESSION]) RETURNS SMALLINT | INTEGER | BIGINT | REAL | DOUBLE PRECISION
Fill in missing values by linear interpolation. Use in the same query as time_bucket_gapfill
. interpolate
cannot be nested inside another function call.
Required arguments
Name | Type | Description |
---|---|---|
value | SMALLINT , INTEGER , BIGINT , REAL , DOUBLE PRECISION | The value to interpolate |
Optional arguments
Name | Type | Description |
---|---|---|
prev | EXPRESSION | If no previous value is available for gapfilling, use the prev lookup expression to get a previous value. For example, you can use prev to fill in the first bucket in a queried time range. The expression must return a (time, value) tuple with types corresponding to the bucketed times and values. |
next | EXPRESSION | If no next value is available for gapfilling, use the next lookup expression to get a next value. For example, you can use next to fill in the last bucket in a queried time range. The expression must return a (time, value) tuple with types corresponding to the bucketed times and values. |
Returns
Column | Type | Description |
---|---|---|
interpolate | SMALLINT , INTEGER , BIGINT , REAL , DOUBLE PRECISION | The gapfilled value. The return type is the type of value . |
locf(value ANY ELEMENT[, prev EXPRESSION][, treat_null_as_missing BOOLEAN]) RETURNS ANY ELEMENT
Fill in missing values by carrying the last observed value forward. Use in the same query as time_bucket_gapfill
. locf
cannot be nested inside another function call.
Required arguments
Name | Type | Description |
---|---|---|
value | ANY ELEMENT | The value to carry forward |
Optional arguments
Name | Type | Description |
---|---|---|
prev | EXPRESSION | If no previous value is available for gapfilling, use the prev lookup expression to get a previous value. For example, you can use prev to fill in the first bucket in a queried time range. The expression must return just a value (not a tuple as expected by the interpolate function) with the same type as the value parameter. |
treat_null_as_missing | BOOLEAN | When true , NULL values are ignored, and only non-NULL values are carried forward. |
Returns
Column | Type | Description |
---|---|---|
locf | ANY ELEMENT | The gapfilled value. The return type is the type of value . |
Get the daily average metric value. Use time_bucket_gapfill
without specifying
a gapfilling algorithm. This leaves the missing values as NULL
:
SELECT time_bucket_gapfill('1 day', time) AS day,avg(value) as valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzAND time < '2022-01-10 00:00:00-00'::timestamptzGROUP BY dayORDER BY day desc;
day | value-----------------------+--------------------2022-01-09 00:00:00+00 |2022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 |2022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 |2022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 |(10 rows)
Get the daily average metric value. Use locf
to carry the last value forward
if a value is missing. Note that avg
is nested inside locf
, and not the
other way around.
SELECT time_bucket_gapfill('1 day', time) AS day,locf(avg(value)) as valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzAND time < '2022-01-10 00:00:00-00'::timestamptzGROUP BY dayORDER BY day desc;
day | value-----------------------+--------------------2022-01-09 00:00:00+00 | 48.612931559931082022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 58.2575206347852662022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 47.844200014159752022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 |(10 rows)
Get the daily average metric value. Use the optional prev
argument to locf
to fill gaps at the beginning of the queried time range. Note that the
prev
expression returns just a value to fill the gap with. This is sufficient since the value is just carried forward and not further processed.
SELECT time_bucket_gapfill('1 day', time) AS day,locf(avg(value),(SELECT valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzORDER BY time ASCLIMIT 1)) as valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzAND time < '2022-01-10 00:00:00-00'::timestamptzGROUP BY dayORDER BY day desc;
day | value-----------------------+--------------------2022-01-09 00:00:00+00 | 48.612931559931082022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 58.2575206347852662022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 47.844200014159752022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 | 47.84420001415975(10 rows)
Get the daily average metric value. Use interpolate
to linearly interpolate
the value if it is missing. Note that avg
is nested inside interpolate
.
SELECT time_bucket_gapfill('1 day', time) AS day,interpolate(avg(value)) as valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzAND time < '2022-01-10 00:00:00-00'::timestamptzGROUP BY dayORDER BY day desc;
day | value-----------------------+--------------------2022-01-09 00:00:00+00 |2022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 56.322894080385882022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 45.1895935461800142022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 |(10 rows)
Get the daily average metric value. Use the optional prev
and next
arguments to interpolate
to extrapolate the missing values starting and ending the queried time range. Note that the prev
and
next
expressions each return a tuple with time and value. The time is necessary to compute the missing values correctly.
SELECT time_bucket_gapfill('1 day', time) AS day,interpolate(avg(value),(SELECT (time, value)FROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzORDER BY time ASCLIMIT 1),(SELECT (time, value)FROM metricsWHERE time < '2021-12-10 00:00:00-00'::timestamptzORDER BY time DESCLIMIT 1)) as valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzAND time < '2022-01-10 00:00:00-00'::timestamptzGROUP BY dayORDER BY day desc;
day | value-----------------------+--------------------2022-01-09 00:00:00+00 | 48.612931559931082022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 56.322894080385882022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 45.1895935461800142022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 | 47.84420001415975(10 rows)
Get the daily average metric value, using Europe/Berlin
as the timezone. Note
that daily time buckets now start at 23:00 UTC
, which is equivalent to
midnight in Berlin for the selected dates:
SELECT time_bucket_gapfill('1 day', time, 'Europe/Berlin') AS day,interpolate(avg(value)) as valueFROM metricsWHERE time > '2021-12-31 00:00:00+00'::timestamptzAND time < '2022-01-10 00:00:00-00'::timestamptzGROUP BY dayORDER BY day desc;
day | value-----------------------+--------------------2022-01-09 23:00:00+00 |2022-01-08 23:00:00+00 | 48.650791279137032022-01-07 23:00:00+00 | 47.318477770991542022-01-06 23:00:00+00 | 55.988457403438592022-01-05 23:00:00+00 | 55.616674017771082022-01-04 23:00:00+00 | 58.741155745220122022-01-03 23:00:00+00 | 45.779936359882732022-01-02 23:00:00+00 | 41.786899234532022022-01-01 23:00:00+00 | 24.3243134777439742021-12-31 23:00:00+00 | 48.866803776612612021-12-30 23:00:00+00 |(11 rows)
Keywords
Found an issue on this page?Report an issue or Edit this page in GitHub.