count_min_sketch() functions

ToolkitTimescaleDB Toolkit functions are available under Timescale Community Edition. They are automatically included with Timescale, but must be installed separately for self-hosted TimescaleDB. Click to learn more.

Count the number of times a value appears in a column, using the probabilistic count-min sketch data structure and its associated algorithms. For applications where a small error rate is tolerable, this can result in huge savings in both CPU time and memory, especially for large datasets.

Related hyperfunction groups

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

More efficient because multiple accessors can reuse the same aggregate
Easier to reason about performance, because aggregation is separate from final computation
Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the blog post on two-step aggregates.

Functions in this group

Warning

This function group includes some experimental functions. Experimental functions might change or be removed in future releases. We do not recommend using them in production. Experimental functions are marked with an Experimental tag.

count_min_sketch: ExperimentalAggregate data into a CountMinSketch for approximate counting

approx_count: ExperimentalEstimate the number of times a value appears from a CountMinSketch

Function details

count_min_sketch(
    values TEXT,
    error DOUBLE PRECISION,
    probability DOUBLE PRECISION,
) RETURNS CountMinSketch

Aggregate data into a CountMinSketch object, which you can use to estimate the number of times a given item appears in a column. The sketch produces a biased estimator of frequency. It might overestimate the item count, but it can't underestimate.

You can control the relative error and the probability that the estimate falls outside the error bounds.

Required arguments

Name	Type	Description
`values`	`TEXT`	The column of values to count
`error`	`DOUBLE PRECISION`	Error tolerance in estimate, calculated relative to the number of values added to the sketch
`probability`	`DOUBLE PRECISION`	Probability that an estimate falls outside the error bounds

Returns

Column	Type	Description
`count_min_sketch`	`CountMinSketch`	An object storing a table of counters

approx_count (
    item TEXT,
    agg CountMinSketch
) RETURNS INTEGER

Estimate the number of times a given text value appears in a column.

Required arguments

Name	Type	Description
`item`	`TEXT`	The value you want to estimate occurrences of
`agg`	`CountMinSketch`	A `CountMinSketch` object created using `count_min_sketch`

Returns

Column	Type	Description
`approx_count`	`INTEGER`	The estimated number of times `item` appeared in the sketch

Examples

Given a table of stock data, estimate how many times the symbol AAPL appears:


WITH t AS (
  SELECT toolkit_experimental.count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
  FROM stocks_real_time
)
SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)
FROM t;

count_min_sketch() functions

Related hyperfunction groups

Two-step aggregation

Functions in this group

Warning

Aggregate

Accessor

Function details

count_min_sketch()

approx_count()

count_min_sketch() functions

Introduction

Related hyperfunction groups

Two-step aggregation

Functions in this group

Warning

Aggregate

Accessor

Function details

count_min_sketch()

approx_count()

Related Content