Introduction

Count the number of times a value appears in a column, using the probabilistic count-min sketch data structure and its associated algorithms. For applications where a small error rate is tolerable, this can result in huge savings in both CPU time and memory, especially for large datasets.

Related hyperfunction groups
Warning

This function group includes some experimental functions. Experimental functions might change or be removed in future releases. We do not recommend using them in production. Experimental functions are marked with an Experimental tag.

Aggregate

count_min_sketch
ExperimentalAggregate data into a CountMinSketch for approximate counting

Accessor

approx_count
ExperimentalEstimate the number of times a value appears from a CountMinSketch
count_min_sketch(
values TEXT,
error DOUBLE PRECISION,
probability DOUBLE PRECISION,
) RETURNS CountMinSketch

Aggregate data into a CountMinSketch object, which you can use to estimate the number of times a given item appears in a column. The sketch produces a biased estimator of frequency. It might overestimate the item count, but it can't underestimate.

You can control the relative error and the probability that the estimate falls outside the error bounds.

Required arguments
NameTypeDescription
valuesTEXTThe column of values to count
errorDOUBLE PRECISIONError tolerance in estimate, calculated relative to the number of values added to the sketch
probabilityDOUBLE PRECISIONProbability that an estimate falls outside the error bounds
Returns
ColumnTypeDescription
count_min_sketchCountMinSketchAn object storing a table of counters
approx_count (
item TEXT,
agg CountMinSketch
) RETURNS INTEGER

Estimate the number of times a given text value appears in a column.

Required arguments
NameTypeDescription
itemTEXTThe value you want to estimate occurrences of
aggCountMinSketchA CountMinSketch object created using count_min_sketch
Returns
ColumnTypeDescription
approx_countINTEGERThe estimated number of times item appeared in the sketch
Examples

Given a table of stock data, estimate how many times the symbol AAPL appears:

WITH t AS (
SELECT toolkit_experimental.count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
FROM stocks_real_time
)
SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)
FROM t;

Keywords

Found an issue on this page?Report an issue or Edit this page in GitHub.