approx_count()

Returns an estimate of the number of times that the text item was seen by the Count-Min Sketch agg.

approx_count (
    item TEXT,
    agg CountMinSketch
) RETURNS INTEGER

warning

Experimental features could have bugs. They might not be backwards compatible, and could be removed in future releases. Use these features at your own risk, and do not use any experimental features in production.

Required arguments

NameTypeDescription
itemTEXTThe text whose frequency you want to estimate
aggCountMinSketchThe aggregate to use for estimating the frequency of item

Returns

ColumnTypeDescription
approx_countINTEGERThe estimated number of times item was seen by the sketch

Sample usage

Create a Count-Min Sketch of the stock symbols seen in your tick data. With this aggregate, you'll then be able to estimate how often any text value appears in the symbol column.

SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
FROM stocks_real_time;

In this example, the first 0.01 dictates that your frequency estimates have a relative error of 1%. A relative error of 1% means that the approximate count of an item is overestimated by at most 1% of the total number of (non-NULL) rows in the column you aggregated. (The Count-Min Sketch is a biased estimator of the true frequency because it may overestimate the count of an item, but it cannot underestimate that count.)

The second 0.01 means that your estimated frequency falls outside those error bounds 1% of the time (on average).

You can then pass this aggregate into the approx_count function. Doing so gives you an estimate of how many times a given symbol appears in the symbol column.

For example, if you wanted to know approximately how many of the quotes in the tick data were for the AAPL stock, you would then do the following:

WITH t AS (
  SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketch
  FROM stocks_real_time
)
SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)
FROM t;

Found an issue on this page?

Report an issue!

Related Content