# count_min_sketch() ToolkitTimescaleDB Toolkit functions are available under Timescale Community Edition. They are automatically included with Timescale Cloud. Click to learn more.ExperimentalExperimental TimescaleDB Toolkit functions are not suitable for production environments. They may have bugs and may cause data loss. Click to learn more.

Produces a Count-Min Sketch in the form of an aggregate that can be passed to the `approx_count`

function to estimate how many times a particular value has appeared in a column.

count_min_sketch(values TEXT,error DOUBLE PRECISION,probability DOUBLE PRECISION,) RETURNS CountMinSketch

warning

Experimental features could have bugs. They might not be backwards compatible, and could be removed in future releases. Use these features at your own risk, and do not use any experimental features in production.

## Required arguments

Name | Type | Description |
---|---|---|

`values` | `TEXT` | Column to aggregate |

`error` | `DOUBLE PRECISION` | Error tolerance in estimate, calculated relative to the number of values added to the sketch |

`probability` | `DOUBLE PRECISION` | Probability that an estimate falls outside the error bounds |

## Returns

Column | Type | Description |
---|---|---|

`agg` | `CountMinSketch` | An object storing a table of counters. |

## Sample usage

Create a Count-Min Sketch of the stock symbols seen in your tick data.
With this aggregate, you'll then be able to estimate how often any text value appears in the `symbol`

column.

SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketchFROM stocks_real_time;

In this example, the first `0.01`

dictates that your frequency estimates have a relative error of 1%.
A relative error of 1% means that the approximate count of an item is overestimated by at most 1% of the total number of (non-`NULL`

) rows in the column you aggregated.
(The Count-Min Sketch is a biased estimator of the true frequency because it may overestimate the count of an item, but it cannot underestimate that count.)

The second `0.01`

means that your estimated frequency falls outside those error bounds 1% of the time (on average).

You can then pass this aggregate into the `approx_count`

function.
Doing so gives you an estimate of how many times a given symbol appears in the `symbol`

column.

For example, if you wanted to know approximately how many of the quotes in the tick data were for the `AAPL`

stock, you would then do the following:

WITH t AS (SELECT count_min_sketch(symbol, 0.01, 0.01) AS symbol_sketchFROM stocks_real_time)SELECT toolkit_experimental.approx_count('AAPL', symbol_sketch)FROM t;

Found an issue on this page?

Report an issue!