# freq_agg() and mcv_agg() functions

ToolkitTimescaleDB Toolkit functions are available under Timescale Community Edition. They are automatically included with Timescale, but must be installed separately for self-hosted TimescaleDB. Click to learn more.## Introduction

Get the most common elements of a set and their relative frequency. The estimation uses the SpaceSaving algorithm.

This group of functions contains two aggregate functions, which let you set the
cutoff for keeping track of a value in different ways. `freq_agg`

allows you to specify a minimum frequency, and `mcv_agg`

allows
you to specify the target number of values to keep.

To estimate the absolute number of times a value appears, use `count_min_sketch`

.

###### Related hyperfunction groups

###### Warning

This function group includes some experimental functions. Experimental functions might change or be removed in future releases. We do not recommend using them in production. Experimental functions are marked with an **Experimental** tag.

### Aggregate

- freq_agg
- ExperimentalAggregate data into a space-saving aggregate for further frequency analysis

### Alternate aggregate

- mcv_agg
- Aggregate data into a space-saving aggregate for further calculation of most-frequent values

### Accessor

- into_values
- Get a table of all frequency estimates from a space-saving aggregate
- max_frequency
- Get the maximum bound of the estimated frequency for a given value in a space-saving aggregate
- min_frequency
- Get the minimum bound of the estimated frequency for a given value in a space-saving aggregate
- topn
- Get the top N most common values from a space-saving aggregate

### Rollup

- rollup
- Combine multiple frequency aggregates

freq_agg(min_freq DOUBLE PRECISION,value AnyElement) RETURNS SpaceSavingAggregate

Aggregate data into a space-saving aggregate object, which stores frequency information in an intermediate form. You can then use any of the accessors in this group to return estimated frequencies or the most common elements.

## Required arguments

Name | Type | Description |
---|---|---|

`min_freq` | `DOUBLE PRECISION` | Frequency cutoff for keeping track of a value. Values that occur less frequently than the cutoff are not stored. |

`value` | `AnyElement` | The column to store frequencies for |

## Returns

Column | Type | Description |
---|---|---|

`agg` | `SpaceSavingAggregate` | An object storing the most common elements of the given table and their estimated frequency. You can pass this object to any of the accessor functions to get a final result. |

## Examples

Create a space-saving aggregate over a field `ZIP`

in a `HomeSales`

table. This aggregate tracks any `ZIP`

value that occurs in at least 5% of rows:

SELECT toolkit_experimental.freq_agg(0.05, ZIP) FROM HomeSales;

mcv_agg (n INTEGER,value AnyElement[, skew DOUBLE PRECISION]) RETURNS SpaceSavingAggregate

Aggregate data into a space-saving aggregate, which stores frequency information in an intermediate form.
You can then use any of the accessors in this group to return estimated frequencies or the most common elements.

This differs from `freq_agg`

in that you can specify a target number of values to keep, rather than a frequency cutoff.

## Required arguments

Name | Type | Description |
---|---|---|

`n` | `INTEGER` | The target number of most-frequent values |

`value` | `AnyElement` | The column to store frequencies for |

## Optional arguments

Name | Type | Description |
---|---|---|

`skew` | `DOUBLE PRECISION` | The estimated skew of the data, defined as the `s` parameter of a zeta distribution. Must be greater than `1.0` . Defaults to `1.1` . For more information, see the section on skew. |

## Returns

Column | Type | Description |
---|---|---|

`agg` | `SpaceSavingAggregate` | An object storing the most common elements of the given table and their estimated frequency. You can pass this object to any of the accessor functions to get a final result. |

## Examples

Create a topN aggregate over the `country`

column of the `users`

table. Targets the top 10 most-frequent values:

SELECT mcv_agg(10, country) FROM users;

Create a topN aggregate over the `type`

column of the `devices`

table. Estimates the skew of the data to be 1.05, and targets the 5 most-frequent values:

SELECT mcv_agg(5, 1.05, type) FROM devices;

into_values(agg SpaceSavingAggregate) RETURNS (AnyElement, DOUBLE PRECISION, DOUBLE PRECISION)

Returns the data from a space-saving aggregate as a table. The table lists the stored values with the minimum and maximum bounds for their estimated frequencies.

## Required arguments

## Returns

Column | Type | Description |
---|---|---|

`value` | `AnyElement` | A commonly seen value in the original dataset |

`min_freq` | `DOUBLE PRECISION` | The minimum bound for the estimated frequency |

`max_freq` | `DOUBLE PRECISION` | The maximum bound for the estimated frequency |

max_frequency (agg SpaceSavingAggregate,value AnyElement) RETURNS DOUBLE PRECISION

Get the maximum bound of the estimated frequency for a given value in a space-saving aggregate

## Required arguments

## Returns

Column | Type | Description |
---|---|---|

`max_frequency` | `DOUBLE PRECISION` | The maximum bound for the value's estimated frequency. The maximum frequency might be 0 if the value's frequency falls below the space-saving aggregate's cut-off threshold. For more information, see `freq_agg` . |

## Examples

Find the maximum frequency of the value `3`

in a column named `value`

within the table `value_test`

:

SELECT max_frequency((SELECT mcv_agg(20, value) FROM value_test),3);

min_frequency (agg SpaceSavingAggregate,value AnyElement) RETURNS DOUBLE PRECISION

Get the minimum bound of the estimated frequency for a given value in a space-saving aggregate

## Required arguments

## Returns

Column | Type | Description |
---|---|---|

`min_frequency` | `DOUBLE PRECISION` | The minimum bound for the value's estimated frequency. The minimum frequency might be 0 if the value's frequency falls below the space-saving aggregate's cut-off threshold. For more information, see `freq_agg` . |

## Examples

Find the minimum frequency of the value `3`

in a column named `value`

within the table `value_test`

:

SELECT min_frequency((SELECT mcv_agg(20, value) FROM value_test),3);

topn (agg SpaceSavingAggregate,n INTEGER) RETURNS AnyElement

Get the top N most common values from a space-saving aggregate. The space-saving aggregate can be created from either `freq_agg`

or `mcv_agg`

.

## Required arguments

Name | Type | Description |
---|---|---|

`agg` | `SpacingsavingAggregate` | A space-saving aggregate created using either `freq_agg` or `mcv_agg` |

`n` | `INTEGER` | The number of values to return. Required only for frequency aggregates. For top N aggregates, defaults to target N of the aggregate itself, and requests for a higher N return an error. In some cases, the function might return fewer than N values. This might happen if a frequency aggregate doesn't contain N values above the minimum frequency, or if the data isn't skewed enough to support N values from a top N aggregate. |

## Returns

Column | Type | Description |
---|---|---|

`topn` | `AnyElement` | The N most-frequent values in the aggregate |

## Examples

Get the 20 most frequent `zip_codes`

from an `employees`

table:

SELECT topn(mcv_agg(20, zip_code)) FROM employees;

rollup(agg SpaceSavingAggregate) RETURNS SpaceSavingAggregate

This will combine multiple aggregates created with `freq_agg`

or
`mcv_agg`

functions. This function does require that the source
aggregates have been created with the same parameters (same `min_freq`

for `freq_agg`

, same n-factor and `skew`

, if used, for a `mcv_agg`

).

This will produce a very similar aggregate to running the same
aggregate function over all the source data. In most cases, any
difference will be no more than what you might get from simply reordering
the input. However, if the source data for the different aggregates is
very differently distributed, the rollup result may have looser frequency
bounds.

## Required arguments

Name | Type | Description |
---|---|---|

`agg` | `SpaceSavingAggregate` | The aggregates to roll up. These must have been created with the same parameters. |

## Returns

Column | Type | Description |
---|---|---|

`rollup` | `SpaceSavingAggregate` | An aggregate containing the most common elements from all of the underlying data for all of the aggregates. |

This test uses a table of randomly generated data. The values used are the integer square roots of a random number in the range 0 to 400.

CREATE TABLE value_test(value INTEGER);INSERT INTO value_test SELECT floor(sqrt(random() * 400)) FROM generate_series(1,100000);

This returns the 5 most common values seen in the table:

SELECT topn(toolkit_experimental.freq_agg(0.05, value),5)FROM value_test;

The output for this query:

topn------1918171615

This test uses a table of randomly generated data. The values used are the integer square roots of a random number in the range (0,400).

CREATE TABLE value_test(value INTEGER);INSERT INTO value_test SELECT floor(sqrt(random() * 400)) FROM generate_series(1,100000);

Return values that represent more than 5% of the input:

SELECT value, min_freq, max_freqFROM into_values((SELECT toolkit_experimental.freq_agg(0.05, value) FROM value_test));

The output for this query looks like this, with some variation due to randomness:

value | min_freq | max_freq-------+----------+----------19 | 0.09815 | 0.0981518 | 0.09169 | 0.0916917 | 0.08804 | 0.0880416 | 0.08248 | 0.0824815 | 0.07703 | 0.0770314 | 0.07157 | 0.0715713 | 0.06746 | 0.0674612 | 0.06378 | 0.0637811 | 0.05565 | 0.0559510 | 0.05286 | 0.05289

You can adjust the following advanced options to suit your data distribution:

`mcv_agg`

assumes that the data is skewed. In other words, some values are more
frequent than others. The degree of skew is defined by the `s`

parameter of a
zeta distribution.

The default value of `1.1`

works on data with this distribution or a more
extreme one:

N | Minimum percentage of all values represented by the top N (approximate) |
---|---|

5 | 20% |

10 | 25% |

20 | 30% |

50 | 36% |

100 | 40% |

Keywords

Found an issue on this page?

Report an issue!