In general, percentiles are useful for understanding the distribution of data. The 50th percentile is the value below which half of your data falls, with the other half above it. The 10th percentile is the value below which 10% of the data falls, with 90% above it. The 99th percentile is the value below which 99% of the data falls, with only 1% above it.
The 50th percentile, or median, is often a more useful measure than the average, especially when your data contains outliers. Outliers can dramatically change the average, but do not affect the median as much. For example, if you have three rooms in your house and two of them are 40℉ (4℃) and one is 130℉ (54℃), the average room temperature is 70℉ (21℃), which doesn't tell you much. However, the 50th percentile temperature is 40℉ (4℃), which tells you that at least half your rooms are at refrigerator temperatures. (Also, you should probably get your heating checked!)
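The room example can be checked with a few lines of Python. This is only an illustration of the arithmetic using the standard library; the variable names are made up for this sketch and it does not involve TimescaleDB.

```python
from statistics import mean, median

# Temperatures from the example above: two cold rooms and one very hot room.
room_temps_f = [40, 40, 130]

average_f = mean(room_temps_f)   # pulled upward by the single hot room
median_f = median(room_temps_f)  # middle value of the sorted readings

print(f"average: {average_f}°F, median: {median_f}°F")
```

The single outlier moves the average by 30℉, while the median stays at the temperature most rooms actually have.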
Percentiles are sometimes avoided because calculating them requires more CPU and
memory than an average or other aggregate measures. This is because an exact
computation of the percentile needs the full dataset as an ordered list.
TimescaleDB uses approximation algorithms to calculate a percentile without
requiring all of the data. This also makes percentiles more compatible with
continuous aggregates. By default, TimescaleDB uses
uddsketch, but you can also choose to use
tdigest. For more information about these algorithms, see the
advanced aggregation methods documentation.
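To give a rough feel for how a percentile can be approximated without keeping every value, here is a toy Python sketch that counts values in logarithmically sized buckets. It is loosely inspired by the general idea behind bucket-based sketches and is not TimescaleDB's uddsketch or tdigest implementation. The class name `LogBucketSketch`, its methods, and the `relative_error` parameter are hypothetical names chosen for this illustration, and the sketch only handles positive values.

```python
import math
from collections import Counter


class LogBucketSketch:
    """Toy percentile sketch for positive values (illustration only)."""

    def __init__(self, relative_error=0.01):
        # Bucket i covers values in (gamma**(i - 1), gamma**i].
        self.gamma = (1 + relative_error) / (1 - relative_error)
        self.buckets = Counter()  # bucket index -> number of values seen
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this toy sketch only handles positive values")
        index = math.ceil(math.log(value, self.gamma))
        self.buckets[index] += 1
        self.count += 1

    def merge(self, other):
        # Assumes both sketches were built with the same relative_error.
        # Counter.update adds counts, so merging is just summing buckets.
        self.buckets.update(other.buckets)
        self.count += other.count

    def percentile(self, p):
        if self.count == 0:
            raise ValueError("empty sketch")
        # Walk the buckets in order until the target rank is covered.
        rank = p / 100 * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Return a representative value for the whole bucket.
                return 2 * self.gamma ** index / (self.gamma + 1)


# Build two partial sketches, as if from two separate chunks of data.
first_half = LogBucketSketch()
second_half = LogBucketSketch()
for value in range(1, 501):
    first_half.add(value)
for value in range(501, 1001):
    second_half.add(value)

# Combine the partial results without revisiting the raw values.
first_half.merge(second_half)
print(first_half.percentile(50))
```

The sketch stores one counter per bucket rather than one entry per value, and two partial sketches can be merged by adding their counters. That mergeability is the kind of property that makes approximate percentiles a better fit for pre-aggregated data than an exact computation over a fully sorted list.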