A worked example explains histogram_quantile well.
Assumptions:
- Only one series, for simplicity
- 10 finite buckets for the metric http_request_duration_seconds (the +Inf bucket comes on top): 10ms, 50ms, 100ms, 200ms, 300ms, 500ms, 1s, 2s, 3s, 5s
- http_request_duration_seconds is a metric of type COUNTER
| time | value | delta | rate (quantity of items) |
| --- | --- | --- | --- |
| t-10m | 50 | N/A | N/A |
| t-5m | 100 | 50 | 50 / (5*60) |
| t | 200 | 100 | 100 / (5*60) |
| ... | ... | ... | ... |
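The delta and rate columns can be sketched in a few lines of Python. This is only a simplified illustration: the helper name `simple_rate` is made up here, and Prometheus's real rate() additionally handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```python
def simple_rate(value_now, value_before, window_seconds):
    """Per-second increase of a counter between two samples.

    Simplification: no counter-reset handling and no extrapolation.
    """
    return (value_now - value_before) / window_seconds


window_seconds = 5 * 60  # the [5m] range

# Rows t-5m and t of the table: (value, previous value)
rate_t_minus_5m = simple_rate(100, 50, window_seconds)   # 50 / (5*60)
rate_t = simple_rate(200, 100, window_seconds)           # 100 / (5*60)

print(rate_t_minus_5m, rate_t)
```

The first row (t-10m) has no earlier sample to compare against, which is why its delta and rate are N/A.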
- We have at least two scrapes of the series covering 5 minutes, so that rate() can calculate the quantity for each bucket. rate_xxx(t) = (value_xxx[t] - value_xxx[t-5m]) / (5*60) is the quantity of items for [t-5m, t]
- We are looking at 2 samples (value(t) and value(t-5m)) here. 10000 http request durations (items) were recorded, that is, 10000 = rate_10ms(t) + rate_50ms(t) + rate_100ms(t) + ... + rate_5s(t).
| bucket(le) | 10ms | 50ms | 100ms | 200ms | 300ms | 500ms | 1s | 2s | 3s | 5s | +Inf |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| range | ~10ms | 10~50ms | 50~100ms | 100~200ms | 200~300ms | 300~500ms | 500ms~1s | 1~2s | 2s~3s | 3~5s | 5s~ |
| rate_xxx(t) | 3000 | 3000 | 1500 | 1000 | 800 | 400 | 200 | 40 | 30 | 5 | 5 |
Buckets are the essence of a histogram. We only need the per-bucket numbers in the rate_xxx(t) row to do the quantile calculation.
Let's take a close look at this expression (aggregation like sum() is omitted for simplicity):
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
We are actually looking for the 95th-percentile item in rate_xxx(t), counting from bucket=10ms to bucket=+Inf. Here the 95th percentile means the 9500th item, since we have 10000 items in total (10000 * 0.95).
From the table above, there are 9300 = 3000+3000+1500+1000+800 items in the buckets before bucket=500ms.
So the 9500th item is the 200th item (9500-9300) in bucket=500ms (range=300~500ms), which holds 400 items.
Prometheus assumes that the items in a bucket are spread evenly, in a linear pattern.
The metric value for the 200th item in bucket=500ms is therefore 400ms = 300+(500-300)*(200/400).
That is, the 95th percentile is 400ms.
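The walk-through can be sketched in Python as follows. This is a rough illustration, not Prometheus's actual implementation: the function name `histogram_quantile_sketch` is hypothetical, bounds are given in milliseconds, and the buckets are passed as cumulative counts (the way `le` buckets are exposed). One more assumption: the per-bucket numbers in the table add up to 9980, so the +Inf cumulative count is set to the stated total of 10000 to match the calculation.

```python
import math
from itertools import accumulate


def histogram_quantile_sketch(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Assumes the pairs are sorted by upper bound and the last bound is +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # e.g. 0.95 * 10000 -> the 9500th item
    lower_bound, count_below = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Rank falls into +Inf: fall back to the highest finite bound.
                return lower_bound
            in_bucket = cumulative - count_below
            if in_bucket == 0:
                return lower_bound
            # Linear interpolation inside the bucket that holds the rank.
            position = (rank - count_below) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * position
        lower_bound, count_below = upper_bound, cumulative
    return math.nan


bounds_ms = [10, 50, 100, 200, 300, 500, 1000, 2000, 3000, 5000]
per_bucket = [3000, 3000, 1500, 1000, 800, 400, 200, 40, 30, 5]
cumulative = list(accumulate(per_bucket))  # 3000, 6000, 7500, 8500, 9300, ...
# Assumption: +Inf cumulative count taken from the stated total of 10000.
buckets = list(zip(bounds_ms, cumulative)) + [(math.inf, 10000)]

p95_ms = histogram_quantile_sketch(0.95, buckets)
print(p95_ms)
```

With these inputs the rank lands in the 300~500ms bucket, 200 items in, so the interpolation is the same 300+(500-300)*(200/400) as in the text.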
There are a few things to bear in mind:
- The metric should be a COUNTER in nature for the histogram metric type
- Series for quantile calculation should always have the label le defined
- Items (data) in a specific bucket (e.g. 300~500ms) are assumed to be spread evenly in a linear pattern; at least, Prometheus makes this assumption
- Quantile calculation requires the buckets to be sorted (defined) in some ascending/descending order (e.g. 1ms < 5ms < 10ms < ...)
- The result of histogram_quantile is an approximation
P.S.:
The metric value is not always accurate, because of the assumption that items (data) in a specific bucket are spread evenly in a linear pattern.
Say the max duration in reality (e.g. from the nginx access log) in bucket=500ms (range=300~500ms) is 310ms; we will still get 400ms from histogram_quantile with the setup above, which is quite confusing sometimes.
The smaller the bucket distance, the more accurate the approximation.
So set up bucket distances that fit your needs.
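To make the effect of bucket distance concrete, here is a small Python sketch that buckets the same made-up observations twice, once with a coarse 300~500ms bucket and once with finer bounds. The observations, the bounds, and the helper name `estimate_quantile` are all invented for illustration; the estimator uses the same linear-interpolation idea described above, not Prometheus's actual code.

```python
import bisect


def estimate_quantile(q, bounds_ms, observations_ms):
    """Bucket the observations by upper bound (le), then interpolate linearly."""
    counts = [0] * (len(bounds_ms) + 1)  # last slot plays the role of +Inf
    for obs in observations_ms:
        # Index of the first bound >= obs, i.e. the smallest le that fits.
        counts[bisect.bisect_left(bounds_ms, obs)] += 1

    rank = q * len(observations_ms)
    seen, lower = 0, 0.0
    for bound, count in zip(bounds_ms, counts):
        if count > 0 and seen + count >= rank:
            return lower + (bound - lower) * (rank - seen) / count
        seen += count
        lower = bound
    return bounds_ms[-1]  # rank fell into +Inf: report the highest finite bound


# Hypothetical data: 100 requests, all between 301ms and 310ms.
observations = [301 + (i % 10) for i in range(100)]

coarse = estimate_quantile(0.95, [300, 500], observations)
fine = estimate_quantile(0.95, [300, 305, 310, 500], observations)

print(max(observations), coarse, fine)
```

With the coarse bounds the estimate lands far above the real maximum of 310ms, while the finer bounds keep it inside the range where the data actually lies.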
rate[t] = (200-100)/(10*60)
Why the 10 in 10*60, shouldn't it be 5? – Indubitability