Understanding increase() and rate() used on http_server_requests_seconds_count with Prometheus and Grafana

Asked 24/1, 2022 at 14:52 Answered 22/12, 2022 at 6:13

Solved spring prometheus grafana micrometer

I tried to obtain these measurements from Prometheus:

increase(http_server_requests_seconds_count{uri="myURI"}[10s])
increase(http_server_requests_seconds_count{uri="myURI"}[30s])
rate(http_server_requests_seconds_count{uri="myURI"}[10s])
rate(http_server_requests_seconds_count{uri="myURI"}[30s])

Then I ran a Python script that creates 5 threads, each of them hitting this myURI endpoint:

What I see on Grafana is:

I received these values:

I expected to receive these (but didn't):

5 (as in the last 10 seconds this endpoint received 5 calls)
5 (as in the last 30 seconds this endpoint received 5 calls)
0.5 (the endpoint received 5 calls in 10 seconds 5/10)
0.167 (the endpoint received 5 calls in 30 seconds 5/30)

Can someone explain, using my example, the formula behind these functions and a way to get the values I expect?

Weirick answered 24/1, 2022 at 14:52 Comment(4)

Does this answer your question? Do I understand Prometheus's rate vs increase functions correctly? – Telekinesis 24/1, 2022 at 22:4

Can you try your tests again? It looks like you switched your #2 and #3 results/queries. And if there was some way that 10 requests happened, then that would explain 3 of the 4 results. – Histopathology 25/1, 2022 at 20:40

I did another test and now it looks stranger: values 1 and 3 are equal to 0 now. If I repeat the tests, I get the same result. – Weirick 26/1, 2022 at 18:27

Could you please also clarify what the scrape interval is in your Prometheus configuration? – Pirbhai 16/2, 2022 at 17:0

Prometheus calculates increase(m[d]) at timestamp t in the following way:

1. It fetches the raw samples stored in the database for the time series matching m on the time range (t-d .. t]. Note that samples at timestamp t-d are excluded from the range, while samples at t are included. Every selected time series is expected to be a counter, since increase() works only with counters.
2. It calculates the difference between the last and the first raw sample value on the selected time range, individually for each time series matching m. Note that Prometheus doesn't take into account the difference between the last raw sample just before the (t-d .. t] time range and the first raw sample in this time range. This may lead to lower than expected results in some cases.
3. It extrapolates the result obtained at step 2 if the first and/or the last raw samples are located too far from the time range boundaries (t-d .. t]. This may lead to unexpected results, for example fractional results for integer counters. See this issue for details.
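The three steps above can be sketched roughly in Python. This is a simplified illustration, not Prometheus's actual code: the function name approx_increase, the sample data, and the 15-second scrape interval are hypothetical, counter resets are ignored, and the extrapolation rule (extend to the window boundary when a sample is within 1.1 average scrape gaps of it, otherwise by half a gap, and never below the point where the counter would reach zero) is an assumption based on my reading of Prometheus 2.x behavior.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def approx_increase(samples: List[Sample], t: float, d: float) -> Optional[float]:
    """Approximate increase(m[d]) at time t for one series (sorted samples, no resets)."""
    # Step 1: keep raw samples on the half-open range (t-d, t].
    window = [(ts, v) for ts, v in samples if t - d < ts <= t]
    if len(window) < 2:
        return None  # fewer than two samples: no result at all

    (first_ts, first_v), (last_ts, last_v) = window[0], window[-1]

    # Step 2: difference between the last and the first sample in the window.
    delta = last_v - first_v

    # Step 3: extrapolate towards the window boundaries (assumed rule, see lead-in).
    sampled = last_ts - first_ts
    avg_gap = sampled / (len(window) - 1)
    to_start = first_ts - (t - d)
    to_end = t - last_ts
    if delta > 0 and first_v >= 0:
        # Do not extrapolate past the point where the counter would hit zero.
        to_start = min(to_start, sampled * first_v / delta)
    threshold = avg_gap * 1.1
    covered = sampled
    covered += to_start if to_start < threshold else avg_gap / 2
    covered += to_end if to_end < threshold else avg_gap / 2
    return delta * covered / sampled


# Hypothetical series scraped every 15s; 5 requests arrive between t=15 and t=30.
samples = [(0, 0.0), (15, 0.0), (30, 5.0), (45, 5.0)]

narrow = approx_increase(samples, t=50, d=10)  # only one sample in (40, 50]
medium = approx_increase(samples, t=50, d=30)  # the 0 -> 5 jump falls outside (20, 50]
wide = approx_increase(samples, t=50, d=60)    # all samples, extrapolated to 50s
print(narrow, medium, wide)
```

With this made-up data the 10-second window returns nothing, the 30-second window returns 0 because the jump happened just before the window, and the 60-second window returns a fractional value slightly above 5. The real scrape interval in the question is unknown, so these numbers only illustrate the mechanism.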

Prometheus calculates rate(m[d]) as increase(m[d]) / d, so rate() results may also be unexpected sometimes. Prometheus developers are aware of these issues and are going to fix them eventually - see these design docs.

In the meantime you can use VictoriaMetrics, a Prometheus-like monitoring solution I work on. It provides increase() and rate() functions that are free from the issues mentioned above.

Faircloth answered 22/12, 2022 at 6:13 Comment(5)

Hi, thanks for the answer, which, however, is not very clear to me. What are raw samples? What does it mean that it calculates the difference between samples? I would imagine that Prometheus stores 1 row with a timestamp for each call made. Why not simply count the number of rows in a specific time window? Maybe a concrete example, with numbers, could make the formula you explained clearer to me. – Weirick 1/4, 2023 at 10:1

Added a link to the answer explaining raw samples. See docs.victoriametrics.com/keyConcepts.html for more details – Faircloth 1/4, 2023 at 19:16

And what does "difference between raw samples" mean? The difference between the raw samples' values? What I don't understand is why they do not simply count the raw samples. P.S. I do not know exactly what Prometheus concretely stores. – Weirick 2/4, 2023 at 13:18

Updated the answer accordingly by adding links to the definition of time series and counters. I hope the answer is easier to understand now. – Faircloth 3/4, 2023 at 0:42

Thanks a lot! It is clearer now. – Weirick 3/4, 2023 at 12:14

-1

increase() measures the increase over your window, and rate() is the 'per second' rate.

Imagine that Prometheus writes a database row every second recording the current value. That helps when reasoning through the metric calculations.

1. If the total so far was 0 and the metric then increased by 5 in the last 10 seconds, increase() looks back 10 seconds, finds the earlier value, and reports the difference.
2. The same as 1, but looking back 30 seconds.
3. The 0.5 comes from taking the window size (10s), measuring the increase, then normalizing it to a 'per second' rate. So an increase of 5 over 10 seconds is 0.5 calls per second.
4. With the larger window, the same increase is spread over more seconds, so the 'per second' value is smaller.

increase is easier to reason about, but rate standardizes on the 'per second' unit.

One problem with the 'per second' unit, though, is that the values can be too small to talk through, but it is easy to multiply by 60 (to show a per-minute rate) or 3600 (to show a per-hour rate). The window only determines how far back to gather data for the calculation.
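The arithmetic described here can be written out as a small Python sketch. It covers only the idealized case from the question (exactly 5 requests counted inside the window) and ignores how Prometheus selects and extrapolates samples; the helper name per_second_rate is made up for illustration.

```python
def per_second_rate(increase: float, window_seconds: float) -> float:
    """Idealized rate: the increase over the window, normalized to one second."""
    return increase / window_seconds


rate_10s = per_second_rate(5, 10)  # 5 calls over a 10s window
rate_30s = per_second_rate(5, 30)  # the same 5 calls spread over a 30s window

# Scaling the per-second value to friendlier units.
per_minute = rate_10s * 60
per_hour = rate_10s * 3600

print(rate_10s, round(rate_30s, 3), per_minute, per_hour)
```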

Histopathology answered 24/1, 2022 at 21:57 Comment(3)

you didn't answer the question. You justified the values he would expect to get, but he wants to know why he didn't get them. – Clemmieclemmons 25/1, 2022 at 1:15

Thanks @MarceloÁviladeOliveira! Yes, your answer just re-explains what I have already said in my post. The point is why I got the values in the first list instead of the values I expected. – Weirick 25/1, 2022 at 10:24

Oh I see. I had misunderstood the question and missed the first list. – Histopathology 25/1, 2022 at 20:17
