Understanding increase() and rate() used on http_server_requests_seconds_count with Prometheus and Grafana

2

12

I tried to obtain these measurements from Prometheus:

  1. increase(http_server_requests_seconds_count{uri="myURI"}[10s])
  2. increase(http_server_requests_seconds_count{uri="myURI"}[30s])
  3. rate(http_server_requests_seconds_count{uri="myURI"}[10s])
  4. rate(http_server_requests_seconds_count{uri="myURI"}[30s])

Then I ran a Python script that creates 5 threads, each of which hits the myURI endpoint once.

What I see on Grafana is:

[Two Grafana screenshots; images not available]

I received these values:

  1. 0
  2. 6
  3. 0
  4. 0.2

I expected to receive these (but didn't):

  1. 5 (as in the last 10 seconds this endpoint received 5 calls)
  2. 5 (as in the last 30 seconds this endpoint received 5 calls)
  3. 0.5 (the endpoint received 5 calls in 10 seconds 5/10)
  4. 0.167 (the endpoint received 5 calls in 30 seconds 5/30)

Can someone explain, using my example, the formula behind these functions and how to get the values I expect?

Weirick answered 24/1, 2022 at 14:52 Comment(4)
Does this answer your question? Do I understand Prometheus's rate vs increase functions correctly? – Telekinesis
Can you try your tests again? It looks like you switched your #2 and #3 results/queries. And if 10 requests somehow happened, that would explain 3 of the 4 results. – Histopathology
I did another test and now it looks stranger: values 1 and 3 are equal to 0 now. If I repeat the tests, I get the same result. – Weirick
Could you please also clarify what the scrape interval is in your Prometheus configuration? – Pirbhai
12

Prometheus calculates increase(m[d]) at timestamp t in the following way:

  1. It fetches the raw samples stored in the database for the time series matching m on the time range (t-d .. t]. Note that samples at timestamp t-d are excluded from the range, while samples at t are included. Every selected time series is expected to be a counter, since increase() works only with counters.
  2. It calculates the difference between the last and the first raw sample value on the selected time range, separately for each time series matching m. Note that Prometheus ignores the difference between the last raw sample just before the (t-d .. t] time range and the first raw sample inside it. This may lead to lower than expected results in some cases.
  3. It extrapolates the result obtained at step 2 if the first and/or the last raw sample is located too far from the boundaries of the (t-d .. t] time range. This may lead to unexpected results, for example fractional results for integer counters. See this issue for details.

Prometheus calculates rate(m[d]) as increase(m[d]) / d, so rate() results may also be unexpected sometimes. Prometheus developers are aware of these issues and are going to fix them eventually - see these design docs.
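To make the three steps and the rate() relation concrete, here is a minimal Python sketch. It is only loosely modeled on the extrapolation logic and is not the actual Prometheus implementation: counter resets are ignored, the function names are mine, and the sample timestamps are hypothetical (the scrape interval in the question was never stated, so 15 seconds is assumed purely for illustration).

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def increase(samples: List[Sample], t: float, d: float) -> Optional[float]:
    """Simplified increase(m[d]) at time t. Samples must be sorted by
    timestamp; counter resets are ignored in this sketch."""
    # Step 1: keep only samples in the half-open window (t-d, t].
    window = [(ts, v) for ts, v in samples if t - d < ts <= t]
    if len(window) < 2:
        return None  # fewer than two samples: no result at all

    (first_ts, first_v), (last_ts, last_v) = window[0], window[-1]

    # Step 2: difference between the last and the first sample in the window.
    delta = last_v - first_v

    # Step 3: extrapolate towards the window boundaries.
    sampled = last_ts - first_ts
    avg_gap = sampled / (len(window) - 1)
    threshold = avg_gap * 1.1
    to_start = first_ts - (t - d)
    to_end = t - last_ts
    if to_start >= threshold:
        to_start = avg_gap / 2
    if delta > 0 and first_v >= 0:
        # A counter cannot be negative, so do not extrapolate back past zero.
        to_start = min(to_start, sampled * (first_v / delta))
    if to_end >= threshold:
        to_end = avg_gap / 2
    return delta * (sampled + to_start + to_end) / sampled


def rate(samples: List[Sample], t: float, d: float) -> Optional[float]:
    """rate(m[d]) expressed as increase(m[d]) / d."""
    inc = increase(samples, t, d)
    return None if inc is None else inc / d


# Hypothetical scrapes 15 s apart; the counter goes from 0 to 5.
samples = [(82.0, 0.0), (97.0, 5.0)]
t = 100.0

print(increase(samples, t, 10))  # None: only one sample falls in (90, 100]
print(increase(samples, t, 30))  # 6.0: 5 extrapolated over 18 s instead of 15 s
print(rate(samples, t, 30))      # 0.2
```

With these made-up timestamps a 10-second window contains a single sample, so no increase can be computed, while the 30-second window yields 6 rather than 5 because the result is stretched towards the window end. Different sample positions would give different fractional values.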

In the meantime you can use VictoriaMetrics - this is a Prometheus-like monitoring solution I work on. It provides increase() and rate() functions that are free from the issues mentioned above.

Faircloth answered 22/12, 2022 at 6:13 Comment(5)
Hi, thanks for the answer, which, however, is not very clear to me. What are raw samples? What does it mean to calculate the difference between samples? I would have imagined that Prometheus stores one row, with a timestamp, for each call made. Why not simply count the number of rows in a specific time window? Maybe a concrete example, with numbers, could make the formula you explained clearer to me. – Weirick
Added a link to the answer explaining raw samples. See docs.victoriametrics.com/keyConcepts.html for more details. – Faircloth
And what does "difference between raw samples" mean? The difference between the raw samples' values? What I don't understand is why they do not simply count the raw samples. P.S. I do not know exactly what Prometheus actually stores. – Weirick
Updated the answer accordingly by adding links to the definition of time series and counters. I hope the answer is easier to understand now. – Faircloth
Thanks a lot! It is clearer now. – Weirick
-1

increase measures the increase within your window, and rate is the 'per second' rate.

Imagine that Prometheus writes a database row each second recording the current value. That helps in reasoning through the metric calculations.

  1. If the total so far was 0, then the metric increased by 5 in the last 10 seconds: increase looks back 10 seconds, finds the value there, and reports the difference.

  2. The same as 1, but it looks back 30 seconds.

  3. The 0.5 comes from taking the window size (10s), measuring the increase, then normalizing it to a 'per second' rate. So an increase of 5 over 10 seconds gives 0.5 calls per second.

  4. With the larger window, the same increase is spread over more seconds, so the 'per second' value is smaller.

increase is easier to reason about, but rate standardizes on the 'per second' unit.

One problem with the 'per second' unit, though, is that the values can be too small to discuss comfortably, but it is easy to multiply by 60 (to show a per-minute value) or 3600 (to show a per-hour value). The window only determines how far back to gather data for the calculation.
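The arithmetic in this answer can be written out as a small Python sketch. It assumes the idealized model described here, with one stored value per second, which is an assumption for illustration only (real scrape intervals are usually longer); the helper name and the counter values are made up.

```python
def window_increase(values, window_s):
    """values[i] is the counter value at second i.
    Returns the increase over the last window_s seconds."""
    return values[-1] - values[-1 - window_s]


# Idealized counter: flat at 0 for 30 seconds, then 5 calls arrive,
# after which the counter stays at 5.
values = [0] * 31 + [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]

inc_10 = window_increase(values, 10)
inc_30 = window_increase(values, 30)
rate_10 = inc_10 / 10
rate_30 = inc_30 / 30

print(inc_10, inc_30)      # 5 5
print(rate_10)             # 0.5
print(round(rate_30, 3))   # 0.167
print(rate_10 * 60)        # 30.0 (per-minute value)
```

These are the values the question expected. They only hold when a sample exists exactly at both ends of the window, which is what the one-row-per-second model guarantees.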

Histopathology answered 24/1, 2022 at 21:57 Comment(3)
You didn't answer the question. You justified the values he expected to get, but he wants to know why he didn't get them. – Clemmieclemmons
Thanks @MarceloÁviladeOliveira! Yes, this answer just re-explains what I already said in my post. The point is why I got the values in the first list instead of the values I expected. – Weirick
Oh, I see. I had misunderstood the question and missed the first list. – Histopathology
