Why does rate in Grafana seem to be looking at an interval that is a minute larger than requested?

I am trying to follow up on this question, which has a great explanation of how counter/rate/increase work in Prometheus/Grafana, but I still cannot reconcile it with my observations :/

Here is a plot of the raw counter over time. Samples are every 15s, and there is a single increase at 5:55:00 from 7361 to 7370:

[Image: plot of the raw counter]

Now, this is the plot of rate(counter[2m]) over the same interval: [Image: plot of rate(counter[2m])]

It shows 4 rate samples of 0.15 between 05:55:00 and 05:55:45. 0.15/second is 9 per minute, but the requested interval is two minutes, not one.

Consequently, if I replace rate with increase in this query, it shows 18 rather than 9, which is twice what I expect.

Also, if I do rate(counter[1m]) (rather than 2m), then I get no data at all.

I ended up doing rate(counter[2m])*60, which seems to show the proper increase within a one-minute interval, but (1) this seems ... clumsy: why do I have to do this??? and (2) why can I not use an interval shorter than 2m if my samples are every 15s?

I have been pulling what's left of my hair out over this for way longer than I would like to. Any help would be much appreciated.

Unemployment answered 28/2, 2023 at 19:38 Comment(0)

If you query the underlying data (rather than a chart with a step of 15 seconds based on it; as in, query for counter[5m] as a table instead of counter as a graph) you will very likely find that the time series has 1 minute rather than 15 second resolution.

Which would explain why rate(counter[2m]) works but rate(counter[1m]) doesn't (since on average, it only has 1 sample to work with). And would also explain why (considering the fact that rate() extrapolates to the edges of the interval) you get about 2x the expected rate: Prometheus finds 2 samples one minute apart, with an increase of 1; and extrapolates that to an increase of 2 over 2 minutes. See this Prometheus issue for a whole lot more details (and for why I believe this is not ideal behavior).
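To make the extrapolation concrete, here is a rough Python sketch of how rate() and increase() stretch the observed change to the edges of the window. It is a simplified model, not Prometheus's actual code: it ignores counter resets and the clamp-to-zero rule, and its handling of window boundaries is an approximation. The sample timestamps are made up for illustration (one sample per minute, not aligned to the window); only the values 7361 and 7370 come from the question.

```python
def extrapolated_increase(samples, range_start, range_end):
    """Simplified model of increase(). samples: list of (timestamp_s, value)."""
    in_window = [(t, v) for t, v in samples if range_start < t <= range_end]
    if len(in_window) < 2:
        return None  # fewer than 2 samples in the window: no result
    (first_t, first_v), (last_t, last_v) = in_window[0], in_window[-1]
    sampled_interval = last_t - first_t
    avg_gap = sampled_interval / (len(in_window) - 1)
    threshold = avg_gap * 1.1
    gap_start = first_t - range_start
    gap_end = range_end - last_t
    # Extend the sampled interval to each window edge if the edge is close
    # enough; otherwise extend by half an average sample gap.
    extrapolate_to = sampled_interval
    extrapolate_to += gap_start if gap_start < threshold else avg_gap / 2
    extrapolate_to += gap_end if gap_end < threshold else avg_gap / 2
    return (last_v - first_v) * extrapolate_to / sampled_interval


def extrapolated_rate(samples, range_start, range_end):
    """Simplified model of rate(): extrapolated increase per second of window."""
    inc = extrapolated_increase(samples, range_start, range_end)
    return None if inc is None else inc / (range_end - range_start)


# Hypothetical samples, one per minute (timestamps in seconds).
samples = [(50, 7361), (110, 7370), (170, 7370)]

print(extrapolated_increase(samples, 0, 120))  # 18.0 (2m window, 2 samples)
print(extrapolated_rate(samples, 0, 120))      # 0.15
print(extrapolated_rate(samples, 60, 120))     # None (1m window, 1 sample)
```

Under these assumptions the 2m window sees a raw change of 9 across 60 seconds of samples and scales it to the full 120 seconds, which gives the 18 and 0.15 from the question, while the 1m window never holds two samples.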

Blocked answered 1/3, 2023 at 9:57 Comment(7)
Thanks for the response! I am not sure what specifically you mean by querying underlying data, but I did look at the Query Inspector in grafana, and samples are 15 seconds apart if the step is 15s, and I can't make them any closer (reducing the step further, they are still 15s apart). – Unemployment
But even if they were actually a minute apart, I still don't get it. rate(counter[2m]) would get two samples: 7361, 7370, 7370, so it should be 9/120 = 0.075, why does it come back as 0.15? It looks like it is always "missing" exactly one minute: if I make it rate(counter[5m]) I get 9/4m, and rate(counter[1h]) returns 9/59m 🤷 – Unemployment
When you look at the Grafana query inspector, you see the results of a query with step=15s. It will return one sample every 15 seconds. If you replace step=15s in that query with step=1s you'll get back one sample per second. That doesn't mean the original data has 1 second resolution, just that Prometheus will helpfully return one sample per second so it can be graphed nicely. The value at any given step will be that of the last sample before the respective timestamp. If you use the Prometheus UI instead and query for counter[5m] you will get the actual underlying samples. – Coil
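A small Python sketch of the step behavior described in the comment above: each step timestamp simply reports the most recent raw sample, so the output looks denser than the stored data. The sample timestamps and helper names are hypothetical, and the 300-second lookback is an assumption standing in for Prometheus's default lookback window.

```python
import bisect


def instant_value(samples, ts, lookback=300):
    """Value of the most recent sample at or before ts, within the lookback."""
    times = [t for t, _ in samples]
    i = bisect.bisect_right(times, ts) - 1
    if i < 0 or ts - times[i] > lookback:
        return None
    return samples[i][1]


def range_query(samples, start, end, step):
    """Evaluate instant_value at every step timestamp from start to end."""
    return [(ts, instant_value(samples, ts)) for ts in range(start, end + 1, step)]


# Hypothetical raw series: scraped once a minute, slightly off the minute.
raw = [(7, 7361), (67, 7370), (127, 7370)]

points = range_query(raw, 60, 120, 15)
print(points)
# [(60, 7361), (75, 7370), (90, 7370), (105, 7370), (120, 7370)]
```

Five points come back at 15-second spacing even though only two raw samples fall in the queried minute, which is why the inspector output alone cannot reveal the true scrape interval.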
7361, 7370, 7370 is 3 samples, not 2. 2 samples would be either 7361, 7370 or 7370, 7370. The corresponding increases for the two cases would be 1 and respectively 0. Extrapolated to 2 minutes, 2 and 0. The original samples are (more or less) one minute apart, but not exactly on the minute. So if you pick an arbitrary 2 minute interval, it will most often contain exactly 2 samples (very rarely 1 or 3). – Coil
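The "missing minute" from the earlier comment can be reproduced with a little arithmetic. When the window is fully extrapolated, the observed change is scaled up by window/span and then divided by the window, so the result reduces to change/span, where span is the time between the first and last sample in the window. The sketch below assumes a 60-second scrape interval, samples that do not sit exactly on the window edges, and that the whole increase of 9 falls between the first and last sample; the function name is made up for illustration.

```python
def observed_rate(delta, window_s, scrape_interval_s=60):
    """Approximate rate() result when extrapolation covers the whole window."""
    n_samples = window_s // scrape_interval_s  # samples that fit in the window
    if n_samples < 2:
        return None  # a single sample is not enough to compute a rate
    sampled_span = (n_samples - 1) * scrape_interval_s
    return delta / sampled_span


for window in (60, 120, 300, 3600):
    print(window, observed_rate(9, window))
```

Under these assumptions a 1m window yields nothing, 2m gives 9/60 = 0.15, 5m gives 9/240 (9 per 4 minutes), and 1h gives 9/3540 (9 per 59 minutes), matching the pattern reported in the question's comments.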
> It will return one sample every 15 seconds. If you replace step=15s in that query with step=1s you'll get back one sample per second. -- Yeah, that's what I am saying: that wasn't happening: if I try to reduce it below 15s, the samples I am getting back are still 15s apart - that's why I thought that was the actual true interval. – Unemployment
That's because your data source in Grafana is probably configured with a min step of 15s (which should instead be configured to be equal to the scrape interval of Prometheus, for exactly this reason). Use the Prometheus UI instead, it's a lot simpler and makes it more obvious what's going on. – Coil
