GCP Alert Filters Don't Affect Open Incidents
I have configured an alert to send an email when the sum of executions of Cloud Functions that finished with a status other than 'error' or 'ok' is above 0 (grouped by function name).

The way I defined the alert is:

(image: alert definition)

And the secondary aggregator is delta.
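For reference, and purely as an assumption about what the screenshot above contains, the condition's filter could look roughly like this (the metric type and `status` label are the standard built-in Cloud Functions ones):

```
metric.type = "cloudfunctions.googleapis.com/function/execution_count"
resource.type = "cloud_function"
metric.label.status != "ok"
metric.label.status != "error"
```

with the series grouped by `resource.label.function_name`, aggregated with `sum`, a secondary aggregator of `delta`, and the condition firing when the value is above 0.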

The problem is that once the alert is open, the filters no longer seem to matter: the incident stays open because Monitoring sees that the Cloud Function is triggered and finishes with any status (even an 'ok' status keeps it open, as long as it's triggered often enough).

At the moment the only solution I can think of is to define a log-based metric that does the counting itself, and then base the alert on that custom metric instead of on the built-in one.
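As a sketch of that workaround (the metric name and the `textPayload` pattern here are assumptions you would adapt to your functions' actual log lines), Cloud Functions writes a "finished with status: ..." entry per execution, so a log-based counter metric could be created with `gcloud`:

```shell
# Hypothetical metric name and log filter; adjust the textPayload
# pattern to match what your functions actually log.
gcloud logging metrics create bad_status_executions \
  --description="Cloud Functions executions finishing with a status other than ok/error" \
  --log-filter='resource.type="cloud_function" AND textPayload:"finished with status" AND NOT textPayload:"ok" AND NOT textPayload:"error"'
```

The alert policy would then target the resulting `logging.googleapis.com/user/bad_status_executions` metric instead of the built-in execution count.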

Is there something that I'm missing?

Edit:

Adding another image to show what I think might be the problem: (image: incident)

From the image above we can see that the graph won't go down to 0 but stays at 1, which is not how other, normal incidents behave.

Aerugo answered 21/6, 2021 at 8:0 Comment(2)
I don't think that's the case, since it stays open for a few days until I close them manually. – Aerugo
Could false-positive alerts, caused by a delay between when the log entries are generated and when Logging receives them, be affecting you? – Carouse

According to the official documentation:

"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."

That made me think that there are times where the condition is not relevant to make it close the incident. Which is confirmed here:

"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."

The lack of HTTP requests isn't a reason for Monitoring to close the incident, because it keeps using the last recorded value (the one that triggered the alert).

So, using alerts based on these built-in HTTP request metrics is fine, but you need to close the incidents yourself. If you want them to close automatically, I think it would be better to use a custom metric instead.

Bustos answered 21/7, 2021 at 15:39 Comment(5)
Why would a custom metric be different with respect to closing automatically? – Aerugo
With log-based metrics you can generate a distribution or a counter metric that increases and decreases with the number of status requests you receive. Since it is custom, it calculates the value the way you define it and won't fall back to the last value received. – Bustos
It feels like those built-in metrics from Google would be much more useful if they acted like custom metrics and defaulted to a zero value when there is no data. It only makes sense to set the value to zero if there were no HTTP requests in the last X minutes. – Aerugo
I don't know, I guess that depends on what you are trying to alert on. If an HTTP request gives an error every time it is reached, the fact that it had zero interactions in the last X minutes does not mean the issue is solved. What would be great is an option to choose how Monitoring reacts to missing measurements. Opening a feature request for that would be great, imho. – Bustos
Opened an issue: issuetracker.google.com/issues/194427617 – Aerugo
