I'm having issues with Prometheus alerting rules. I have various cAdvisor specific alerts set up, for example:
- alert: ContainerCpuUsage
expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
for: 2m
labels:
severity: warning
annotations:
title: 'Container CPU usage (instance {{ $labels.instance }})'
description: 'Container CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
When the condition is met, I can see the alert in the "Alerts" tab in Prometheus, however some labels are missing thus not allowing alertmanager to send a notification via Slack. To be specific, I attach custom "env" label to each target:
{
"targets": [
"localhost:8080",
],
"labels": {
"job": "cadvisor",
"env": "production",
"__metrics_path__": "/metrics"
}
}
But when the alert based on cadvisor metrics is firing, the labels are: alertname, instance and severity - no job label, no env label. All the other alerts from other exporters (f.e. node-exporter) work just fine and the label is present.
(sum(rate(container_cpu_usage_seconds_total{name!=""}[3m])) BY (instance, name,env) * 100) > 80
and it looks like it's working fine. Is this query okay? To be honest, I do not fully understand this: "But this way you'll get CPU utilisation per instance per name per environment." - why is that an issue? – Fluoroscopy