Alert Management Threshold with Monitoring for Kafka/Confluent JMX metrics

I am building an Alert Monitoring tool for Kafka.

I understand that for some metrics the right threshold depends on application data. I am only interested in the metrics, and their threshold values, that tell me about consumer lag and help determine whether scaling is required.

So far I can do the following:

  • Enable JMX on the Kafka broker.
  • Fetch JMX metrics using a JMX Java client or JConsole.
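
For reference, the second step can be sketched with the JDK's own `javax.management` client roughly as below. This is only a minimal sketch: the host and port (`localhost:9999`), the class name `BrokerMetricReader`, and the helper `readAttribute` are assumptions made for illustration, and the MBean names are the ones brokers commonly register, so they should be checked in JConsole against the broker version in use.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class BrokerMetricReader {
    // Hypothetical helper: reads one attribute of one MBean.
    static Object readAttribute(MBeanServerConnection conn,
                                String objectName,
                                String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(objectName), attribute);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the broker was started with remote JMX on this host:port.
        String hostPort = args.length > 0 ? args[0] : "localhost:9999";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + hostPort + "/jmxrmi");

        // Gauge MBeans commonly exposed by a broker; verify the names in JConsole.
        String[] gauges = {
            "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
            "kafka.controller:type=KafkaController,name=OfflinePartitionsCount",
            "kafka.controller:type=KafkaController,name=ActiveControllerCount"
        };

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            for (String name : gauges) {
                // Kafka gauges expose their current reading as the "Value" attribute.
                System.out.println(name + " = " + readAttribute(conn, name, "Value"));
            }
        }
    }
}
```

Because `readAttribute` takes any `MBeanServerConnection`, the same helper can be pointed at each broker in turn and the readings collected centrally.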

Next, I researched and found many metrics, but none came with complete thresholds (e.g. a fixed value, a pattern such as increasing or decreasing, or some calculation) on which I could base my alerting logic.

A few examples follow:

  • UnderReplicatedPartitions - alert if the value is greater than 0.
  • records-lag-max - alert if the value keeps increasing over time.
  • OfflinePartitionsCount - alert if the value is greater than 0.
  • ActiveControllerCount - alert if the value is anything other than 1.
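
The four example rules can be written as plain predicates, sketched below. The class and method names are hypothetical, and "increases with time" is interpreted here as "strictly rising across the last N samples", which is only one possible reading. Two caveats, as far as I know: each broker reports ActiveControllerCount as 0 or 1, so the check is meant to run on the sum across all brokers, and `records-lag-max` is reported by the consumer client rather than the broker, so it has to be read from the consumer's JVM.

```java
import java.util.List;

class KafkaAlertRules {
    // UnderReplicatedPartitions: any value above 0 is alert-worthy.
    static boolean underReplicated(long value) {
        return value > 0;
    }

    // OfflinePartitionsCount: any value above 0 is alert-worthy.
    static boolean offlinePartitions(long value) {
        return value > 0;
    }

    // ActiveControllerCount: pass the sum over all brokers; it should be exactly 1.
    static boolean controllerCountAbnormal(long clusterWideSum) {
        return clusterWideSum != 1;
    }

    // records-lag-max: alert when the last `window` samples are strictly increasing.
    // Returns false when there are not yet enough samples to judge.
    static boolean lagIncreasing(List<Double> samples, int window) {
        if (window < 2 || samples.size() < window) {
            return false;
        }
        for (int i = samples.size() - window + 1; i < samples.size(); i++) {
            if (samples.get(i) <= samples.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

The window size is a tuning choice: a small window reacts quickly but fires on short bursts, while a larger one only fires on sustained lag growth, which is the more useful signal for deciding whether to add consumers.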

Judicatory answered 24/9, 2018 at 12:3

Comments (3)
You realize Confluent Control Center can monitor and alert on all of these already, yes? - Expose
@cricket_007 But Control Center is part of the licensed version. I have a requirement to build a custom solution in which monitoring Kafka is just one aspect. - Judicatory
Alright, well, I'm sure you could build some alerting platform with Grafana or Prometheus Alertmanager. Maybe something with Kapacitor, Chronograf, and InfluxDB. - Expose
