How to export data from Kafka to Prometheus?

I am getting 300K+ metrics per minute in a Kafka topic as time series. I want to store and query the data. The visualisation tool that satisfies my requirements is Grafana. To store and query efficiently, I am thinking of storing these time series in Prometheus.

Kafka topic with lots of time series -> Prometheus -> Grafana

I am not sure how to achieve this, as Prometheus uses a pull-based scraping model. Even if I write a pull service, will it be able to handle 300K metrics per minute? The data in the topic looks like this:

SYS 1, UNIX TIMESTAMP, CPU%, 10
SYS 1, Processor, UNIX TIMESTAMP, CPUCACHE, 10
SYS 2, UNIX TIMESTAMP, CPU%, 30
.....
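
For illustration, this is roughly the kind of pull bridge I have in mind: a Kafka consumer that keeps the latest value of each series in Prometheus gauges and exposes them over HTTP for scraping. The topic name, broker address, and four-field line layout are my assumptions based on the sample above.

    # Sketch only: Kafka consumer -> Prometheus gauges exposed for scraping.
    # Assumed: topic "timeseries", broker localhost:9092, and lines like
    # "SYS 1, UNIX TIMESTAMP, CPU%, 10".
    from kafka import KafkaConsumer                         # pip install kafka-python
    from prometheus_client import Gauge, start_http_server  # pip install prometheus-client

    gauge = Gauge("system_metric_value", "Latest value per system and metric",
                  ["system", "metric"])

    start_http_server(8000)  # Prometheus scrapes http://<host>:8000/metrics

    consumer = KafkaConsumer(
        "timeseries",                       # assumed topic name
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: v.decode("utf-8"),
    )

    for msg in consumer:
        system, _ts, metric, value = [p.strip() for p in msg.value.split(",")]
        gauge.labels(system=system, metric=metric).set(float(value))

Even with something like this, Prometheus records values at scrape time, so the original timestamps in the topic would be lost.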

Most of the articles talk about the Kafka exporter/JMX exporter for monitoring Kafka. I am not looking for Kafka monitoring; rather, I want to ship the time-series data stored in a topic and leverage the Prometheus query language and Grafana to analyse it.

Kyte answered 15/5, 2020 at 17:26

I came across the "Kafka Connect Prometheus Metrics Sink connector", which exports data from multiple Apache Kafka® topics and makes it available at an endpoint that a Prometheus server scrapes. It is a commercial offering in the Confluent Platform.

https://docs.confluent.io/kafka-connect-prometheus-metrics/current/index.html#prometheus-metrics-sink-connector-for-cp

I am sticking with my existing time-series database and writing a custom Grafana data source for it instead. Implementing PromQL on top of it could be another alternative.

Update:

I learned about OpenTelemetry. One can use the OpenTelemetry standard to convert the metrics to OTLP format and let the OpenTelemetry Collector read them from Kafka; the Collector has a Prometheus remote write exporter.
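
A minimal Collector configuration for that pipeline could look like the sketch below. The broker address, topic name, and remote-write endpoint are placeholders, and Prometheus has to be started with --web.enable-remote-write-receiver to accept remote writes:

    # Sketch: OpenTelemetry Collector reading OTLP metrics from Kafka and
    # forwarding them to Prometheus via remote write (contrib components).
    receivers:
      kafka:
        brokers: ["localhost:9092"]
        topic: otlp_metrics        # placeholder topic carrying OTLP-encoded metrics
        encoding: otlp_proto

    exporters:
      prometheusremotewrite:
        endpoint: http://localhost:9090/api/v1/write  # placeholder Prometheus URL

    service:
      pipelines:
        metrics:
          receivers: [kafka]
          exporters: [prometheusremotewrite]

Unlike scraping, remote write preserves the original sample timestamps, which matters when the data is already historical by the time it leaves Kafka.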

Kyte answered 9/2, 2021 at 16:40

I strongly advise against this approach. Prometheus exporters are mostly used for metrics-based analysis and monitoring: an example would be checking how many messages went through a topic/partition every 10s.

It's possible to do what you are describing, but constantly scraping that volume of data could put serious stress on your Prometheus cluster and storage, depending on your cluster specs.

If you really want to store and query time-series events, I would suggest logging them to Elasticsearch. You can connect Grafana to Elasticsearch and use it as a data source for your querying.
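
As a rough sketch of that ingestion path, assuming the CSV-style lines from your question (the topic name, index name, and field layout are all placeholders):

    # Sketch only: consume metric lines from Kafka and bulk-index them into
    # Elasticsearch. Assumes lines like "SYS 1, UNIX TIMESTAMP, CPU%, 10" and
    # an index whose mapping declares "@timestamp" as a date (epoch_millis).
    from kafka import KafkaConsumer                    # pip install kafka-python
    from elasticsearch import Elasticsearch, helpers   # pip install elasticsearch

    es = Elasticsearch("http://localhost:9200")
    consumer = KafkaConsumer(
        "timeseries",                                  # placeholder topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: v.decode("utf-8"),
    )

    def to_actions(lines):
        for line in lines:
            system, ts, metric, value = [p.strip() for p in line.split(",")]
            yield {
                "_index": "kafka-metrics",             # placeholder index name
                "_source": {
                    "system": system,
                    "@timestamp": int(ts) * 1000,      # seconds -> epoch millis
                    "metric": metric,
                    "value": float(value),
                },
            }

    batch = []
    for msg in consumer:
        batch.append(msg.value)
        if len(batch) >= 1000:                         # bulk requests to keep up with volume
            helpers.bulk(es, to_actions(batch))
            batch.clear()

With the documents indexed, you would add Elasticsearch as a Grafana data source and query by the system and metric fields.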

Another option is to search the community Kafka dashboards on Grafana's website that are populated by a Prometheus data source and see if any of them has the metrics you need. This way you can figure out which exporter works for you: https://grafana.com/grafana/dashboards?dataSource=prometheus&direction=desc&orderBy=reviewsCount&search=kafka

Chaperone answered 17/5, 2020 at 12:57
Are you suggesting that Prometheus will not be able to handle 300K metrics stored in my topic but Elasticsearch will? – Kyte
Elasticsearch, InfluxDB, TimescaleDB, etc. would be better, yes. Note: the linked Kafka dashboards are primarily for JMX exporter (or Burrow) integration, not just random Prometheus-compatible records that have been pushed into Kafka. – Ambassadoratlarge
