Sending Spark streaming metrics to open tsdb
How can I send metrics from my Spark streaming job to an OpenTSDB database? I am trying to use OpenTSDB as a data source in Grafana. Can you please point me to some references where I can start?

I do see an OpenTSDB reporter here which does a similar job. How can I integrate the metrics from my Spark streaming job with it? Are there any easy options for doing this?

Positronium answered 5/12, 2017 at 0:30 Comment(0)

One way to send the metrics to OpenTSDB is to use its REST API. Simply convert the metrics to JSON strings and use the Apache HttpClient library to send the data (it is written in Java and can therefore be used from Scala). Example code can be found on GitHub.
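As a minimal sketch of this approach: OpenTSDB's standard HTTP endpoint for writing datapoints is `/api/put`, which accepts a JSON object with `metric`, `timestamp`, `value`, and `tags` fields. The example below uses the JDK's built-in `HttpURLConnection` instead of Apache HttpClient to stay self-contained; the host `localhost:4242`, the metric name, and the tags are illustrative assumptions, not part of the original answer.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OpenTsdbPush {

    // Build the JSON body expected by OpenTSDB's /api/put endpoint.
    // Metric name and tag below are illustrative only.
    static String buildPutJson(String metric, long epochSeconds,
                               double value, String tagKey, String tagValue) {
        return String.format(
            "{\"metric\":\"%s\",\"timestamp\":%d,\"value\":%s,\"tags\":{\"%s\":\"%s\"}}",
            metric, epochSeconds, value, tagKey, tagValue);
    }

    // POST one datapoint; OpenTSDB answers 204 No Content on success.
    static int push(String baseUrl, String json) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(baseUrl + "/api/put").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        String json = buildPutJson("spark.streaming.records", 1512345600L,
                                   42.0, "host", "worker-1");
        System.out.println(json);
        // push("http://localhost:4242", json);  // uncomment against a live OpenTSDB
    }
}
```

From a streaming job you would call something like `push(...)` inside `foreachRDD` or a `StreamingListener` callback, batching datapoints where possible since `/api/put` also accepts a JSON array.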


A more elegant solution would be to use the Spark metrics library and add a sink for the database. There was a discussion about adding an OpenTSDB sink to the Spark metrics library; however, it was ultimately not merged into Spark itself. The code is available on GitHub and should be usable. Unfortunately, the code targets Spark 1.4.1, but in the worst case it should still give some indication of what is necessary to implement.
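For reference, sinks are registered with Spark's metrics system through `conf/metrics.properties` rather than in application code. A hedged sketch, assuming the `OpenTsdbSink` class from the linked GitHub code is on the driver and executor classpath (the host/port/period property names depend on what that sink implementation actually reads):

```properties
# conf/metrics.properties -- illustrative; property names must match
# what the OpenTsdbSink implementation parses from its Properties argument.
*.sink.opentsdb.class=org.apache.spark.metrics.sink.OpenTsdbSink
*.sink.opentsdb.host=localhost
*.sink.opentsdb.port=4242
*.sink.opentsdb.period=10
*.sink.opentsdb.unit=seconds
```

Spark instantiates the sink class reflectively from this file, which is why the class cannot simply be constructed in your own code.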

Martie answered 13/12, 2017 at 1:46 Comment(6)
Thank you. I found this before but could not use it. I created a separate Scala project with just this OpenTsdbSink class, built a jar from it, and included it in my current Java Spark streaming job. I keep getting errors like "Sink class org.apache.spark.metrics.sink.OpenTsdbSink cannot be instantiated. java.lang.NoSuchMethodException: org.apache.spark.metrics.sink.OpenTsdbSink.&lt;init&gt;(java.util.Properties, com.codahale.metrics.MetricRegistry, org.apache.spark.SecurityManager)". What would you advise me to start with?Positronium
Is there any quick and easy way of getting these metrics into OpenTSDB instead of using this metrics library? Perhaps something built around HTTP POSTs?Positronium
Also, I don't find the org.apache.spark.metrics package with the various sinks in Spark 2.2, which I am using. Is it understood by default that these older packages are available in 2.2 unless mentioned otherwise? I could not find it in the Spark 2.2 library documentation.Positronium
If I include this package inside the same project, I get an error like "ERROR org.apache.spark.metrics.MetricsSystem: Sink class org.apache.spark.metrics.sink.OpenTsdbSink cannot be instantiated"Positronium
Is there a way I can use CsvSink or MetricsServlet to create CSV or JSON that could then be moved into OpenTSDB? Would that be very slow?Positronium
@Positronium: Are you setting up the sink correctly? Here is a how-to for Kafka, but it should work for the OpenTSDB sink too: github.com/erikerlandson/spark-kafka-sink. You cannot initialize it in code yourself.Martie
