How to monitor Apache Spark with Prometheus?

I have read that Spark does not have Prometheus as one of the pre-packaged sinks. So I found this post on how to monitor Apache Spark with Prometheus.

But I found it difficult to understand and to follow, because I am a beginner and this is my first time working with Apache Spark.

The first thing I do not get is what exactly I need to do:

  • Do I need to change metrics.properties?

  • Or should I add some code to the app?

I do not understand what the steps are to make it work...

What I have done so far is change the properties as in the link and pass this option:

--conf spark.metrics.conf=<path_to_the_file>/metrics.properties

What else do I need to do to see metrics from Apache Spark?

I also found this link: Monitoring Apache Spark with Prometheus

https://argus-sec.com/monitoring-spark-prometheus/

But I could not get it working with that either...

I have read that there is a way to send the metrics to Graphite and then export them to Prometheus, but I could not find any useful documentation.

Mesh answered 26/3, 2018 at 10:11 Comment(1)
this blog has a good and detailed explanation: argus-sec.com/monitoring-spark-prometheus – Katusha

There are a few ways to monitor Apache Spark with Prometheus.

One of them is via JmxSink + jmx-exporter.

Preparations
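
The preparation steps are roughly: enable Spark's JmxSink so that the metrics are registered as JMX MBeans, download the jmx_prometheus_javaagent JAR from the prometheus/jmx_exporter project, and prepare an exporter configuration file (spark.yml). A minimal sketch of the two files involved (the catch-all rule below is only an assumption that exposes every MBean; tailor it to your needs):

# conf/metrics.properties (uncomment or add this line)
*.sink.jmxSink.class=org.apache.spark.metrics.sink.JmxSink

# spark.yml (jmx_exporter configuration)
lowercaseOutputName: true
rules:
  - pattern: ".*"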

Use it in spark-shell or spark-submit

In the following command, the jmx_prometheus_javaagent-0.3.1.jar file and spark.yml are the files obtained in the preparation steps. The paths (and the agent version) may need to be changed accordingly.

bin/spark-shell --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml" 
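
For spark-submit, and when executor metrics are needed as well, the same agent can be attached to the executors via spark.executor.extraJavaOptions. A sketch (your-app.jar, the paths and the executor port are placeholders; fixed executor ports can collide when several executors land on the same host, and the JAR and config files must be available on the executor hosts, e.g. shipped with --files):

bin/spark-submit \
  --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml" \
  --conf "spark.executor.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8081:spark.yml" \
  your-app.jar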

Access it

Once it is running, the metrics can be accessed at localhost:8080/metrics.
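
A quick way to verify that the agent is serving metrics (assuming the default local port used above):

curl -s http://localhost:8080/metrics | head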

Next

Prometheus can then be configured to scrape the metrics from jmx-exporter.
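
For example, a minimal static scrape job in prometheus.yml could look like this (the job name and the target port are assumptions matching the agent port used above):

scrape_configs:
  - job_name: 'spark-jmx-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080']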

NOTE: The service-discovery part has to be handled properly if this runs in a cluster environment.

Sommer answered 3/1, 2019 at 10:23 Comment(4)
What needs to be done for a Spark cluster? Can you provide the steps for that? – Hubbub
In order to scrape the metrics from jmx-exporter you have to add to /etc/prometheus/prometheus.yml the lines: - job_name: "spark_streaming_app" scrape_interval: "5s" static_configs: - targets: ['localhost:8082'] – Katusha
How do you scrape a custom metric from your Spark app? – Dwelt
I'm getting "Error opening zip file or JAR manifest missing" while running it with the mentioned command. – Sap

PrometheusServlet

Things have changed since then, and Spark 3.2 comes with built-in Prometheus support using PrometheusServlet:

The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties.

PrometheusServlet: (Experimental) Adds a servlet within the existing Spark UI to serve metrics data in Prometheus format.

spark.ui.prometheus.enabled

There is also spark.ui.prometheus.enabled configuration property:

Executor metric values and their measured memory peak values per executor are exposed via the REST API in JSON format and in Prometheus format.

The Prometheus endpoint is conditional to a configuration parameter: spark.ui.prometheus.enabled=true (the default is false).

Demo

spark.ui.prometheus.enabled

Start a Spark application with spark.ui.prometheus.enabled=true, e.g.

spark-shell \
  --master spark://localhost:7077 \
  --conf spark.ui.prometheus.enabled=true

Open http://localhost:4040/metrics/executors/prometheus and you should see the following page:

spark_info{version="3.2.0", revision="5d45a415f3a29898d92380380cfd82bfc7f579ea"} 1.0
metrics_executor_rddBlocks{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_memoryUsed_bytes{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_diskUsed_bytes{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_totalCores{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_maxTasks{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_activeTasks{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_failedTasks_total{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_completedTasks_total{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0

PrometheusServlet

Use (uncomment) the following conf/metrics.properties:

*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus

Start a Spark application (e.g. spark-shell) and go to http://localhost:4040/metrics/prometheus. You should see the following page:

metrics_app_20211107173310_0000_driver_BlockManager_disk_diskSpaceUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_disk_diskSpaceUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOffHeapMem_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOffHeapMem_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOnHeapMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOnHeapMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_memUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_memUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_offHeapMemUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_offHeapMemUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_onHeapMemUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_onHeapMemUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOffHeapMem_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOffHeapMem_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOnHeapMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOnHeapMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_DAGScheduler_job_activeJobs_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_job_activeJobs_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_job_allJobs_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_job_allJobs_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_stage_failedStages_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_stage_failedStages_Value{type="gauges"} 0
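
As the comments below point out, the servlet only exposes endpoints on the driver UI; Prometheus still needs a scrape job pointing at them. A minimal static sketch covering both endpoints shown above (host and port assume a local driver UI on 4040; in a cluster you would use service discovery instead):

scrape_configs:
  - job_name: 'spark-driver-executors'
    metrics_path: '/metrics/executors/prometheus'
    static_configs:
      - targets: ['localhost:4040']
  - job_name: 'spark-driver'
    metrics_path: '/metrics/prometheus'
    static_configs:
      - targets: ['localhost:4040']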
Sinnard answered 7/11, 2021 at 16:45 Comment(2)
This requires a ServiceMonitor deployed in Prometheus, right? – Zumwalt
I am also interested in this approach; it exposes the metrics on the driver, but it does not tell Prometheus where to scrape them from. – Ir

I followed the GitHub README and it worked for me (the original blog assumes that you use the Banzai Cloud fork, as they expected the PR to be accepted upstream). They externalized the sink to a standalone project (https://github.com/banzaicloud/spark-metrics) and I used that to make it work with Spark 2.3.

Actually, you can scrape (with Prometheus) through JMX, and in that case you don't need the sink. The Banzai Cloud folks did a post about how they use JMX for Kafka, but you can do this for any JVM-based application.

So basically you have two options:

  • use the sink (a configuration sketch follows below), or

  • go through JMX;

they open-sourced both options.
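
For the sink option, the configuration goes into conf/metrics.properties and points Spark at a Prometheus Pushgateway. A rough sketch (the exact property keys and the Pushgateway address are recalled from the project's README and should be verified there):

*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=localhost:9091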

Sudarium answered 28/3, 2018 at 18:1 Comment(2)
Can you write down what you have done, step by step? Btw, thank you for the good explanation! – Mesh
Can you do that, @Sudarium? – Mesh

Depending on the cloud environment, some of the proposed solutions might not work, as the Spark driver might be proxied, for example. I published a project that implements both a pull and a push approach for metrics ingestion into a Prometheus-compatible DB; it could easily be adapted to plain Prometheus: https://github.com/xonai-computing/xonai-dashboard

The configuration settings are described in its setup docs.

Assignat answered 17/6, 2024 at 8:21 Comment(0)
