Monitor usage of threadpool from a reactor scheduler with micrometer

Problem

I want to monitor the usage of the thread pool behind a specific Scheduler (BoundedElasticScheduler). I want to see whether the pool's capacity is adequate, or whether it hits its limit quite often and accumulates a lot of waiting tasks.

Question

I think the MAX usage of a thread pool is one of the most important metrics. Is there a metric for it that I haven't found yet? Or does someone have a hint on how to observe thread usage in the pool and implement the metric myself?

Tried so far

  1. Using Reactor's built-in metrics

In Reactor 3.4.x I found the metric executor.active, but it is a gauge, and monitoring tools poll it at an interval (e.g. every minute). That is too inaccurate for short tasks that only spend a few milliseconds in the pool. In Reactor 3.5 I found a max execution time, but not a max for the number of active threads. The documentation is being heavily updated because of the 3.5 release, so maybe I am missing a metric that could be used for what I need.

  2. Using a custom implementation to track usage

I've also tried to implement a DistributionSummary around the scheduler, so that I can track the MAX number of scheduled tasks per time interval (DistributionSummary uses a TimeWindowMax, which reports the MAX per monitoring interval). But it only tracks the scheduling itself, not the real thread usage: for example, a Mono that evaluates further Monos and Fluxes inside will also use threads from the pool. So it doesn't show me the workload of the pool.
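One way to get closer to real thread usage is to count tasks while they run rather than when they are scheduled. The following is only a minimal sketch using plain JDK classes; PeakTracker, decorate and pollPeak are hypothetical names, not Reactor or Micrometer API. It wraps each Runnable, counts how many wrapped tasks are executing at once, and keeps the peak until it is polled, similar in spirit to a per-interval max.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: counts concurrently running decorated tasks and remembers the peak. */
final class PeakTracker {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    /** Wraps a task so that its start and end update the counters. */
    Runnable decorate(Runnable task) {
        return () -> {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                task.run();
            } finally {
                active.decrementAndGet();
            }
        };
    }

    /**
     * Returns the peak seen since the previous call and starts a new window
     * at the current level. A task starting during this call may be missed
     * by the new window; acceptable for a sketch, not for exact accounting.
     */
    int pollPeak() {
        return peak.getAndSet(active.get());
    }

    public static void main(String[] args) throws InterruptedException {
        PeakTracker tracker = new PeakTracker();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch allStarted = new CountDownLatch(3);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            pool.execute(tracker.decorate(() -> {
                allStarted.countDown();
                try {
                    release.await(); // keep all three tasks running at the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        allStarted.await();
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("peak in window: " + tracker.pollPeak());
        System.out.println("peak in next window: " + tracker.pollPeak());
    }
}
```

With Reactor, the decorator could presumably be registered through Schedulers.onScheduleHook, but that hook applies to all schedulers rather than one specific pool, so treat that wiring as an assumption to verify. The value returned by pollPeak could then be exposed as a gauge or recorded into a DistributionSummary once per polling interval.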

Valise answered 14/11, 2022 at 11:3 Comment(0)

Reactor provides multiple metrics that can be used to monitor schedulers:

  • executor_active_threads, gauge, The approximate number of threads that are actively executing tasks

  • executor_pool_core_threads, gauge, The core number of threads for the pool

  • executor_pool_max_threads, gauge, The maximum allowed number of threads in the pool

  • executor_pool_size_threads, gauge, The current number of threads in the pool

  • executor_completed_tasks_total, counter, The approximate total number of tasks that have completed execution

  • executor_queued_tasks, gauge, The approximate number of tasks that are queued for execution

  • executor_queue_remaining_tasks, gauge, The number of additional elements that this queue can ideally accept without blocking

  • executor_scheduled_once_total, counter

  • executor_scheduled_repetitively_total, counter

  • executor, timer

    • executor_seconds_sum, counter

    • executor_seconds_count, counter

    • executor_seconds_max, gauge

  • executor.idle, timer

    • executor_idle_seconds_sum, counter

    • executor_idle_seconds_count, counter

    • executor_idle_seconds_max, gauge

Internally, Reactor uses Micrometer's ExecutorServiceMetrics to instrument Schedulers and adds extra tags such as reactor_scheduler_id.
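To see what the pool gauges represent, it helps to look at the JDK side. The sketch below assumes the executor.* gauges are backed by the usual ThreadPoolExecutor getters, which is my understanding of how ExecutorServiceMetrics handles a ThreadPoolExecutor; PoolSnapshot is a hypothetical helper, not part of either library. Every value is a point-in-time reading, so a task that finishes between two polls leaves no trace in the active count.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: one point-in-time reading of a ThreadPoolExecutor. */
final class PoolSnapshot {
    final int active;          // threads executing a task right now (approximate)
    final int poolSize;        // threads currently in the pool
    final int corePoolSize;    // configured core size
    final int maxPoolSize;     // configured upper limit, not the observed peak
    final int queued;          // tasks waiting in the queue
    final int queueRemaining;  // free slots left in the queue
    final long completed;      // tasks finished so far (approximate)

    PoolSnapshot(ThreadPoolExecutor pool) {
        this.active = pool.getActiveCount();
        this.poolSize = pool.getPoolSize();
        this.corePoolSize = pool.getCorePoolSize();
        this.maxPoolSize = pool.getMaximumPoolSize();
        this.queued = pool.getQueue().size();
        this.queueRemaining = pool.getQueue().remainingCapacity();
        this.completed = pool.getCompletedTaskCount();
    }

    @Override
    public String toString() {
        return String.format(
            "active=%d size=%d core=%d max=%d queued=%d remaining=%d completed=%d",
            active, poolSize, corePoolSize, maxPoolSize, queued, queueRemaining, completed);
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 4, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>(10));
        System.out.println("before any task: " + new PoolSnapshot(pool));
        pool.submit(() -> { /* a task that lasts microseconds */ }).get();
        // A reading taken only now never observed the thread while it was busy.
        System.out.println("after short task: " + new PoolSnapshot(pool));
        pool.shutdown();
    }
}
```

Note that the max field is the configured limit of the pool. A peak of concurrently active threads would have to be tracked separately.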

To monitor the current number of threads in each Reactor scheduler:

sum(executor_pool_size_threads) by (reactor_scheduler_id)

or to monitor the maximum allowed number of threads:

sum(executor_pool_max_threads) by (reactor_scheduler_id)

There is a demo project that could be used to play with reactor metrics and has Grafana dashboards: https://github.com/reactor/reactor-monitoring-demo

Bark answered 18/11, 2022 at 1:35 Comment(11)
Did you copy those metrics from the documentation? The documentation doesn't seem to be correct. With Reactor 3.4.x, when I check the Spring Boot actuator, I only get 7. I wanted to have a look at executor_seconds_max, but it does not exist. The list of metrics also differs from site to site, e.g. between the Reactor docs and the Micrometer docs. executor_pool_size_threads is a gauge, so you would have to sample the value at exactly the moment it peaks. executor_pool_max_threads shows the limit of the pool, not the max number of active threads. - Valise
If you are using Prometheus, the metrics can be discovered through the corresponding endpoint of the service itself. In the case of Spring Actuator, that is /actuator/prometheus. - Bark
I know. Still, not all of the metrics you mentioned are there, and on the other hand there are others that you did not mention, so I guess you just copied the list from the documentation (which seems to be out of date). - Valise
These are all the metrics related to Schedulers. Internally, Reactor uses ExecutorServiceMetrics to instrument Schedulers and adds additional tags like reactor_scheduler_id. - Bark
Yeah, those are pretty much the same metrics that I see in the actuator and the monitoring. As you can see, there is no executor_seconds_max. I also found it in several older docs and questions on SO, so I guess it was removed a while ago. And executor_active_threads is a gauge, so there is no chance of getting a max when the values are requested at intervals. - Valise
If you look more carefully, the executor_seconds_max gauge is added by the executor timer, together with the executor_seconds_count and executor_seconds_sum counters: github.com/micrometer-metrics/micrometer/blob/… To get different metrics for executor threads, you would need to wrap the executor service somehow, which could be tricky to do. Check this thread for details: github.com/spring-projects/spring-boot/issues/… - Bark
I still can't find executor_seconds_max; there is no mention of it in your link. My actuator shows me the following metrics: [...,"executor.active","executor.completed","executor.pool.core","executor.pool.max","executor.pool.size","executor.queue.remaining","executor.queued",...] (the ... are non-executor metrics; those really are all the executor metrics it shows me), so I'm still wondering where this seconds_max is located. I can't find it on their GitHub. - Valise
This is implicit for Micrometer timers. I'm not able to troubleshoot your configuration. Try the demo project I mentioned, check the available metrics, and see what is different in your project. - Bark
I've tested the demo project. It also does not offer a metric called "executor seconds max" or anything close to that name. If you check actuator/metrics, there are: ["executor","executor.active","executor.completed","executor.idle","executor.pool.core","executor.pool.max","executor.pool.size","executor.queue.remaining","executor.queued","executor.scheduled.once","executor.scheduled.repetitively",...]. I'm not sure what secondsMax means, so maybe it is useless for my case; however, it also does not exist. I don't know why it is mentioned in some of the documentation. - Valise
To be more specific, just download the mentioned demo project, run it, and check: http://localhost:8080/actuator/metrics - Valise
Oh, but in Grafana/Prometheus there is such a metric. I guess it is just added by them and has nothing to do with Micrometer? Unfortunately, I'm not using Prometheus/Grafana. And I think secondsMax is not the right metric anyway; I just wondered why it is not in actuator/metrics. There seems to be no useful metric for getting the max value, since everything there is a gauge. Gauges are requested at intervals, so they are way too inaccurate to be useful. - Valise
