I have just upgraded our Spring Boot applications to Java 21. As part of that, I also switched to virtual threads, both for serving API requests and for internal async operations that use executors.
For one use case, an Executor backed by virtual threads seems to perform worse than a ForkJoinPool backed by OS threads. This use case sets some MDC values and calls an external system over HTTP.
This is my pseudo-ish code:
    List<...> ... = executorService.submit(
            () -> IntStream.rangeClosed(-from, to)
                    .mapToObj(i -> ...)
                    .parallel()
                    .map(... -> {
                        try {
                            service.setSomeThreadLocalString(...);
                            MDC.put(..., ...);
                            MDC.put(..., ...);
                            return service.call(...);
                        } finally {
                            service.removeSomeThreadLocalString(...);
                            MDC.remove(...);
                            MDC.remove(...);
                        }
                    })
                    .toList())
            .get();
Where executorService is either:
1. new ForkJoinPool(30)
2. Executors.newVirtualThreadPerTaskExecutor()
Option 1 appears to perform much better than option 2, sometimes finishing in half the time. I tested this in a Java 21 environment with 10 parallel executions: option 1 normally takes 800-1000 ms, while option 2 takes 1500-2000 ms.
If it makes any difference, I have this property enabled in Spring Boot:
    spring:
      threads:
        virtual:
          enabled: true
Any ideas why this is happening?
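One thing that may be worth checking in the snippet above: a parallel stream started from a thread that is not a ForkJoinPool worker normally runs on the common pool, so with option 2 the virtual-thread executor may only be running the single outer task. A minimal sketch of the alternative, submitting one task per item so that each blocking call gets its own virtual thread, could look like this. The names PerTaskFanOut, callAll and fakeCall are hypothetical, and fakeCall only stands in for service.call(...); it is not the original service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

public class PerTaskFanOut {

    // Submits one task per index in [from, to] and collects the results in index order.
    static <T> List<T> callAll(ExecutorService executor, int from, int to, IntFunction<T> call)
            throws InterruptedException, ExecutionException {
        List<Future<T>> futures = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            final int index = i;
            // Thread-local and MDC setup/cleanup would go inside this task,
            // because each task runs on its own thread.
            Callable<T> task = () -> call.apply(index);
            futures.add(executor.submit(task));
        }
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            results.add(future.get()); // blocks until that task has finished
        }
        return results;
    }

    // Hypothetical stand-in for the blocking HTTP call (service.call(...) in the question).
    static String fakeCall(int i) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result-" + i;
    }

    public static void main(String[] args) throws Exception {
        // Requires Java 21: every submitted task runs on a new virtual thread.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<String> results = callAll(executor, 1, 10, PerTaskFanOut::fakeCall);
            System.out.println(results);
        }
    }
}
```

Whether this closes the gap to new ForkJoinPool(30) depends on the external system and on how many carrier threads the virtual thread scheduler has, so it is something to measure rather than assume.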
[...] ForkJoinPool without specifying a parallelism parameter? Can you try setting the virtual thread pool size to match the parallelism of the ForkJoinPool? Also extending on my previous comment: there's essentially no point in using virtual threads in code that doesn't block. – Bluejacket
[...] ForkJoinPool without parallelism (new ForkJoinPool()) and the performance was way worse - takes about 3 seconds with it. Which virtual thread pool size do you mean to match with the parallelism of ForkJoinPool? Yes but the http call to the external service does block. – Ablation
[...] new ForkJoinPool(30) (parallelism = 30), with a parallelism of 1 for virtual threads. It is probably worthwhile to pass -Djdk.virtualThreadScheduler.parallelism=10 and/or -XX:ActiveProcessorCount=10 (or at least, something bigger than 1 for both or either) to your java command line – Ozonide
[...] ActiveProcessorCount may also have other benefits for your application, BTW. – Ozonide
[...] ActiveProcessorCount option. – Ablation
[...] -XX:ActiveProcessorCount=10 seemed to get more unstable results than setting both that one and -Djdk.virtualThreadScheduler.parallelism=10. Thank you. If you write an answer to this question I will mark it as the accepted one. Not sure why I am getting downvoted, I think this issue could help others as well in their migration to virtual threads. – Ablation
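To see what the two flags from the comments actually change, a small diagnostic can print the values the JVM reports at startup. This is only a sketch: the class name SchedulerDiagnostics and its method names are hypothetical, and "unset" is just a placeholder printed when -Djdk.virtualThreadScheduler.parallelism was not passed (in that case the scheduler falls back to the number of available processors).

```java
import java.util.concurrent.ForkJoinPool;

public class SchedulerDiagnostics {

    // Processor count as seen by the JVM; -XX:ActiveProcessorCount overrides it.
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Raw value of -Djdk.virtualThreadScheduler.parallelism, or "unset" if the flag is absent.
    static String virtualThreadSchedulerParallelism() {
        return System.getProperty("jdk.virtualThreadScheduler.parallelism", "unset");
    }

    // Parallelism of the common pool, which parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + availableProcessors());
        System.out.println("jdk.virtualThreadScheduler.parallelism = "
                + virtualThreadSchedulerParallelism());
        System.out.println("commonPool parallelism = " + commonPoolParallelism());
    }
}
```

Running it once without flags and once with -XX:ActiveProcessorCount=10 -Djdk.virtualThreadScheduler.parallelism=10 should show whether the environment (for example a container with a CPU limit) reports only one processor, which would match the parallelism of 1 mentioned in the comments.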