Java update from 11.0.11 to 11.0.13 changes JVM code cache behavior and results in more process cpu usage and strange profiled nmethods memory usage
Asked Answered
A

1

6

We run a complex clustered application with Java 11.0.11 and -XX:ReservedCodeCacheSize=375m with the same load profile attached 24/7 without any restarts.

After we updated AdoptOpenJDK jdk-11.0.11+9 to Temurin jdk-11.0.13+8, both using -XX:ReservedCodeCacheSize=375m with no code changes in our app, just updating Java, we noticed the following changes:

  • process CPU usage increased from ~ 30% to ~ 35% running on a AWS r5.8xlarge with 32 vCPUs, 256GB RAM and -Xms160G -Xmx160G, probably caused by high JVM Sweeper thread activity (see below)
  • JVM CodeHeap 'non-profiled nmethods' (C2 optimized code) was filled up (used=176 MB, max=184 MB) as with Java 11.0.11 which is expected BUT
  • JVM CodeHeap 'profiled nmethods' (C1 optimized code) was filled up (146 MB Used memory, 184 MB Maximum memory) shortly after process restart AND got cut down to ~ 40 MB after just 15 minutes and has stayed at this level since then (24 hours)

Java 11.0.13 + -XX:ReservedCodeCacheSize=375m, timezone=UTC+1 Java 11.0.13 + -XX:ReservedCodeCacheSize=375m

vs

Java 11.0.11 + -XX:ReservedCodeCacheSize=375m, timezone=UTC+1 Java 11.0.11 + -XX:ReservedCodeCacheSize=375m

The next step was to increase the -XX:ReservedCodeCacheSize to 512m but after ~ 3 hours the JVM CodeHeap 'profiled nmethods' (C1 optimized code) got cut down again to ~ 50 MB and even decreased further over time

Java 11.0.13 + -XX:ReservedCodeCacheSize=512m, timezone=UTC+1 Java 11.0.13 + -XX:ReservedCodeCacheSize=512m

Therefore we decided to double -XX:ReservedCodeCacheSize to 1024m and for the last 7 days both CodeHeap areas have been stable

  • CodeHeap 'non-profiled nmethods': 405 MB Used memory, 405 MB Committed memory, 508 MB Maximum memory
  • CodeHeap 'profiled nmethods': 258 MB Used memory, 258 MB Committed memory, 508 MB Maximum memory

Java 11.0.13 + -XX:ReservedCodeCacheSize=1024m is running stable for one week, timezone=UTC+1 Java 11.0.13 + -XX:ReservedCodeCacheSize=1024m

We tried these changes on different nodes and also on one node and the result is always reproducible.


With -XX:ReservedCodeCacheSize=375m the Linux perf tool shows high CPU usage for the Sweeper thread, which completely disappears with 1024m.

perf record -F 99 --output=perf-375m-uptime-21hours.data -ag -p <PID> sleep 60

+    4.82%     0.00%  Sweeper thread   libpthread-2.17.so                                                  [.] start_thread
+    4.82%     0.00%  Sweeper thread   libjvm.so                                                           [.] thread_native_entry
+    4.82%     0.00%  Sweeper thread   libjvm.so                                                           [.] Thread::call_run
+    4.82%     0.00%  Sweeper thread   libjvm.so                                                           [.] JavaThread::thread_main_inner
+    4.82%     0.00%  Sweeper thread   libjvm.so                                                           [.] NMethodSweeper::sweeper_loop
+    4.82%     0.00%  Sweeper thread   libjvm.so                                                           [.] NMethodSweeper::possibly_sweep
+    4.79%     0.03%  Sweeper thread   libjvm.so                                                           [.] NMethodSweeper::sweep_code_cache
+    4.24%     0.03%  Sweeper thread   libjvm.so                                                           [.] NMethodSweeper::process_compiled_method
+    4.05%     0.00%  C2 CompilerThre  libpthread-2.17.so                                                  [.] start_thread
+    4.05%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] thread_native_entry
+    4.05%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] Thread::call_run
+    4.05%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] JavaThread::thread_main_inner
+    4.05%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] CompileBroker::compiler_thread_loop
+    4.04%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] CompileBroker::invoke_compiler_on_method
+    4.03%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] C2Compiler::compile_method
+    4.03%     0.00%  C2 CompilerThre  libjvm.so                                                           [.] Compile::Compile
+    3.53%     0.68%  Sweeper thread   libjvm.so                                                           [.] CompiledMethod::cleanup_inline_caches_impl

As a result the CPU usage with Java 11.0.13 and -XX:ReservedCodeCacheSize=1024m is the same or slightly better than with Java 11.0.11 and -XX:ReservedCodeCacheSize=375m


During our investigation no helpful log lines were printed, e.g. we expected to see something like "CodeCache is full. Compiler has been disabled."

jcmd <PID> VM.log what=codecache=info,sweep*=trace decorators=time,uptime,level,tags output=codecache-sweep.log

All charts in the picture use timezone UTC+1. All timestamps in the log files are UTC+0.

The colors are just intended to show changes over time.

codecache-sweep-375m.log.0 Regex for interesting start time: \[2021-11-22T10:2[5-9].+'profiled nmethods' - A lot of sweeps per minute codecache-sweep-375m.log.0 picture

codecache-sweep-512m.log.0 Regex for interesting start time: \[2021-11-22T13:3[6-9].+'profiled nmethods - Still many sweeps per minute. codecache-sweep-512m.log.0 picture

codecache-sweep-1024m.log All fine, nothing special, avg 5 sweeps per day enter image description here


Please be aware that a lot of changes in the code cache / code heap components were made in Java 11.0.12 (release notes): Search for CodeHeap

Can you please help us to understand the reasoning and implications of CodeHeap 'profiled nmethods' shrinking down to a very low level and not increasing anymore?


UPDATE 2021-12-28

Re-running 11.0.11 and 11.0.13 with -XX:ReservedCodeCacheSize=375m and jcmd <pid> Compiler.CodeHeap_Analytics aggregate every 5 seconds to collect more data for the OpenJDK team.

Java 11.0.11 vs 11.0.13

zip with log and CodeHeap_Analytics data

The much higher code sweeper invocations (with 11.0.13) can still be reproduced.

Java 11.0.11 2h uptime, sweeper invocations:
grep -F '[codecache,sweep,start]' codecache-sweep-11.0.11_375m.log* | wc -l
14.458

Java 11.0.13 2h uptime, sweeper invocations:
grep -F '[codecache,sweep,start]' codecache-sweep-11.0.13_375m.log* | wc -l
41.901

For detailed statistics please open the jcmd_compiler_codeheap_analytics_aggregate files from the zip.

Text diff jcmd_compiler_codeheap_analytics_aggregate_2021-12-28-191001.txt vs jcmd_compiler_codeheap_analytics_aggregate_2021-12-28-191000.txt

Actable answered 23/11, 2021 at 19:8 Comment(2)
This sounds almost like a bug report. Consider reporting this to the OpenJDK project.Clower
I asked in the mailing list, let's see what they will reply: mail.openjdk.java.net/pipermail/hotspot-compiler-dev/…Actable
M
7

This is a short summary of my comments on the OpenJDK mailing list. My last comment there was on Jan 3rd, 2022.

  • There is no indication that the observed behaviour is caused by a bug.
  • All the logs and traces I saw show normal operation.
  • It is very likely that the CodeHeap size - before the JVM upgrade - was optimised to the max.
  • Some minor change in JIT code generation, e.g. slightly larger generated code size, has pushed CodeHeap management to the bad side of the tipping point.
  • The result is obvious: no steady state can be reached. There are permanent code sweeps and recompilations.
  • The CodeHeap segment for profiled methods is drained over time because all nmehtods become hot enough to be compiled at highest optimisation level.

The remedy? It sounds dumb and obvious at the same time: increase CodeCache space. Because a steady state is reached with a large enough CodeCache, it may be beneficial to disable segmented code cache.

Melioration answered 17/1, 2022 at 17:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.