Why there is performance degradation after ~6 hours of Java 9 G1 work without the actual increase in load?
Asked Answered
F

1

46

I switched 1 instance (2 vCPU, 2GB RAM, load ~4k req/sec) to Java 9 (from latest Java 8). For a while, everything was fine and CPU usage was same as before. However, after ~6 hours CPU consumption increased by 4% (from 21% to 25%) for no reason. I had no traffic spikes, no memory consumption increased, no metric changes (I have counters for every method within code). Nothing.

I left this instance untouched for ~12 hours expecting it will revert back. But nothing changed. It just started consuming more CPU.

top command showed that the instance had more CPU spikes than usually for the Java server process. I read recently that G1 is not suitable for the high throughput. So I made a conclusion that reason could be in G1.

I restarted instance with:

java -XX:+UseParallelGC -jar server-0.28.0.jar

And after ~20 hours of the monitoring, everything is fine as before. CPU consumption is on the level of 21% as it was many days before.

CPU usage right after Java 9 deployment (6h scale):

enter image description here

CPU increase after 7 hours + 12 hours "untouched" (7d scale):

enter image description here

CPU after -XX:+UseParallelGC (24h scale):

enter image description here

So my question is - is that expected behavior for the G1? Anyone else sees something similar?

Ubuntu 16.04 x64

java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

EDIT 03.01.2019

Tried to run one the same server with G1 on the java 10.0.2:

java version "10.0.2" 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)

G1 consumes 40% more CPU than UseParallelGC right after the server restart.

Fetial answered 5/10, 2017 at 9:44 Comment(10)
Did you happen to see the GC logs at that point in time(6h scale)? Would have been good to rule out the doubtful suspicion over the GC and provided confidence to know the exact cause.Wuhsien
No :(. This is the production instance so I didn't want to risk there.Fetial
yeah... without logs this is mission impossible (besides guessing I assume)Unthinking
I didn't tune anything. All deployments are with defaults. Just java -jar server.javaFetial
This is the production instance so I didn't want to risk there - there generally is little risk with GC logging in production in my experience as long as you don't turn on the costly tracing level stuff. Small caveats: logs of course take disk space and cause IO, which can be an issue if you have high-latency IO devices, but those issues are fairly predictableEvangelize
I feel like this question is unanswerable. The JVM runtime is not deterministic. Any number of things could be happening - some of which may not even be related to the GC directly, even if it looks like it's related based on your startup options.Pride
Unfortunately it is difficult to know the root cause unless GC logs are examined. G1 is only helpful if you are running large heaps. Java 10 includes improvements in G1 collector, Full GCs will be on multiple threads in parallel. This should help in reducing pause times in most cases.Nervous
Just a thought: JIT compiling/optimization can sometime tune/change (sometimes for the better/worse) with time (operations). I've seen the optimizer get "poisoned" by certain workloads and lose 10% efficiency. No big deal unless you are watching or tuning. As far as I know - you can't (nor shouldn't be able to) detect what "optimisations" it has done.Clearway
@Clearway Many ways to analyze JIT optimizations & deoptimizations. PrintCompilation VM argument, using JITWatch.Jamin
Could this be a virus scanner? Do you have anything else running on this machine?Stan
E
2

(Note that GC tuning is extremely dependant on the environment, so there is no magic recipe.)

Had a very similar issue with G1. By default, it seems to be rather badly suited for REST endpoints (again, this is only what I experienced in my direct neighborhood). What helped me was experimenting with the GC flags, as described here.

For us, the biggest improvements came from -XX:G1NewSizePercent=25 and -XX:MaxGCPauseMillis=50. G1 is also auto-tuning itself over time, so the max. GC pause limit has a significant effect on all other parameters.

Epicotyl answered 9/9, 2018 at 0:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.