I am seeing a continuous increase in GC pauses with the G1GC algorithm. Service latencies grow over time. When I restart the service, latencies go back to normal, and after startup they start climbing again.
At startup, service latencies are around 200ms, but within 24 hours they reach 350ms and continue to increase linearly.
The increase in service latencies matches the increase in garbage collection metrics.
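For reference, this is a minimal sketch of how the GC metrics can be sampled from inside the JVM using the standard `GarbageCollectorMXBean` API. The class name `GcTrend` and its helper methods are made up for illustration and are not part of my service; the reported times are the JVM's approximate accumulated collection times, not exact pause times.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reads cumulative GC counters so their growth can be tracked. */
class GcTrend {

    /** Total approximate time (ms) spent in all collectors since JVM start. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Total number of collections across all collectors since JVM start. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector does not report it
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    /** Average time per collection over an interval, given the counter deltas. */
    static double avgMillisPerCollection(long timeDeltaMillis, long countDelta) {
        return countDelta == 0 ? 0.0 : (double) timeDeltaMillis / countDelta;
    }

    public static void main(String[] args) {
        // Prints one snapshot; a scheduled task could call this every minute
        // and feed the deltas into avgMillisPerCollection.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("all collectors: count=%d, time=%d ms%n",
                totalGcCount(), totalGcTimeMillis());
    }
}
```

Plotting the per-interval average from these counters next to request latency is one way to confirm that the two really grow together.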
Service specifications
I am running a Java application (JDK 8) on m4.2xlarge EC2 boxes with 50 active threads per box. The service runs with a 12GB heap. The average latency of a request is about 250ms, and the rate of incoming requests is about 20 per second per box.
G1GC configurations
<jvmarg line="-Xms12288M"/>
<jvmarg line="-Xmx12288M"/>
<jvmarg line="-verbose:gc" />
<jvmarg line="-XX:+UseG1GC"/>
<jvmarg line="-XX:+PrintGCDetails" />
<jvmarg line="-XX:+PrintGCTimeStamps" />
<jvmarg line="-XX:+PrintTenuringDistribution" />
<jvmarg line="-XX:+PrintGCApplicationStoppedTime" />
<jvmarg line="-XX:MaxGCPauseMillis=250"/>
<jvmarg line="-XX:ParallelGCThreads=20" />
<jvmarg line="-XX:ConcGCThreads=5" />
<jvmarg line="-XX:-UseGCLogFileRotation"/>
GC logs
79488.355: Total time for which application threads were stopped: 0.0005309 seconds, Stopping threads took: 0.0000593 seconds
79494.559: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 369098752 bytes, new threshold 15 (max 15)
- age 1: 64725432 bytes, 64725432 total
- age 2: 8867888 bytes, 73593320 total
- age 3: 2503592 bytes, 76096912 total
- age 4: 134344 bytes, 76231256 total
- age 5: 3729424 bytes, 79960680 total
- age 6: 212000 bytes, 80172680 total
- age 7: 172568 bytes, 80345248 total
- age 8: 175312 bytes, 80520560 total
- age 9: 282480 bytes, 80803040 total
- age 10: 160952 bytes, 80963992 total
- age 11: 140856 bytes, 81104848 total
- age 12: 153384 bytes, 81258232 total
- age 13: 123648 bytes, 81381880 total
- age 14: 76360 bytes, 81458240 total
- age 15: 63888 bytes, 81522128 total
, 2.5241014 secs]
[Parallel Time: 2482.2 ms, GC Workers: 20]
[GC Worker Start (ms): Min: 79494558.9, Avg: 79494567.4, Max: 79494602.1, Diff: 43.2]
[Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]
[Update RS (ms): Min: 0.0, Avg: 5.3, Max: 41.9, Diff: 41.9, Sum: 106.9]
[Processed Buffers: Min: 0, Avg: 23.2, Max: 80, Diff: 80, Sum: 465]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 4.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Object Copy (ms): Min: 0.0, Avg: 41.9, Max: 68.7, Diff: 68.7, Sum: 837.9]
[Termination (ms): Min: 0.0, Avg: 2282.3, Max: 2415.8, Diff: 2415.8, Sum: 45645.3]
[Termination Attempts: Min: 1, Avg: 21.5, Max: 68, Diff: 67, Sum: 430]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.0]
[GC Worker Total (ms): Min: 2435.8, Avg: 2470.7, Max: 2482.0, Diff: 46.2, Sum: 49414.5]
[GC Worker End (ms): Min: 79497037.9, Avg: 79497038.1, Max: 79497041.0, Diff: 3.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.9 ms]
[Other: 40.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 37.7 ms]
[Ref Enq: 0.8 ms]
[Redirty Cards: 0.4 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.1 ms]
[Free CSet: 1.3 ms]
[Eden: 5512.0M(5512.0M)->0.0B(4444.0M) Survivors: 112.0M->128.0M Heap: 8222.2M(12.0G)->2707.5M(12.0G)]
[Times: user=19.63 sys=0.18, real=2.53 secs]
79497.083: Total time for which application threads were stopped: 2.5252654 seconds, Stopping threads took: 0.0000914 seconds
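In the pause above, Ext Root Scanning has an average of 140.9 ms but a maximum of 2478.3 ms, while Termination averages 2282.3 ms. My reading (not a confirmed diagnosis) is that one worker is held up in root scanning while the others wait in termination. The sketch below shows how such per-worker imbalance could be flagged automatically from `-XX:+PrintGCDetails` output. The class `GcPhaseSkew`, the regex, and the thresholds (5x the average, at least 100 ms) are my own assumptions, not a standard tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags G1 worker phases where one thread is far slower than the average. */
class GcPhaseSkew {

    // Matches lines such as "[Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, ..."
    private static final Pattern PHASE = Pattern.compile(
            "\\[(.+?) \\(ms\\): Min: ([\\d.]+), Avg: ([\\d.]+), Max: ([\\d.]+)");

    static final class Phase {
        final String name;
        final double min;
        final double avg;
        final double max;

        Phase(String name, double min, double avg, double max) {
            this.name = name;
            this.min = min;
            this.avg = avg;
            this.max = max;
        }

        /** True if the slowest worker took at least floorMs and more than factor times the average. */
        boolean isSkewed(double factor, double floorMs) {
            return max >= floorMs && max > factor * avg;
        }
    }

    /** Parses one per-worker phase line; returns null for lines that are not in that format. */
    static Phase parse(String line) {
        Matcher m = PHASE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Phase(m.group(1),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)),
                Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "[Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]",
            "[Update RS (ms): Min: 0.0, Avg: 5.3, Max: 41.9, Diff: 41.9, Sum: 106.9]",
            "[Object Copy (ms): Min: 0.0, Avg: 41.9, Max: 68.7, Diff: 68.7, Sum: 837.9]",
            "[Termination (ms): Min: 0.0, Avg: 2282.3, Max: 2415.8, Diff: 2415.8, Sum: 45645.3]"
        };
        for (String line : lines) {
            Phase p = parse(line);
            if (p != null && p.isSkewed(5.0, 100.0)) {
                System.out.printf("skewed phase: %s (avg %.1f ms, max %.1f ms)%n",
                        p.name, p.avg, p.max);
            }
        }
    }
}
```

With these thresholds only Ext Root Scanning is flagged for the pause shown above. Termination is not, because all workers spend roughly the same long time waiting there.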
I am looking for help with the GC configuration. Based on my reading, I am planning to increase ParallelGCThreads to 32, set G1HeapRegionSize to 16M, and set ConcGCThreads to 8.
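Before and after changing these flags, it may help to confirm what the JVM is actually using. The following is a rough sketch, assuming a HotSpot JVM, that reads the effective values through `HotSpotDiagnosticMXBean` and prints the processor count next to them so the GC thread counts can be compared with the cores available. The class name `GcFlagCheck` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: prints the effective values of the GC flags discussed above. */
class GcFlagCheck {

    /** Returns "flag = value (origin: ...)", or a placeholder if the flag cannot be read. */
    static String describe(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return flag + " = <HotSpot diagnostics not available>";
            }
            VMOption option = bean.getVMOption(flag);
            // Origin shows whether the value is a default or was set explicitly.
            return flag + " = " + option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the JVM does not know the flag or does not expose this bean.
            return flag + " = <unknown to this JVM>";
        }
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = "
                + Runtime.getRuntime().availableProcessors());
        String[] flags = {
            "ParallelGCThreads", "ConcGCThreads", "G1HeapRegionSize", "MaxGCPauseMillis"
        };
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

Running this inside the service (or as a one-off with the same JVM arguments) shows whether each value came from the command line or from JVM defaults.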
 | Mixed | Concurrent Mark | Remark | Cleanup | initial-mark | Young GC | Total
Count | 14 | 4 | 4 | 4 | 4 | 263 | 293
Total GC Time | 4 sec 120 ms | 0 | 1 sec 100 ms | 70 ms | 980 ms | 1 min 8 sec 10 ms | 1 min 14 sec 280 ms
Avg GC Time | 294 ms | 0 | 275 ms | 17 ms | 245 ms | 259 ms | 254 ms
Avg Time std dev | 127 ms | 0 | 73 ms | 4 ms | 73 ms | 63 ms | 79 ms
Min/Max Time | 0 / 560 ms | 0 / 0 | 0 / 400 ms | 0 / 20 ms | 0 / 340 ms | 0 / 620 ms | 0 / 620 ms
Avg Interval Time | 2 min 55 sec 119 ms | 12 min 32 sec 443 ms | 12 min 32 sec 443 ms | 12 min 32 sec 449 ms | 12 min 32 sec 423 ms | 13 sec 686 ms | 51 sec 887 ms
GC Causes
Cause | Count | Avg Time | Max Time | Total Time | Time %
G1 Evacuation Pause | 263 | 259 ms | 560 ms | 1 min 8 sec 50 ms | 91.61%
GCLocker Initiated GC | 15 | 272 ms | 400 ms | 4 sec 80 ms | 5.49%
Others | 12 | n/a | n/a | 1 sec 250 ms | 1.68%
G1 Humongous Allocation | 3 | 300 ms | 340 ms | 900 ms | 1.21%
Total | 293 | n/a | n/a | 1 min 14 sec 280 ms | 99.99%
Tenuring summary
Desired Survivor Size: 448.0 MB
Max Threshold: 15
Age | Survival Count | Average size (KB) | Average Total 'To' size (KB)
age 1 | 281 | 54856.84 | 54856.84
age 2 | 273 | 32935.6 | 89227.65
age 3 | 258 | 29812.41 | 122175.68
age 4 | 235 | 28499.48 | 158266.46
age 5 | 214 | 27909.13 | 196528.23
age 6 | 192 | 26896.33 | 237892.45
age 7 | 180 | 25759.58 | 272516.81
age 8 | 174 | 23565.21 | 299092.37
age 9 | 166 | 21745.62 | 320927.73
age 10 | 149 | 19323.6 | 340228.24
age 11 | 125 | 17400.14 | 357569.6
age 12 | 96 | 13995.26 | 372030.12
age 13 | 55 | 10909.19 | 378053.14
age 14 | 38 | 10197.95 | 389146.13
age 15 | 22 | 5996.65 | 395657.37