Latencies issues which G1GC

Asked 30/9, 2017 at 23:25 Answered 28/2, 2018 at 11:57

I am facing a continues increase in GC pauses with G1GC algorithm. The service latencies continue to grow over time. Once this happens, I restart my service and the latencies go back to normal. Post startup, the latencies again continue to increase over time.

At the time of startup, the service latencies are around ~200ms, but within 24 hours, they go up to 350ms, and continue to increase in a linear fashion.

The increase in service latencies match the increase in GarbageCollection metrics.

Service specifications

I am running a java application (JDK-8) on M4-2X Large EC2 boxes with 50 active threads per box. Service runs at an 12GB heap. The average latency of a request is about 250ms, and the rate of incoming requests is about ~20 per second per box.

G1G1 configurations

        <jvmarg line="-Xms12288M"/>
        <jvmarg line="-Xmx12288M"/>

        <jvmarg line="-verbose:gc" />
        <jvmarg line="-XX:+UseG1GC"/>
        <jvmarg line="-XX:+PrintGCDetails" />
        <jvmarg line="-XX:+PrintGCTimeStamps" />
        <jvmarg line="-XX:+PrintTenuringDistribution" />
        <jvmarg line="-XX:+PrintGCApplicationStoppedTime" />
        <jvmarg line="-XX:MaxGCPauseMillis=250"/>
        <jvmarg line="-XX:ParallelGCThreads=20" />
        <jvmarg line="-XX:ConcGCThreads=5" />
        <jvmarg line="-XX:-UseGCLogFileRotation"/>

GC logs

79488.355: Total time for which application threads were stopped: 0.0005309 seconds, Stopping threads took: 0.0000593 seconds
79494.559: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 369098752 bytes, new threshold 15 (max 15)
- age   1:   64725432 bytes,   64725432 total
- age   2:    8867888 bytes,   73593320 total
- age   3:    2503592 bytes,   76096912 total
- age   4:     134344 bytes,   76231256 total
- age   5:    3729424 bytes,   79960680 total
- age   6:     212000 bytes,   80172680 total
- age   7:     172568 bytes,   80345248 total
- age   8:     175312 bytes,   80520560 total
- age   9:     282480 bytes,   80803040 total
- age  10:     160952 bytes,   80963992 total
- age  11:     140856 bytes,   81104848 total
- age  12:     153384 bytes,   81258232 total
- age  13:     123648 bytes,   81381880 total
- age  14:      76360 bytes,   81458240 total
- age  15:      63888 bytes,   81522128 total
, 2.5241014 secs]
   [Parallel Time: 2482.2 ms, GC Workers: 20]
      [GC Worker Start (ms): Min: 79494558.9, Avg: 79494567.4, Max: 79494602.1, Diff: 43.2]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]
      [Update RS (ms): Min: 0.0, Avg: 5.3, Max: 41.9, Diff: 41.9, Sum: 106.9]
         [Processed Buffers: Min: 0, Avg: 23.2, Max: 80, Diff: 80, Sum: 465]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 4.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
      [Object Copy (ms): Min: 0.0, Avg: 41.9, Max: 68.7, Diff: 68.7, Sum: 837.9]
      [Termination (ms): Min: 0.0, Avg: 2282.3, Max: 2415.8, Diff: 2415.8, Sum: 45645.3]
         [Termination Attempts: Min: 1, Avg: 21.5, Max: 68, Diff: 67, Sum: 430]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.1, Sum: 1.0]
      [GC Worker Total (ms): Min: 2435.8, Avg: 2470.7, Max: 2482.0, Diff: 46.2, Sum: 49414.5]
      [GC Worker End (ms): Min: 79497037.9, Avg: 79497038.1, Max: 79497041.0, Diff: 3.1]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.9 ms]
   [Other: 40.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 37.7 ms]
      [Ref Enq: 0.8 ms]
      [Redirty Cards: 0.4 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 1.3 ms]
   [Eden: 5512.0M(5512.0M)->0.0B(4444.0M) Survivors: 112.0M->128.0M Heap: 8222.2M(12.0G)->2707.5M(12.0G)]
 [Times: user=19.63 sys=0.18, real=2.53 secs]
79497.083: Total time for which application threads were stopped: 2.5252654 seconds, Stopping threads took: 0.0000914 seconds

I am looking for some help with GC configurations. On the basis of my reading, I am planning to increase the number of parallel threads to 32, set G1HeapRegionSize to 16M, and set ConcGCThreads = 8.

    Mixed   Concurrent Mark Remark  Cleanup initial-mark    Young GC    Total
Count   14  4   4   4   4   263 293
Total GC Time   4 sec 120 ms    0   1 sec 100 ms    70 ms   980 ms  1 min 8 sec 10 ms   1 min 14 sec 280 ms
Avg GC Time     294 ms  0   275 ms  17 ms   245 ms  259 ms  254 ms
Avg Time std dev    127 ms  0   73 ms   4 ms    73 ms   63 ms   79 ms
Min/Max Time    0 / 560 ms  0 / 0   0 / 400 ms  0 / 20 ms   0 / 340 ms  0 / 620 ms  0 / 620 ms
Avg Interval Time   2 min 55 sec 119 ms 12 min 32 sec 443 ms    12 min 32 sec 443 ms    12 min 32 sec 449 ms    12 min 32 sec 423 ms    13 sec 686 ms   51 sec 887 ms

GC Causes

Cause   Count   Avg Time    Max Time    Total Time  Time %
G1 Evacuation Pause     263 259 ms  560 ms  1 min 8 sec 50 ms   91.61%
GCLocker Initiated GC   15  272 ms  400 ms  4 sec 80 ms 5.49%
Others  12  n/a n/a 1 sec 250 ms    1.68%
G1 Humongous Allocation     3   300 ms  340 ms  900 ms  1.21%
Total   293 n/a n/a 1 min 14 sec 280 ms 99.99%

Tenuring summary

Desired Survivor Size: 448.0 mb,

Max Threshold: 15

Age Survival Count  Average size (kb)   Average Total 'To' size (kb)
age 1   281 54856.84    54856.84
age 2   273 32935.6 89227.65
age 3   258 29812.41    122175.68
age 4   235 28499.48    158266.46
age 5   214 27909.13    196528.23
age 6   192 26896.33    237892.45
age 7   180 25759.58    272516.81
age 8   174 23565.21    299092.37
age 9   166 21745.62    320927.73
age 10  149 19323.6 340228.24
age 11  125 17400.14    357569.6
age 12  96  13995.26    372030.12
age 13  55  10909.19    378053.14
age 14  38  10197.95    389146.13
age 15  22  5996.65 395657.37

Threewheeler answered 30/9, 2017 at 23:25 Comment(5)

Mmmm ... memory leak. You're filling up the old generation so G1 is struggling more and more to find places to put new things. – Biarritz 30/9, 2017 at 23:33

And ... if this is a memory leak, then fiddling with the GC configs will not solve the problem. – Confident 1/10, 2017 at 0:52

Probably related to https://mcmap.net/q/749913/-g1-gc-extremely-long-termination-time/1654233 – Tuyere 3/10, 2017 at 13:4

I would agree with the other guys that you might rather focus on why your application is so much dependent on the GC. Try to focus on what the GC must clean so often that it causes delays. Do you really close all the resources you use (connections, files, I/O streams, SQL statements, result sets, POI workbooks, ...)? Don't you have unnoticed autoboxing (damn the Java for this hideous feature) in 100_000 loops? Don't you forget active references on arrays or collections? – Jovanjove 5/10, 2017 at 7:56

I have a friend who had problem with placing all her stuff in her small flat. She moved into a bigger flat and within a month her new flat was crammed again :) – Jovanjove 5/10, 2017 at 8:1

First Point is

You need to check is there any connection leak in your application.

But there can be one parameter in G1GC, which you can analyze :

Humongous Objects

At this point the majority of the functionality and architecture of G1GC has been flushed out, with the exception of the biggest weakness/complexity, the Humongous object. As mentioned previously, any single data allocation ≥ G1HeapRegionSize/2 is considered a Humongous object, which is allocated out of contiguous regions of Free space, which are then added to Tenured. Let's run through some basic characteristics and how they affect the normal GC lifecycle. The following discussion on Humongous objects will provide insight into the downsides of Humongous objects such as:

Increase the risk of running out of Free space and triggering a Full GC
Increase overall time spent in STW

Humongous objects are allocated out of Free space. Allocation failures trigger GC events. If an allocation failure from Free space triggers GC, the GC event will be a Full GC, which is very undesirable in most circumstances. To avoid Full GC events in an application with lots of Humongous objects, one must ensure the Free space pool is large enough as compared to Eden that Eden will always fill up first. One usually ends up being over cautious and the application ends up in a state where the Free ram pool is quite large and never fully utilized, which is by definition wasting RAM.

Humongous objects are freed at the end of an MPCMC

Up until around Oracle JDK 8u45, it was true that Humongous objects were only collected at the end of runs of the MPCMC. The release notes for versions of Oracle 8u45-8u65 have a few commits indicating some, but not all, Humongous objects are being collected during Minor events.

So, You can try by updating latest java.

Humongous objects that are only collectible at the end of an MPCMC will increase the requirements for reserved Free space or be more likely to trigger a Full GC.

Finding How Much Humongous Object:

Step 1. : run following command on your gc.log

Command 1 :

grep "source: concurrent humongous allocation" /tmp/gc.log | sed 's/.*allocation request: $[0-9]*$ bytes.*/\1/' > humoungous_humongoud_size.txt

Command 2 :

awk -F',' '{sum+=$1} END{print sum;}' humoungous_humongoud_size.txt

It will give you the size of humongous objects generated in my application.

But Lastly, if your application has memory leaks you have to solve that.

Abbey answered 9/10, 2017 at 14:9 Comment(1)

Some times while analyzing heapdump in eclipse memory analyzer tool , it becomes hard to find the object which is large in size , in that case you can use jvisualvm , which shows per instance list which you can sort according to size . – Abbey 17/3, 2018 at 17:12

It's a young collection and almost everything is dieing young, so as opposed to the comments above this does not seem to be an issue with the old generation filling up.

[Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]

It's basically spending most of the time scanning GC roots and the other phases are then later held up waiting for this phase to finish.

Do you have a lot of threads (you only mention active ones)? Or is your application leaking classes or dynamically generating more and more bytecode?

The application is generating a lot of classes dynamically for each service call, and given the call volume, i suspect those classes might be an issue, but not sure how to resolve it.

You first have to figure out whether those generated classes get collected at all during old generation collections. If not you have a leak and need to fix your application. If they pile up but get collected eventually you only need to have the old generation collected more frequently, e.g. by decreasing the young generation size (which puts more pressure on the old gen) or by decreasing the IHOP.

Circumsolar answered 1/10, 2017 at 6:3 Comment(4)

All the application threads are active. The boxes are running behind a VIP with 48 connections, and the each box is running 50 threads. – Threewheeler 1/10, 2017 at 19:58

I added the tenuring summary in the question, and i see a fair distribution of objects moving to old generation. The application is generating a lot of classes dynamically for each service call, and given the call volume, i suspect those classes might be an issue, but not sure how to resolve it. – Threewheeler 1/10, 2017 at 20:4

you will want concurrent cycles to run more frequently, not less, so that the classes can be unloaded and thus make young collections cheaper. that is assuming they just get tenured and don't actually leak. also, the tenuring distribution does not contradict anything I've said, since most stuff dies in eden. – Circumsolar 2/10, 2017 at 18:40

No, decrease IHOP – Circumsolar 3/10, 2017 at 16:29

If I were in your position, this is what I would do.

Get the GC logs for a couple of days and load it to http://gceasy.io/ to assess how the memory grows.
Change the Garbage Collection mechanism from G1 to Parallel collector temporarily. I suggest going to parallel collector since it allocates the memory in a linear fashion and is relatively easy to check whether you have a memory leak. You also get a good comparison to G1. That doesn't mean you will have to permanently move to parallel, it is just for temporary comparison.
If the heap is growing continuously in a linear fashion without being garbage collected, then it is definitely a memory leak and you will have to find that.
If you can't see any evidence of memory leak, then you can think about tweaking the garbage collection settings.

Tweaking G1 garbage collector to suit your service is very important. G1 without any tunings might be very bad for some of the services like we had, which performed much worse than the parallel collector. But now with specific tunings, it works better now on our server which has 64 cores and 256 GB RAM.

Mckinley answered 4/10, 2017 at 9:57 Comment(0)

First of all, you have a big time spent on objects copying between generations.

G1 Evacuation Pause 263 259 ms 560 ms 1 min 8 sec 50 ms 91.61%

According to this

[Eden: 5512.0M(5512.0M)->0.0B(4444.0M) Survivors: 112.0M->128.0M Heap: 8222.2M(12.0G)->2707.5M(12.0G)]

[Object Copy (ms): Min: 0.0, Avg: 41.9, Max: 68.7, Diff: 68.7, Sum: 837.9]

[Update RS (ms): Min: 0.0, Avg: 5.3, Max: 41.9, Diff: 41.9, Sum: 106.9]

Ref Proc: 37.7 ms

[Ext Root Scanning (ms): Min: 0.0, Avg: 140.9, Max: 2478.3, Diff: 2478.3, Sum: 2818.8]

This all about managing objects between regions. Looks like you have a lot of short time living objects, but spend a lot of time on managing objects between regions. Try to play with Young Gen size. From one side you could increase it, so you will spend less time with objects copying. It could increase the time for roots analysis too. If most of the objects are dead it would be ok. But if not, you should conversely decrease Young Size, then G1 would be called more frequently but spends less time for one invocation, this provides more predictive behavior and latency. As we can see the biggest time is spent on roots analysis, I would start with decrease Young Gen size to 3GB to see what happens.

Second is

Termination (ms): Min: 0.0, Avg: 2282.3, Max: 2415.8, Diff: 2415.8, Sum: 45645.3]

Your G1 spends a lot of time on termination process, where it tries finish thread work, but before that, it checks all queues and steals tasks. You could see that there are a lot of attempts of stealing:

Termination Attempts: Min: 1, Avg: 21.5, Max: 68, Diff: 67, Sum: 430

21.5 attempts per one worker thread it is not a bit. If worker thread successfully stole tasks it would be continued. You have big concurrency here and I suppose you could decrease the number of GC threads.

p.s. To choose GC you should use:

Parallel GC, if not appropriate then...
G1, if not appropriate then...
tuned CMS with ParNew.

If I were you, I would be using CMS with ParNew on your place to provide good latency for this scenario.

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++