Why does G1GC shrink the young generation before starting mixed collections?

When G1 decides it needs to start doing mixed collections, it aggressively shrinks our Eden space from 10g to about 1g.

{Heap before GC invocations=294 (full 0):
 garbage-first heap   total 20480000K, used 18304860K [0x00000002de000000, 0x00000002de804e20, 0x00000007c0000000)
  region size 8192K, 1363 young (11165696K), 11 survivors (90112K)
 Metaspace       used 37327K, capacity 37826K, committed 38096K, reserved 1083392K
  class space    used 3935K, capacity 4081K, committed 4096K, reserved 1048576K
2016-03-31T20:57:31.002+0000: 7196.427: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 717225984 bytes, new threshold 1 (max 1)
- age   1:   41346816 bytes,   41346816 total
 7196.427: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 144693, predicted base time: 48.88 ms, remaining time: 951.12 ms, target pause time: 1000.00 ms]
 7196.427: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1352 regions, survivors: 11 regions, predicted young region time: 20.72 ms]
 7196.427: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1352 regions, survivors: 11 regions, old: 0 regions, predicted pause time: 69.60 ms, target pause time: 1000.00 ms]
 7196.494: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 789 regions, reclaimable: 4703761904 bytes (22.43 %), threshold: 5.00 %]
, 0.0673540 secs]
   [Parallel Time: 60.1 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 7196427.8, Avg: 7196428.1, Max: 7196428.2, Diff: 0.4]
      [Ext Root Scanning (ms): Min: 7.3, Avg: 7.5, Max: 7.7, Diff: 0.4, Sum: 134.2]
      [Update RS (ms): Min: 28.2, Avg: 28.8, Max: 29.9, Diff: 1.7, Sum: 518.4]
         [Processed Buffers: Min: 41, Avg: 57.7, Max: 80, Diff: 39, Sum: 1039]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.5, Diff: 0.4, Sum: 3.7]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [Object Copy (ms): Min: 22.1, Avg: 23.1, Max: 23.4, Diff: 1.3, Sum: 416.2]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 18]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 2.5]
      [GC Worker Total (ms): Min: 59.5, Avg: 59.7, Max: 60.0, Diff: 0.5, Sum: 1075.1]
      [GC Worker End (ms): Min: 7196487.7, Avg: 7196487.8, Max: 7196487.9, Diff: 0.2]
   [Code Root Fixup: 0.2 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 1.9 ms]
   [Other: 5.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.5 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.5 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 2.3 ms]
   [Eden: 10.6G(10.6G)->0.0B(848.0M) Survivors: 88.0M->152.0M Heap: 17.5G(19.5G)->7128.3M(19.5G)]
Heap after GC invocations=295 (full 0):
 garbage-first heap   total 20480000K, used 7299344K [0x00000002de000000, 0x00000002de804e20, 0x00000007c0000000)
  region size 8192K, 19 young (155648K), 19 survivors (155648K)
 Metaspace       used 37327K, capacity 37826K, committed 38096K, reserved 1083392K
  class space    used 3935K, capacity 4081K, committed 4096K, reserved 1048576K
}
 [Times: user=1.09 sys=0.00, real=0.07 secs]
2016-03-31T20:57:31.070+0000: 7196.495: Total time for which application threads were stopped: 0.0699324 seconds, Stopping threads took: 0.0003462 seconds

This is after it's been running with 10-11g of Eden for 60 or more collections.

Here are the relevant JVM GC parameters we're running with:

-Xms20000m -Xmx20000m
-XX:+UseG1GC 
-XX:G1RSetUpdatingPauseTimePercent=5 
-XX:MaxGCPauseMillis=1000 
-XX:GCTimeRatio=99 
-XX:InitiatingHeapOccupancyPercent=35 
-XX:MaxTenuringThreshold=1 
-XX:G1ConcRefinementThreads=6 
-XX:ConcGCThreads=18 
-XX:ParallelGCThreads=18

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:+PrintAdaptiveSizePolicy
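
For reference, the pause/throughput-related flags above work out to the following (my arithmetic, using nothing beyond the flags and heap size shown):

    MaxGCPauseMillis=1000             -> budget of 1000 ms per pause
    GCTimeRatio=99                    -> at most ~1/(1+99) = 1% of total time in GC
    InitiatingHeapOccupancyPercent=35 -> concurrent marking starts at ~35% of the 20000m heap ≈ 7000m occupied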

According to page 55 of this presentation, G1 needs to resize Eden so that the max pause target accounts for the entire heap, not just the young generation. Why is the collector being so aggressive?

Average young generation pause times are between 50-150ms with around 10g of young gen. If the presentation is correct (I haven't found anything else to support the statement), I would expect Eden to shrink by about half (to account for the full 20g heap), not by 10x.
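
To make the expectation concrete, using only the numbers from the log above:

    expected (halved): 10.6G Eden / 2 ≈ 5.3G
    actual new target: 848.0M Eden + 152.0M survivors = 1000M of young gen (roughly 11x smaller)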

Ryter asked 31/3, 2016 at 23:14 Comment(7)
I've just read through that presentation (it's quite good really), but I don't see the same things on slide 55 that you mention in your post. And, it shows an example of eden reducing from 12.5gb to <1gb, which is rather similar to yours. Can you explain in more detail why you expect eden to shrink less than it has?Squelch
And, maybe consider adding the print-adaptive-size-policy flag.Squelch
@EngineerDollery The PrintAdaptiveSizePolicy parameter is enabled (added to question). In the presentation, issue #3, it states "Must account for old regions, not only young, Currently G1 shrinks the young generation". Ideally, I wouldn't expect the generation to shrink at all. We've set our pause goal to 1000ms and collections are much quicker than that. Having to collect twice the garbage (which is already marked) shouldn't take more than twice the time.Ryter
you're putting lots of constraints on G1. are you certain that each parameter is justified as it is right now? And if the new gen shrinking really is the concern, why not set a minimum size for it?Cargill
@Cargill We can have that discussion in chat (though you'd have to create it, I don't have enough reputation).Ryter
I agree with the8472. G1GC works well when most of the default values are left unmodified, apart from key parameters like the pause time goal, region size, GC threads, etc. Refer to #8111810 and oracle.com/technetwork/articles/java/g1gc-1984535.htmlJerid
Pillar, respecting the maximum pause time goal should be the reason for the algorithm's aggressiveness.Jerid

The young generation can shrink by a factor of 20x, so shrinking by a factor of 10x is not a surprise.

Since you have not set the young generation size explicitly, the default value applies:

-XX:G1NewSizePercent=5

Sets the percentage of the heap to use as the minimum for the young generation size.

So to respect your pause time goal of

-XX:MaxGCPauseMillis=1000 

the young gen can shrink down to 5% of the total heap.
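
As a rough check against the log in the question (just the arithmetic, assuming nothing beyond the flags shown there):

    20000m heap x 5% = 1000 MB
    848.0M new Eden target + 152.0M survivors = 1000 MB = 125 of the 2500 8 MB regions

So the post-collection young gen in the log sits exactly on the G1NewSizePercent floor.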

Note: G1NewSizePercent and G1MaxNewSizePercent are not to be confused with NewSize or MaxNewSize.

G1NewSizePercent and G1MaxNewSizePercent put a lower and upper bound, respectively, on young generation sizing.

On a different note, you have configured many parameters that may be unnecessary, since G1GC works fine when most parameters are left at their default values. Refer to this SE question for more details.

Java 7 (JDK 7) garbage collection and documentation on G1

In summary: depending on the pause time goal, the young gen will shrink. If you are really worried about the young gen shrinking to a low value, configure -XX:G1NewSizePercent. But I wouldn't recommend that as long as -XX:MaxGCPauseMillis is being met.
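
If you do decide to raise it, note that on JDK 8 G1NewSizePercent is an experimental flag, so the change would look something like this (the value 15 is only an illustration, not a recommendation):

-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=15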

Jerid answered 5/4, 2016 at 14:15 Comment(10)
So your answer is that it does because it can? That doesn't satisfy me and you currently haven't supported it with any documentation. As I've shown from the printed GC details, the pause time is already well within the max pause time goal. There's no reason to resize so much. Resizing so much causes more strain on the application. Currently, this application can generate 10g of garbage in 2-3 seconds. With a new gen of ~800m, the JVM will be gc'ing every few hundred milliseconds. This negatively affects throughput.Ryter
From oracle documentation page: The G1 GC has a pause time-target that it tries to meet (soft real time). During young collections, the G1 GC adjusts its young generation (eden and survivor sizes) to meet the soft real-time target. During mixed collections, the G1 GC adjusts the number of old regions that are collected based on a target number of mixed garbage collections, the percentage of live objects in each region of the heap, and the overall acceptable heap waste percentage.Jerid
This article and the InfoQ articles provide proof.Jerid
My point is that the JVM should have met the pause time target without resizing the young gen. It's being too aggressive and I want to know why. Your answer doesn't address that, the aggressiveness.Ryter
Young/mixed GCs run as stop-the-world pauses. Keeping the young gen small helps to achieve the pause time goal.Jerid
Except now, because the young gen is so small (during mixed collections), the JVM is gc'ing almost constantly to try and keep up with the garbage. Because of that, the JVM fails to meet the GCTimeRatio. The JVM was already meeting the pause time goal, there was no reason for it to resize.Ryter
GCTimeRatio and a smaller young gen are not related. What is the GCTimeRatio failure in your case? You are allowing only 1% of time for GC instead of the default value of 10%.Jerid
Yeah, I only want 1% GC time. And, under normal circumstances, that's satisfied. (It's actually more like 0.6% on average.) But when young gen is resized, GC time increases because the JVM gc's much more often. If the JVM didn't resize, both the pause time goal AND the gc time ratio would be satisfied. For some reason, it's being overly aggressive with resizing and that ruins the GCTimeRatio. IMO it's not adapting efficiently.Ryter
Then try setting -XX:G1NewSizePercent to a higher value than 5 (say 10%-15%) so that it won't resize below that value. But the current algorithm is designed to resize even in a healthy state, as per the Oracle ergonomics page.Jerid
Yes, that's good advice, and I've already done so. My question, which I keep coming back to, is why the current implementation of the collector is being so aggressive. The quote from your latest edit is interesting, but not quite as specific as I want.Ryter

With MaxTenuringThreshold=1 you are telling the GC to promote an object to the old generation after it survives only one GC cycle. In most applications this will lead to many premature promotions. If too many objects are promoted too early, your old gen will grow rapidly, and thus G1 has to shrink the young gen to make space for the old gen.
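
One way to sanity-check this (only a sketch; the question's flags already include -XX:+PrintTenuringDistribution) is to raise the threshold back toward the HotSpot default of 15 and watch the age histogram printed at each young GC:

-XX:MaxTenuringThreshold=15

The "Desired survivor size ... new threshold 1 (max 1)" and "- age 1: ..." lines in the question's log would then show how many bytes actually survive additional ages before being promoted.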

Schuh answered 26/2, 2019 at 9:16 Comment(0)
