G1GC very high GC count and CPU, very frequent GCs that kill performance

I've recently switched my Java application from CMS + ParNew to G1GC. What I observed after the switch is that CPU usage went up, and the GC count and pause times went up as well. My JVM flags before the switch were:

 java -Xmx22467m -Xms22467m -XX:NewSize=11233m -XX:+UseConcMarkSweepGC -XX:AutoBoxCacheMax=1048576 -jar my-application.jar

After the switch my flags are:

java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:AutoBoxCacheMax=1048576 -XX:MaxGCPauseMillis=30 -jar my-application.jar

I followed Oracle's Best Practices http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html

Do not Set Young Generation Size

And did not set the young generation size. However, I suspect that the young generation size is the problem here. What I see is that heap usage fluctuates between roughly 6 and 8 GB:

[Heap usage graph]

Whereas before, with CMS + ParNew, memory usage grew from about 4 GB to 16 GB and only then did I see a GC:

[CMS/ParNew heap usage graph]

I don't understand why GC is so frequent with G1GC, and I'm not sure what I'm missing when it comes to G1GC tuning.

I'm using Java 8:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

I appreciate your help.

UPDATE: A bit more information about those pauses:

[GC pause graph after the switch to G1GC]

As you can see, all those pauses are G1New, and they are roughly as long as my target pause time of 30ms. The ParNew pauses before the switch to G1GC looked like this:

[GC pause graph before the switch, showing ParNew]

They are also all young-generation collections (ParNew), but they are less frequent and shorter, because they only happen when heap usage reaches around 14 GB (according to the graph).

I am still clueless as to why the G1New collections happen so early (in terms of heap usage).

Update 2: I also noticed that NewRatio=2. I don't know whether G1GC respects that, but if it does, it would mean my new generation is capped at about 7 GB. Could that be the reason?
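One way to check which young-generation sizing values the VM actually ends up with is to dump the final flag values; a sketch on JDK 8 (the grep pattern is just illustrative, and PrintFlagsFinal should also list the experimental G1 sizing flags):

java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep -E 'NewRatio|NewSize|G1NewSizePercent|G1MaxNewSizePercent'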

Update 3: Adding the G1GC GC logs: https://drive.google.com/file/d/1iWqZCbB-nU6k_0-AQdvb6vaBSYbkQcqn/view?usp=sharing
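For anyone who wants to collect similar logs on JDK 8, a command line along these lines should work (the log path is just a placeholder):

java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:MaxGCPauseMillis=30 -Xloggc:/tmp/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -jar my-application.jar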

Deterioration answered 13/2, 2018 at 4:47 Comment(11)
Ok. Set the young generation size to 16 GB and see if that makes a difference. You know your application better than we do.Declaratory
It looks like the app makes a lot of short-lived garbage that is easy to dispose of. G1 should be a good fit. You should be able to set -XX:InitiatingHeapOccupancyPercent pretty high, 75 or 80, which should make the GC graph look more like the CMS graph.Backgammon
@JonahB all the GC cycles that you see in the image are G1New. Please correct me if I misunderstood the meaning of -XX:InitiatingHeapOccupancyPercent. Isn't a concurrent marking cycle part of the old-generation collection? (Looking here oracle.com/technetwork/tutorials/tutorials-1876574.html at the G1 Collection Phases - Concurrent Marking Cycle Phases section.) If so, how would raising that value prevent the G1New cycles?Deterioration
@ElliottFrisch, thanks for the reply. According to Oracle, if I set the young generation size, G1GC will not respect the target pause time. "Explicitly setting young generation size via -Xmn meddles with the default behavior of the G1 collector. G1 will no longer respect the pause time target for collections. So in essence, setting the young generation size disables the pause time goal. G1 is no longer able to expand and contract the young generation space as needed. Since the size is fixed, no changes can be made to the size"Deterioration
Have you tried what happens if you don't specify MaxGCPauseMillis?Awash
Yes, IHOP instructs the GC for the old gen. The pauses in the graph are very long for such a small collection; something is going wrong there. If you try setting a large young-gen size, you may get the same behavior as CMS, but, without knowing the app, I would guess you will see longer pauses. If you were seeing numerous but much shorter collections under G1, that would be a reason to grow the young gen. IHOP is there to guide the GC around how quickly garbage is generated relative to how quickly it can be cleaned. Increasing it should allow the GC to push work to the old gen and lengthen the time between collections.Backgammon
@JonahB please see my update. As you can see, all those pauses are as long as the target pause time, and they are all young collections.Deterioration
You might also want to try setting -Xms to a value lower than -Xmx; that gives the tuning heuristics in the VM some room to play.Cycad
Those screenshots only graph three variables. Enable detailed GC logging and post the logs; those track dozens of stats of interest.Popover
@Popover I added the GC logs, please see the update.Deterioration
As your pause-time target is very low for a 22 GB heap, G1 tries to shrink the young generation until it can ensure the pause-time goal is met. According to the GC log, only about half of the available memory is actually used. Have you tried reducing -Xmx and -Xms to around 12 GB?Pittance

I was able to see that the time spent copying objects is very significant. By default, G1GC lets an object survive up to 15 young collections (the maximum tenuring threshold) before it is promoted to the tenured generation. I reduced that to 1 (-XX:MaxTenuringThreshold=1).
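A sketch of how that might look on JDK 8, together with -XX:+PrintTenuringDistribution to see object ages at each young GC (heap sizes taken from the question, other flags omitted):

java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:MaxGCPauseMillis=30 -XX:MaxTenuringThreshold=1 -XX:+PrintTenuringDistribution -jar my-application.jar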

Also, I don't know how to confirm it from the logs, but visualizing the GC log I saw that the young generation is constantly being resized between its minimum and maximum size. I narrowed down that range, which also improved performance.
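On JDK 8, one way to narrow that range without fixing the young generation outright is via the experimental sizing percentages; the values below are illustrative, not the ones I ended up with:

java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:MaxGCPauseMillis=30 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=20 -XX:G1MaxNewSizePercent=40 -jar my-application.jar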

Looking here https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#JSGCT-GUID-70E3F150-B68E-4787-BBF1-F91315AC9AB9 I was trying to figure out whether remembered-set coarsening is indeed an issue. But it simply says to set gc+remset=trace, and I did not understand how to pass that to java on the command line, or whether it is even available in JDK 8. I increased -XX:G1RSetRegionEntries a bit just in case.
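For context, gc+remset=trace is JDK 9+ unified logging (-Xlog) syntax, so it is not available as-is on JDK 8; on JDK 9 or later it would be passed roughly like this:

java -Xlog:gc+remset=trace -XX:+UseG1GC -jar my-application.jar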

I hope this helps future G1GC tuners, and if anyone else has more suggestions, that would be great.

What I still see is that [Processed Buffers] still takes a very long time in young evacuations, and [Scan RS] is very long in mixed collections. I'm not sure why.

Deterioration answered 16/2, 2018 at 0:24 Comment(0)

Your GC log shows an average GC pause interval of about 2 seconds, with each pause around 30-40ms, which amounts to an application throughput above 95%. That is not "killing performance" territory, at least not due to GC pauses.

G1 does more concurrent work though, e.g. for remembered-set refinement, and your pauses seem to spend some time in Update RS / Scan RS, so I assume the concurrent GC threads are busy too. In other words, G1 may need additional CPU cycles outside GC pauses, which the logs do not cover by default; you need -XX:+G1SummarizeRSetStats for that. If latency is more important, you might want to allocate more cores to the machine; if throughput is more important, you could tune G1 to perform more of the remembered-set updates during the pauses (at the cost of increased pause times).
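A sketch of how to enable that on JDK 8: it is a diagnostic flag, so it may also need -XX:+UnlockDiagnosticVMOptions, and the period controls how many GCs pass between printed summaries:

java -XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=1 -jar my-application.jar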

Popover answered 14/2, 2018 at 1:8 Comment(2)
I added -XX:+G1SummarizeRSetStats. What I see is: "Recent concurrent refinement statistics: Processed 39371 cards. Of 165 completed buffers: 165 (100.0%) by concurrent RS threads, 0 (0.0%) by mutator threads", plus the RSet statistics. What I don't understand is why Processed Buffers takes so much time when -XX:G1RSetUpdatingPauseTimePercent=10 by default. If my target GC pause time is 40ms, then updating the RSet should take no longer than 0.1 * 40 = 4ms.Deterioration
Comments are not suitable for that stuff.Popover
