JVM G1GC's mixed GC not collecting many old regions
My server runs JDK 1.8.0_92 on CentOS 6.7 with GC parameters '-Xms16g -Xmx16g -XX:+UseG1GC', so the defaults are InitiatingHeapOccupancyPercent=45, G1HeapWastePercent=5 and G1MixedGCLiveThresholdPercent=85. Mixed GCs start once the old gen reaches 7.2GB, but each cycle reclaims less and less, until the old gen stays above 7.2GB and G1 keeps starting concurrent marking cycles. Eventually the whole heap is exhausted and a full GC occurs. After the full GC, old gen usage is under 500MB.
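For reference, spelling out the implicit defaults as explicit flags (the numbers below are just the stated defaults applied to a 16GB heap):

 -Xms16g -Xmx16g -XX:+UseG1GC
 -XX:InitiatingHeapOccupancyPercent=45    concurrent marking starts at 45% of 16GB = 7.2GB
 -XX:G1HeapWastePercent=5                 mixed GCs stop once reclaimable space < 5% of 16GB ≈ 0.8GB
 -XX:G1MixedGCLiveThresholdPercent=85     only regions at most 85% live become mixed GC candidates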

[chart: old gen used over time]

I'm curious why my mixed GCs can't collect more; it looks like there isn't actually that much live data...

I have tried printing G1-related ergonomics info and found many messages like the one below. It looks as if my old gen contains a lot of live data, but then why can a full GC collect so much?

G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 190 regions, reclaimable: 856223240 bytes (4.98 %),  threshold: 5.00 %
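This message means G1 ended the mixed GC cycle because the remaining candidate regions hold less reclaimable space (4.98%) than G1HeapWastePercent (5.00%). A hedged tuning sketch (not a guaranteed fix) is to lower that threshold and spread the candidates over more mixed pauses:

 -XX:G1HeapWastePercent=2      continue mixed GCs down to 2% reclaimable
 -XX:G1MixedGCCountTarget=16   allow more mixed pauses per marking cycle (default 8)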

The log below is the result of lowering InitiatingHeapOccupancyPercent to 15 (start concurrent marking at 2.4GB) to speed things up.

### PHASE Post-Marking
......
### SUMMARY  capacity: 16384.00 MB  used: 2918.42 MB / 17.81 %  prev-live: 2407.92 MB / 14.70 %  next-live: 2395.00 MB / 14.62 %  remset: 56.66 MB  code-roots: 0.91 MB
### PHASE Post-Sorting
....
### SUMMARY  capacity: 1624.00 MB  used: 1624.00 MB / 100.00 %  prev-live: 1123.70 MB / 69.19 %  next-live: 0.00 MB / 0.00 %  remset: 35.90 MB  code-roots: 0.89 MB
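(The per-region liveness output above is produced by the diagnostic flags suggested in the comments below:)

 -XX:+UnlockDiagnosticVMOptions -XX:+G1PrintRegionLivenessInfo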

EDIT:

I tried triggering a full GC right after the mixed GCs, and it still reduced the old gen to 4xx MB, so it looks like my old gen does have more data that could be collected.
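(The post doesn't say how the full GC was triggered; for reference, two standard ways to force one on a running JVM are:)

 jmap -histo:live <pid>    forces a full GC before printing the class histogram
 jcmd <pid> GC.run         invokes System.gc() in the target JVM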

[chart: old gen usage around the triggered full GC]

Before the full GC, the mixed GC log is:

 32654.979: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 457 regions, reclaimable: 2956666176 bytes (17.21 %), threshold: 5.00 %], 0.1106810 secs]
 ....
 [Eden: 6680.0M(6680.0M)->0.0B(536.0M) Survivors: 344.0M->280.0M Heap: 14.0G(16.0G)->7606.6M(16.0G)]
 [Times: user=2.31 sys=0.01, real=0.11 secs]
 ...
 [GC pause (G1 Evacuation Pause) (mixed)
 ...
 32656.876: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 205 regions, max: 205 regions]
 32656.876: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 67 regions, survivors: 35 regions, old: 205 regions, predicted pause time: 173.84 ms, target pause time: 200.00 ms]
 32656.992: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 252 regions, reclaimable: 1321193600 bytes (7.69 %), threshold: 5.00 %]
 [Eden: 536.0M(536.0M)->0.0B(720.0M) Survivors: 280.0M->96.0M Heap: 8142.6M(16.0G)->6029.9M(16.0G)]
 [Times: user=2.49 sys=0.01, real=0.12 secs]
 ...
 [GC pause (G1 Evacuation Pause) (mixed)
 ...
 32659.727: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 66 regions, max: 205 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %]
 32659.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 90 regions, survivors: 12 regions, old: 66 regions, predicted pause time: 120.51 ms, target pause time: 200.00 ms]
 32659.785: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 186 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %]
 [Eden: 720.0M(720.0M)->0.0B(9064.0M) Survivors: 96.0M->64.0M Heap: 6749.9M(16.0G)->5572.0M(16.0G)]
 [Times: user=1.20 sys=0.00, real=0.06 secs]
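The arithmetic behind that stop decision: 857822432 bytes out of a 16GB (17179869184-byte) heap is 857822432 / 17179869184 ≈ 4.99%, just under the 5.00% G1HeapWastePercent threshold, so G1 ends the mixed GC cycle even though 186 candidate regions remain.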

EDIT: 2016/12/11

I have dumped the heap from another machine with -Xmx4G.

I use lettuce as my Redis client, and it has a latency-tracking feature built on LatencyUtils. It keeps LatencyStats instances (each containing several long[] arrays of nearly 3000 elements) reachable only via weak references after every 10-minute publish interval ("Reset latencies after publish" is true by default, https://github.com/mp911de/lettuce/wiki/Command-Latency-Metrics). So over a long uptime it accumulates lots of weakly referenced LatencyStats.
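A minimal sketch of that accumulation pattern (WeakStatsDemo and its buffer sizes are hypothetical stand-ins for LatencyStats, just to illustrate how weakly referenced histogram buffers pile up until a reference-processing GC clears them):

 import java.lang.ref.WeakReference;
 import java.util.ArrayList;
 import java.util.List;

 public class WeakStatsDemo {
     public static void main(String[] args) {
         List<WeakReference<long[]>> history = new ArrayList<>();
         for (int i = 0; i < 10_000; i++) {
             long[] buffer = new long[3000];           // ~24KB, similar to a LatencyStats histogram
             history.add(new WeakReference<>(buffer)); // only weakly reachable after this iteration
         }
         // Minor GCs triggered by the allocation pressure above clear most of the
         // weak references; they linger only if no reference-processing GC runs.
         long cleared = history.stream().filter(r -> r.get() == null).count();
         System.out.println("cleared " + cleared + " of " + history.size());
     }
 }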

Before full GC: [heap dump screenshot]

After full GC: [heap dump screenshot]

Currently I don't need the tracking from lettuce, so I just disabled it, and there are no more full GCs. But I'm still not sure why the mixed GCs don't clear those weak references.
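For reference, a hedged sketch of how the metrics can be disabled through ClientResources (assuming the lettuce 4.x API; package and method names may differ between versions):

 import com.lambdaworks.redis.RedisClient;
 import com.lambdaworks.redis.metrics.DefaultCommandLatencyCollectorOptions;
 import com.lambdaworks.redis.resource.ClientResources;
 import com.lambdaworks.redis.resource.DefaultClientResources;

 public class NoLatencyMetrics {
     public static void main(String[] args) {
         // Disable the command latency collector so LatencyStats instances
         // are never allocated in the first place.
         ClientResources resources = DefaultClientResources.builder()
                 .commandLatencyCollectorOptions(DefaultCommandLatencyCollectorOptions.disabled())
                 .build();
         RedisClient client = RedisClient.create(resources, "redis://localhost");
         // ... use the client ...
         client.shutdown();
         resources.shutdown();
     }
 }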

Jackboot answered 18/11, 2016 at 15:8 Comment(9)
-XX:+UnlockDiagnosticVMOptions -XX:+G1PrintHeapRegions -XX:+G1PrintRegionLivenessInfo might provide some insight. I guess you have humongous allocations or some allocation pattern that somehow leads to incorrect liveness estimates (soft refs maybe)? – Dercy
Thanks, the ### PHASE Post-Marking output above is from G1PrintRegionLivenessInfo; it looks like there is still more than 1GB of live data. And there don't seem to be many humongous regions: [Humongous Total: 1] [Humongous Candidate: 1]. I will check how to tell if it's from soft refs... (any doc about this?) – Jackboot
I will try SoftRefLRUPolicyMSPerMB... – Jackboot
Hmm, it doesn't work after adding SoftRefLRUPolicyMSPerMB. – Jackboot
Then you should upload the GC log somewhere, or try lowering G1MixedGCLiveThresholdPercent. – Dercy
Doesn't lowering it make things worse? Because there will be fewer candidates. – Jackboot
Err, yeah. Anyway, I meant uploading the complete GC log somewhere, not just excerpts. More information = less guessing. Add -XX:+PrintGCDetails if you haven't already. – Dercy
Have you looked at using the parallel collector instead? Your heap size is borderline to where the parallel collector might be able to meet your response time requirements even when doing major collections. 200ms is a lot of time. – Langer
Could you first upgrade to the latest 1.8.0 release? GCs are subject to change in JRE updates. – Satirist

well, you didn't mention all the arguments you set, but

you could try to set

-XX:+ScavengeBeforeFullGC

and you should also think about your Objects lifecycle. how long do your applications Objects live and what size are the Objects.

think about it and take a look at the following arguments

-XX:NewRatio=n              old/new ratio (default 2)
-XX:SurvivorRatio=n         eden/survivor ratio (default 8)
-XX:MaxTenuringThreshold=n  number of times objects are copied between the two survivor spaces before being promoted to old gen (default 15, which is also the maximum)

with default values and Xms and Xmx set to 32gb, NewRatio=2 means old:new = 2:1 -> old gen ≈ 21.3gb and new gen ≈ 10.7gb -> eden ≈ 8.5gb -> two survivors, each about 1.07gb (SurvivorRatio=8 splits the new gen 8:1:1)
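a back-of-envelope sketch of that arithmetic in plain Java (nothing JVM-specific, it just reproduces the sizing formulas above):

 public class GenSizing {
     public static void main(String[] args) {
         double gb = 1024.0 * 1024 * 1024;
         long heap = 32L * 1024 * 1024 * 1024;        // -Xms32g -Xmx32g
         int newRatio = 2;                             // -XX:NewRatio=2 -> old:new = 2:1
         int survivorRatio = 8;                        // -XX:SurvivorRatio=8 -> eden:survivor = 8:1
         long young = heap / (newRatio + 1);           // new gen is 1/(NewRatio+1) of the heap
         long old = heap - young;
         long survivor = young / (survivorRatio + 2);  // new gen = eden + two survivor spaces
         long eden = young - 2 * survivor;
         System.out.printf("old=%.1fG young=%.1fG eden=%.1fG survivors=%.2fG each%n",
                 old / gb, young / gb, eden / gb, survivor / gb);
     }
 }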

eden contains the objects that have just been instantiated with new

one survivor (the to-survivor) is always empty; the other one (the from-survivor) contains objects that survived a minor gc

at a minor gc, surviving objects from eden and from the from-survivor are copied into the to-survivor

if the to-survivor (about 1gb in this 'default configuration') overflows, objects are promoted to old gen

if it does not overflow, objects are promoted to old gen after 15 minor gc's (-XX:MaxTenuringThreshold's default value)

when tweaking those values, always keep in mind that old gen has to be as large as or larger than new gen, because a single gc can promote the whole new gen into old gen

Edit

a timeline of your first "old gen: used" picture would be helpful

keep in mind that a full gc is not needed as long as old gen doesn't fill up - a full gc stops the whole "world" for a certain period of time

in this particular case, i would say you could

  1. reduce -Xms and -Xmx to 8gb
  2. set/decrease the -XX:SurvivorRatio value to 2
  3. leave -XX:MaxTenuringThreshold at its maximum of 15 (the JVM rejects values above 15)

and with the default NewRatio of 2 you will get an old gen of roughly 5.3gb and a new gen of roughly 2.7gb,

eden at a size of about 1.3gb,

two survivors, each sized about 0.7gb,

and up to 15 minor gc's before objects go into old gen
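putting those suggestions together on one command line (a sketch; adjust to your own workload):

 -Xms8g -Xmx8g -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=15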

Kaspar answered 22/6, 2019 at 19:43 Comment(1)
-XX:NewRatio=n is the old/new ratio (default 2): with default values and Xms/Xmx set to 32gb, that gives old gen ≈ 22gb and new gen ≈ 11gb, approximately. – Elephant
