We are seeing occasional full GCs with the G1 garbage collector, preceded by concurrent-mark overflows. Once a concurrent-mark-reset-for-overflow occurs, the overflow recurs in the following concurrent mark phases. Eventually this leads to a full GC, since concurrent marking no longer seems to work.
We have four machines running the same Apache Storm-based application with the same data traffic. Only one of them shows this problem, about once a week.
Is this related to the bug "G1 does not expand marking stack when mark stack overflow happens during concurrent marking"? https://bugs.openjdk.java.net/browse/JDK-8065402
Following the suggestion on that page, we doubled the number of concurrent mark threads from 4 to 8 and the heap size from 8GB to 16GB. However, the full GC still happens; the only difference is that it is delayed.
Any other suggestions?
Here's the GC log:
Java HotSpot(TM) 64-Bit Server VM (25.65-b01) for linux-amd64 JRE(1.8.0_65b17),
built on Oct 6 2015 17:16:12 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 529167668k(69283408k free), swap 33554424k(33552380k free)
CommandLine flags: -XX:ConcGCThreads=8 -XX:G1ReservePercent=20 -XX:GCLogFileSize=104857600
-XX:InitialHeapSize=17179869184 -XX:InitiatingHeapOccupancyPercent=45 -XX:MaxGCPauseMillis=100
-XX:MaxHeapSize=17179869184 -XX:NumberOfGCLogFiles=10 -XX:ParallelGCThreads=30
-XX:+PrintAdaptiveSizePolicy -XX:PrintFLSStatistics=2 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
...
...
2016-04-13T22:06:37.254-0400: 19839.175: [GC concurrent-root-region-scan-start]
2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-root-region-scan-end, 0.0592966 secs]
2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-mark-start]
2016-04-13T22:06:38.569-0400: 19840.490: [GC concurrent-mark-reset-for-overflow]
...
2016-04-13T22:06:42.810-0400: 19844.731: [GC concurrent-mark-reset-for-overflow]
...
2016-04-13T22:11:19.253-0400: 20121.175: [GC concurrent-mark-reset-for-overflow]
...
...
...
2016-04-14T01:58:17.254-0400: 33739.176: [GC concurrent-mark-reset-for-overflow]
...
2016-04-14T01:58:36.957-0400: 33758.878: [Full GC (Allocation Failure)
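
To compare the four machines, a small scan over each GC log could count how many overflow resets occur before the first full GC. The following is only a minimal sketch: the class name OverflowScan and its methods are made up for illustration, and it simply matches the literal strings that appear in the log above, so it assumes the same JDK 8 log format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or any GC tool.
class OverflowScan {
    // Literal markers copied from the GC log excerpt above.
    static final String OVERFLOW = "[GC concurrent-mark-reset-for-overflow]";
    static final String FULL_GC = "[Full GC";

    /** Counts overflow resets seen before the first Full GC line
     *  (or in the whole log if there is no Full GC). */
    static int overflowsBeforeFullGc(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.contains(FULL_GC)) {
                break;
            }
            if (line.contains(OVERFLOW)) {
                count++;
            }
        }
        return count;
    }

    /** Returns the first line containing the marker, or null if none does. */
    static String firstLineContaining(List<String> lines, String marker) {
        for (String line : lines) {
            if (line.contains(marker)) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OverflowScan <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("overflow resets before first Full GC: "
                + overflowsBeforeFullGc(lines));
        System.out.println("first overflow: " + firstLineContaining(lines, OVERFLOW));
        System.out.println("first Full GC:  " + firstLineContaining(lines, FULL_GC));
    }
}
```

Running it against the log of each machine would show whether the three healthy machines ever log an overflow reset at all, or whether they also overflow but recover.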