I have a strange issue with one of my production machines. It hosts a Java application that does CMS (concurrent mark and sweep), but it cleans up just a small part of the old generation. I suspected a memory leak and tried a heap dump. But the Full GC preceding the heap dump cleans up almost all of the old generation. What's happening? I've never seen this behavior of Java garbage collection. Normally CMS and Full GC should collect about the same amount of garbage, now CMS keeps about 10GB more.
- Java 1.7.0_75
- Linux Cent OS 7
GC logs:
**2016-01-04T07:37:40.196+0000: 431200.698: [GC [1 CMS-initial-mark: 21633423K(27336704K)] 22826703K(30101504K), 4.3910840 secs] [Times: user=4.40 sys=0.01, real=4.39 secs]
2016-01-04T07:37:44.588+0000: 431205.090: [CMS-concurrent-mark-start]
2016-01-04T07:38:08.718+0000: 431229.220: [CMS-concurrent-mark: 18.213/24.131 secs] [Times: user=126.00 sys=2.22, real=24.13 secs]
2016-01-04T07:38:08.718+0000: 431229.220: [CMS-concurrent-preclean-start]
2016-01-04T07:38:08.843+0000: 431229.345: [CMS-concurrent-preclean: 0.118/0.125 secs] [Times: user=0.29 sys=0.00, real=0.12 secs]
2016-01-04T07:38:08.843+0000: 431229.345: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2016-01-04T07:38:13.906+0000: 431234.408: [CMS-concurrent-abortable-preclean: 4.975/5.063 secs] [Times: user=10.18 sys=0.03, real=5.06 secs]
2016-01-04T07:38:14.001+0000: 431234.503: [GC[YG occupancy: 1312993 K (2764800 K)]2016-01-04T07:38:14.001+0000: 431234.503: [Rescan (parallel) , 4.6981760 secs]2016-01-04T07:38:18.699+0000: 431239.202: [weak refs processing, 0.0002970 secs]2016-01-04T07:38:18.700+0000: 431239.202: [scrub string table, 0.0006900 secs] [1 CMS-remark: 21633423K(27336704K)] 22946417K(30101504K), 4.6993310 secs] [Times: user=105.40 sys=0.36, real=4.70 secs]
2016-01-04T07:38:18.701+0000: 431239.203: [CMS-concurrent-sweep-start]
2016-01-04T07:38:27.967+0000: 431248.469: [CMS-concurrent-sweep: 9.160/9.267 secs] [Times: user=17.91 sys=0.10, real=9.26 secs]
2016-01-04T07:38:27.968+0000: 431248.470: [CMS-concurrent-reset-start]
2016-01-04T07:38:28.028+0000: 431248.531: [CMS-concurrent-reset: 0.061/0.061 secs] [Times: user=0.14 sys=0.00, real=0.06 secs]**
2016-01-04T07:38:30.801+0000: 431251.303: [GC [1 CMS-initial-mark: 21633105K(27336704K)] 23039228K(30101504K), 5.6079370 secs] [Times: user=5.60 sys=0.01, real=5.61 secs]
2016-01-04T07:38:36.409+0000: 431256.911: [CMS-concurrent-mark-start]
2016-01-04T07:38:54.673+0000: 431275.175: [CMS-concurrent-mark: 17.807/18.264 secs] [Times: user=119.97 sys=1.66, real=18.26 secs]
2016-01-04T07:38:54.673+0000: 431275.175: [CMS-concurrent-preclean-start]
2016-01-04T07:38:54.799+0000: 431275.301: [CMS-concurrent-preclean: 0.119/0.126 secs] [Times: user=0.25 sys=0.00, real=0.13 secs]
2016-01-04T07:38:54.799+0000: 431275.301: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2016-01-04T07:39:01.382+0000: 431281.884: [CMS-concurrent-abortable-preclean: 6.478/6.583 secs] [Times: user=12.23 sys=0.06, real=6.58 secs]
2016-01-04T07:39:01.487+0000: 431281.989: [GC[YG occupancy: 1596183 K (2764800 K)]2016-01-04T07:39:01.487+0000: 431281.989: [Rescan (parallel) , 3.5737630 secs]2016-01-04T07:39:05.061+0000: 431285.563: [weak refs processing, 0.0002690 secs]2016-01-04T07:39:05.061+0000: 431285.563: [scrub string table, 0.0005740 secs] [1 CMS-remark: 21633105K(27336704K)] 23229288K(30101504K), 3.5747910 secs] [Times: user=80.26 sys=0.27, real=3.58 secs]
2016-01-04T07:39:05.062+0000: 431285.564: [CMS-concurrent-sweep-start]
2016-01-04T07:39:21.895+0000: 431302.397: [CMS-concurrent-sweep: 9.449/16.834 secs] [Times: user=27.52 sys=0.16, real=16.83 secs]
2016-01-04T07:39:21.895+0000: 431302.397: [CMS-concurrent-reset-start]
2016-01-04T07:39:21.995+0000: 431302.497: [CMS-concurrent-reset: 0.099/0.099 secs] [Times: user=0.23 sys=0.01, real=0.10 secs]
2016-01-04T07:39:24.104+0000: 431304.606: [GC [1 CMS-initial-mark: **21631742K(27336704K)**] 23455261K(30101504K), 5.6592940 secs] [Times: user=5.67 sys=0.00, real=5.66 secs]
2016-01-04T07:39:29.764+0000: 431310.266: [CMS-concurrent-mark-start]
2016-01-04T07:39:56.859+0000: 431337.361: [CMS-concurrent-mark: 19.813/27.096 secs] [Times: user=140.17 sys=2.62, real=27.10 secs]
2016-01-04T07:39:56.859+0000: 431337.361: [CMS-concurrent-preclean-start]
2016-01-04T07:39:57.102+0000: 431337.604: [CMS-concurrent-preclean: 0.152/0.242 secs] [Times: user=0.35 sys=0.00, real=0.24 secs]
2016-01-04T07:39:57.102+0000: 431337.604: [CMS-concurrent-abortable-preclean-start]
CMS: abort preclean due to time 2016-01-04T07:40:03.016+0000: 431343.518: [CMS-concurrent-abortable-preclean: 5.376/5.914 secs] [Times: user=11.65 sys=0.05, real=5.91 secs]
2016-01-04T07:40:03.016+0000: 431343.518: [GC[YG occupancy: 2045175 K (2764800 K)]2016-01-04T07:40:03.016+0000: 431343.518: [Rescan (parallel) , 3.4843400 secs]2016-01-04T07:40:06.501+0000: 431347.003: [weak refs processing, 0.0002510 secs]2016-01-04T07:40:06.501+0000: 431347.003: [scrub string table, 0.0006220 secs] [1 CMS-remark: **21631742K(27336704K)**] 23676918K(30101504K), 3.4853760 secs] [Times: user=78.31 sys=0.27, real=3.49 secs]
2016-01-04T07:40:06.502+0000: 431347.004: [CMS-concurrent-sweep-start]
{Heap before GC invocations=1832 (full 5077):
par new generation total 2764800K, used 2166647K [0x00000000bae00000, 0x0000000176600000, 0x0000000176600000)
eden space 2457600K, 77% used [0x00000000bae00000, 0x000000012e6a4c90, 0x0000000150e00000)
from space 307200K, 89% used [0x0000000150e00000, 0x0000000161939318, 0x0000000163a00000)
to space 307200K, 0% used [0x0000000163a00000, 0x0000000163a00000, 0x0000000176600000)
concurrent mark-sweep generation total 27336704K, used 21631644K [0x0000000176600000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 49956K, used 30116K [0x00000007fae00000, 0x00000007fdec9000, 0x0000000800000000)
**2016-01-04T07:40:12.775+0000: 431353.277: [Full GC2016-01-04T07:40:12.775+0000: 431353.277: [CMS2016-01-04T07:40:17.924+0000: 431358.426: [CMS-concurrent-sweep: 9.211/11.422 secs] [Times: user=16.59 sys=0.15, real=11.42 secs]
(concurrent mode interrupted): 21631644K->4907878K(27336704K), 39.2467600 secs] 23798292K->4907878K(30101504K), [CMS Perm : 30116K->28023K(49956K)], 39.2468730 secs] [Times: user=39.24 sys=0.05, real=39.25 secs]**
The same application is running normally on another machine using Cent OS 5, java 7.
java -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1581
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -verbose:gc -XX:+PrintGCDetails
-XX:+UseCompressedOops -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC
-Xloggc:/usr/local/app/logs/current-gc.log -Xms29g -Xmx29g -XX:NewSize=3000m
-XX:SurvivorRatio=8 -XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
-Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
-Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE
-cp /usr/local/app/conf:/usr/local/app/app.jar:/usr/local/app/lib/* -Xdebug
-Xrunjdwp:transport=dt_socket,address=8099,server=y,suspend=n
-Dvisualvm.display.name=App -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/usr/local/app/logs/
-XX:ErrorFile=/usr/local/app/logs/hs_err_pid%p.log
-Djgroups.tcpgossip.initial_host=localhost
-Dlog4j.hostname=host7.company.com com.company.app.service.App
UPDATE: The problem is still not solved. I've tried everything: update OS packages and kernel, update Java to latest version Java 1.7.0_80, rollback the application version, but without success.
I've also verified previous GC logs and found that this issue was not forever. It started about a month ago, after a deployment.