Java G1 garbage collection in production


16

92

Since Java 7 is going to use the new G1 garbage collector by default, will Java be able to handle an order-of-magnitude larger heap without the supposedly "devastating" GC pause times? Has anybody actually implemented G1 in production, and if so, what were your experiences?

To be fair, the only time I have seen really long GC pauses is on very large heaps, much larger than a workstation would have. To clarify my question: will G1 open the gateway to heaps in the hundreds of GB? TB?

Nieberg answered 12/2, 2010 at 18:11 Comment(4)

Although it could be rephrased more specifically, this isn't a horrible question. I really wish people had to explain themselves better than "Not a question" when voting to close. – Donets 12/2, 2010 at 18:24

I didn't vote to close, but I wished the OP had done a more objective job of detailing his gripes with the current GC. Also, "Java" is a language whereas he is speaking of an implementation, and I don't know what "implementing G1 in production" means, especially with the future tense of the rest of the question. If it is going to be in Java 7, surely no-one has used it in production? – Priapism 12/2, 2010 at 18:28

@Pascal G1 has been an experimental feature available in the JDK since JDK 6 update 14. By "implementing G1 in production" I think he meant actually using it, which is not that hard to figure out. And while I agree that G1 is part of JDK 7, not Java, a search for Java 7 on Google returns the JDK 7 homepage as its first result, and both terms are often used interchangeably. @Benju I wouldn't trust results obtained with G1 on the current JDK, as it is experimental; many things could change between now and the official release. – Mutual 12/2, 2010 at 19:20

It seems JDK 7, including updates 1, 2 and 3, does not use the G1 GC by default. You can check it with jinfo -flag UseG1GC pid – Humboldt 19/3, 2012 at 10:17
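
The same check can also be made from inside the running JVM rather than with jinfo. What follows is a minimal sketch, not an official recipe: the class name GcCheck and its helper methods are made up for illustration, and the "G1 " name prefix is an assumption based on how HotSpot names its G1 collector beans ("G1 Young Generation", "G1 Old Generation"). Other JVMs may name them differently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class GcCheck {

    // Returns the names of the collectors registered in the running JVM.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Heuristic (assumption): HotSpot's G1 beans have names starting with "G1 ".
    public static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("G1 in use (judging by bean names): " + looksLikeG1(names));
    }
}
```

Running it once with -XX:+UseG1GC and once without should show whether G1 is really the default on a given JDK build.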


34

It sounds like the point of G1 is to have smaller pause times, even to the point where it has the ability to specify a maximum pause time target.

Garbage collection isn't just a simple "Hey, it's full, let's move everything at once and start over" deal any more--it's a fantastically complex, multi-level, background-threaded system. It can do much of its maintenance in the background with no pauses at all, and it also uses knowledge of the system's expected patterns at runtime to help--like assuming most objects die right after being created, etc.

I would say GC pause times are going to continue to improve, not worsen, with future releases.

EDIT:

In re-reading, it occurred to me that I use Java daily--Eclipse, Azureus, and the apps I develop--and it's been a LONG TIME since I saw a pause. I don't just mean a significant pause; I mean any pause at all.

I've seen pauses when I right-click on windows explorer or (occasionally) when I hook up certain USB hardware, but with Java---none at all.

Is GC still an issue with anyone?

Donets answered 12/2, 2010 at 18:22 Comment(12)

Agree - the only time I've seen GC pauses is when I've either deliberately or accidentally provoked them with massively parallel garbage-creating code..... – Funky 18/7, 2010 at 10:21

Yes, GC is still very much an issue when you start dealing with large heaps (>16GB), especially with large tenured generations. – Harpoon 26/8, 2010 at 13:29

@the-alchemist wow, I've seen your comment in passing a few times and it just struck me that you said 16 GB!! Although I'm absolutely sure you're correct that this can cause huge delays, I want to check that you disabled ALL swapping. On a large memory system, any swapping of java will absolutely kill your system (Because GC is very swap-unfriendly). I'm sure you've already done this, but I just wanted to mention it--because it would make such a huge difference. I've never seen a PC with that much ram--how much do you have? 32g? – Donets 27/8, 2010 at 17:25

Yes, GCs are problematic for services in that they are what makes it VERY difficult to improve TP99.9 (and higher) limits. Specifically, "old generation" GCs can be death traps that all but freeze the JVM (and the service) for multiple seconds; for services that typically serve requests in single-digit (or low double-digit) milliseconds, this is problematic. For what it's worth, this was a practical problem with the backend storage used by Amazon's Simple Queue service (can't go into tons of details as it's AWS internal). – Thremmatology 15/12, 2010 at 20:12

The annoying thing about GC is that Azul invented years ago an ingenious GC algorithm (Azul C4) that can easily cope with hundreds of gigabytes without any noticeable pause times by making very clever use of the processor's memory hardware. But nobody knows this, and it won't be implemented in the major Java versions soon since it needs some support from the operating system. And the operating system vendors won't do anything until people know about the algorithm and put pressure on them. See azulsystems.com/zing/pgc , managedruntime.org – Laure 21/10, 2011 at 7:1

More about garbage collection and Azul C4: youtube.com/watch?v=we_enrM7TSY – Libby 12/11, 2014 at 9:28

@Dr.Hans-PeterStörr I'm assuming you're talking about hardware virtual memory. Actually, OSes already provide interfaces for managing it (mmap and VirtualAlloc), and an application can even use virtualization hardware (ie, nested page tables) to manage it directly. But why JVM devs aren't even thinking of using it is beyond me. Maybe patents? – Bartels 19/10, 2017 at 13:57

@AleksandrDubinsky Woke up an old thread here. By the way, I'm now running on a 64 GB machine with a combination of apps that use around 32 GB total (and then on top of that Eclipse and a few others). It's still true that the only time I see pauses is when the SYSTEM starts pushing 90% full. – Donets 19/10, 2017 at 17:14

@AleksandrDubinsky At least in 2010, Azul C4 needed additional kernel functionality. They wanted to contribute that to the Linux kernel, but gave up. Allegedly, the disapproval of Java (and the like) in the Linux kernel community was a factor. At least Gil Tene (Azul CTO) said: "the very idea of enabling something that could make GC better for everyone, on all runtimes, seemed to get people upset and angry. To some people, it seems, GC is only useful for helping lazy people who are too stupid to program without it, and anything that makes it work better should not be encouraged." – Changeful 30/12, 2017 at 17:43

@Changeful Is the patch public? (It should be in the mailing list if it was proposed.) I can see the kernel devs hating Java, but I also don't see Azul open sourcing their garbage collector. Patches that only work for some proprietary piece of software are typically maintained by its vendor. A patch like this needs to be made available to and gestate in the community, and generate uptake. I hope the patch is public so I could see what it does. – Bartels 5/1, 2018 at 10:23

@AleksandrDubinsky Yes, the code that was part of the proposal is public: github.com/GregBowyer/ManagedRuntimeInitiative. C4 is not open source, but at least the rhetoric on Azul's side is that the new kernel functionality can be still useful for other garbage collectors. – Changeful 5/1, 2018 at 12:6

@Changeful As I suspected, the patch concerns manipulating virtual memory. I wonder why not use virtualization features to do the same thing in userspace. I studied this some more, and I think these page table tricks only optimize compaction (and only of large objects). They don't make or break pause-less GC. Pause-less GC revolves around very careful concurrent programming (to allow all GC phases to happen concurrently with program execution). Shenandoah is the OpenJDK attempt at making such a GC. – Bartels 7/1, 2018 at 16:35


58

I've been testing it out with a heavy application: 60-70GB allocated to heap, with 20-50GB in use at any time. With these sorts of applications, it's an understatement to say that your mileage may vary. I'm running JDK 1.6_22 on Linux. The minor versions are important-- before about 1.6_20, there were bugs in G1 that caused random NullPointerExceptions.
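
To see the difference between heap that is allocated to the JVM and heap that is actually in use, the standard MemoryMXBean can be queried. This is only a rough sketch: the class name HeapReport and its format helper are hypothetical, and the figures will of course depend on the application and its -Xms/-Xmx settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
public class HeapReport {

    public static final long MB = 1024L * 1024L;

    // Formats heap figures (given in bytes) as whole megabytes.
    // A negative max means the JVM reports the maximum as undefined.
    public static String format(long used, long committed, long max) {
        String maxText = (max < 0) ? "undefined" : (max / MB) + " MB";
        return String.format("used=%d MB, committed=%d MB, max=%s",
                used / MB, committed / MB, maxText);
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // "committed" is what the JVM has reserved from the OS; "used" is live data plus uncollected garbage.
        System.out.println(format(heap.getUsed(), heap.getCommitted(), heap.getMax()));
    }
}
```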

I've found that it is very good at keeping within the pause target you give it most of the time. The default appears to be a 100ms (0.1 second) pause, and I've been telling it to do half that (-XX:MaxGCPauseMillis=50). However, once it gets really low on memory, it panics and does a full stop-the-world garbage collection. With 65GB, that takes between 30 seconds and 2 minutes. (The number of CPUs probably doesn't make a difference; it's probably limited by the bus speed.)
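
One rough way to check how collections behave under a given pause target is to read the counters the JVM exposes per collector. The sketch below makes several assumptions: the class name GcPauseReport and the averageMillis helper are invented for illustration, the allocation loop is just a stand-in for a real workload, and getCollectionTime() reports approximate accumulated collection time, which is not the same as stop-the-world pause time for collectors that do part of their work concurrently. GC logs are the more reliable source for actual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
public class GcPauseReport {

    // Average reported time per collection in milliseconds.
    // Returns 0 if no collection ran or the JVM reports the values as undefined (-1).
    public static double averageMillis(long count, long totalMillis) {
        if (count <= 0 || totalMillis < 0) {
            return 0.0;
        }
        return (double) totalMillis / count;
    }

    public static void main(String[] args) {
        // Stand-in workload: allocate short-lived arrays to give the collector something to do.
        // The JIT may optimize some of this away, so the counters can stay low.
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            junk[0] = (byte) i;
            sink += junk[0];
        }
        System.out.println("Allocation loop done (checksum " + sink + ")");

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long total = gc.getCollectionTime();
            System.out.printf("%s: %d collections, %d ms total, %.2f ms average%n",
                    gc.getName(), count, total, averageMillis(count, total));
        }
    }
}
```

It could be run with the settings described above, for example: java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 GcPauseReport. On JDK 6 builds, where G1 is still experimental, -XX:+UnlockExperimentalVMOptions is also needed.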

Compared with CMS (which is not the default server GC, but it should be for web servers and other real-time applications), G1's typical pauses are much more predictable and can be made much shorter. So far I'm having better luck with CMS when it comes to the huge pauses, but that may be random; I'm seeing them only a few times every 24 hours. I'm not sure which one will be more appropriate in my production environment at the moment, but probably G1. If Oracle keeps tuning it, I suspect G1 will ultimately be the clear winner.

If you're not having a problem with the existing garbage collectors, there's no reason to consider G1 right now. If you are running a low-latency application, such as a GUI application, G1 is probably the right choice, with MaxGCPauseMillis set really low. If you're running a batch-mode application, G1 doesn't buy you anything.

Pool answered 18/10, 2010 at 21:14 Comment(0)