GC overhead of Optional<T> in Java

We all know that every object allocated in Java adds weight to future garbage collection cycles, and Optional<T> objects are no different. We use these objects frequently to wrap nullable values, which leads to safer code, but at what cost?

Does anyone have information on how much additional GC pressure Optional objects add compared to simply returning nulls, and what impact this has on performance in high-throughput systems?

Mendy asked 30/1, 2019 at 15:24 Comment(8)
Maybe: What is the memory consumption of an object in Java? can help. Optionals are normal objects, after allTibetan
What kind of answer do you expect other than "it depends"? It depends on so many things: How long-lived are your Optional objects? How frequently are they empty? How strong is the GC pressure at the moment? All of that stuff.Clowers
Optional.empty() is a singleton. So in practice it costs no more than null. Concerning Optional instances wrapping non-null objects, there is a cost, but it is really cheap. It is just a wrapper around the contained object; its state contains only a reference to it. "Object in object" is very common in OOP. It should never be an issue for lightweight classes such as Optional.Probable
I haven’t read such information anywhere (given the wide use of Optional this may in itself suggest that there isn’t any great problem). You may always conduct your own measurements, of course.Weatherman
Yes, it will depend on a lot of factors. To answer some questions, let's set a sample case: objects will be very short lived, null maybe 10% of the time, and GC pressure will be constant (a young generation collection every 10 seconds, collecting maybe 1 to 1.5 GB)Mendy
Possible duplicate of Performance of Java OptionalMendy
Useful link, @jocull, to a closely related question, thanks. However, this question is specifically about garbage collection, which is not mentioned in the other question, nor in its answers, so I don't think it would be fair to call it an exact duplicate.Weatherman
@OleV.V. but GC costs are not specific to optional. They are the same for all similar small wrapper objects and also depend on use patterns (e.g. local use may be elided via EA)Degradation

We all know that every object allocated in Java adds weight to future garbage collection cycles,…

That sounds like a statement nobody could deny, but let's look at the actual work of a garbage collector in common implementations of modern JVMs, and at the impact of an allocated object on it, especially of objects like Optional instances, which are typically temporary in nature.

The first task of the garbage collector is to identify objects which are still alive. The name "garbage collector" puts the focus on identifying garbage, but garbage is defined as unreachable objects, and the only way to find out which objects are unreachable is by elimination. So the first task is solved by traversing and marking all reachable objects, and the costs of this process depend not on the total number of allocated objects, but only on those which are still reachable.

The second task is to make the memory of the garbage available to new allocations. Instead of puzzling over the memory gaps between still-reachable objects, all modern garbage collectors work by evacuating a complete region, transferring all live objects within that memory to a new location and adapting the references to them. After the process, the memory is available to new allocations as a whole block. So this is again a process whose costs do not depend on the total number of allocated objects, but only on (a part of) the still-living objects.
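To make the cost structure tangible, here is a deliberately simplified toy model of such an evacuating collector in plain Java (an illustration of the principle, not how HotSpot is actually implemented): both the traversal and the copying touch only reachable objects; garbage is never visited.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.IdentityHashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model of an evacuating collector: the work is proportional to the
    // number of *reachable* objects; garbage is never even looked at.
    final class ToyCollector {
        static final class Obj {
            List<Obj> refs; // outgoing references
            Obj(List<Obj> refs) { this.refs = refs; }
        }

        static Map<Obj, Obj> evacuate(List<Obj> roots) {
            Map<Obj, Obj> newLocation = new IdentityHashMap<>();
            Deque<Obj> pending = new ArrayDeque<>(roots);
            // Tasks 1+2: discover live objects by traversal and copy each
            // one to its new location ("evacuation").
            while (!pending.isEmpty()) {
                Obj old = pending.pop();
                if (newLocation.containsKey(old)) continue; // already copied
                newLocation.put(old, new Obj(old.refs));
                pending.addAll(old.refs);
            }
            // Adapt the references of the copies to the new locations.
            for (Obj copy : newLocation.values()) {
                List<Obj> adapted = new ArrayList<>();
                for (Obj oldRef : copy.refs) adapted.add(newLocation.get(oldRef));
                copy.refs = adapted;
            }
            return newLocation;
        }
    }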

Therefore, an object like a temporary Optional may impose no costs on the actual garbage collection process at all, if it is allocated and abandoned between two garbage collection cycles.
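For illustration, a typical pattern (the class and method names here are invented for the example): the Optional below is created, consumed, and abandoned within a single call, so it will usually be dead long before the next collection.

    import java.util.Map;
    import java.util.Optional;

    class CacheLookup {
        // The returned Optional is created, consumed, and abandoned within a
        // single caller; if it is gone by the next GC cycle, the collector
        // never marks, copies, or otherwise touches it.
        static Optional<String> lookup(Map<String, String> cache, String key) {
            return Optional.ofNullable(cache.get(key));
        }

        static String hostOrDefault(Map<String, String> cache) {
            // The wrapper is garbage as soon as orElse(...) returns.
            return lookup(cache, "host").orElse("localhost");
        }
    }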

With one catch, of course. Each allocation reduces the memory available to subsequent allocations until there's no space left and a garbage collection has to take place. So we could say that each allocation shortens the interval between two garbage collection runs, by a fraction equal to the object's size divided by the size of the allocation space. Not only is this a rather tiny fraction, it also only applies to a single-threaded scenario.
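A back-of-the-envelope calculation makes that fraction concrete. The numbers are assumptions chosen for the arithmetic: a 1 GiB allocation space and a 16-byte Optional, a plausible size with compressed oops.

    // Back-of-the-envelope: how much of the interval between two GC runs
    // does a single small allocation consume? (Both sizes are assumptions.)
    class AllocationCost {
        public static void main(String[] args) {
            long edenBytes = 1L << 30; // assumed 1 GiB allocation space
            long optionalBytes = 16;   // typical Optional size with compressed oops
            double fraction = (double) optionalBytes / edenBytes;
            // prints ~1.49e-08: one Optional brings the next GC run closer
            // by roughly fifteen billionths of the interval
            System.out.printf("fraction of inter-GC interval: %.2e%n", fraction);
        }
    }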

In implementations like the Hotspot JVM, each thread uses a thread-local allocation buffer (TLAB) for new objects. Once its TLAB is full, a thread fetches a new one from the allocation space (a.k.a. the Eden space). If none is available, a garbage collection is triggered. Now, it's rather unlikely that all threads hit the end of their TLABs at exactly the same time. So for the threads which still have some space left in their TLAB at that moment, it would not make any difference if they had allocated a few more objects fitting into that remaining space.
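A conceptual sketch of the bump-pointer allocation inside a TLAB (again a model of the idea, not HotSpot's actual code, which emits this logic as compiled machine code): the common case is a comparison and an addition on thread-local state, with no shared machinery involved.

    // Conceptual model of TLAB ("bump pointer") allocation. The sketch only
    // illustrates the cost structure of the fast and slow paths.
    class TlabModel {
        long top; // next free address within this thread's TLAB
        long end; // end of this thread's TLAB

        long allocate(long size) {
            if (top + size <= end) { // common case: one compare, one add,
                long address = top;  // no synchronization, no GC involvement
                top += size;
                return address;
            }
            // slow path: fetch a new TLAB from Eden; only if Eden itself is
            // exhausted does this trigger a garbage collection
            return refillAndAllocate(size);
        }

        long refillAndAllocate(long size) {
            throw new UnsupportedOperationException("refill TLAB / trigger GC");
        }
    }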

The perhaps surprising conclusion is that not every allocated object has an impact on the garbage collection; i.e., a purely local object allocated by a thread which does not trigger the next GC can be entirely free.

Of course, this does not apply to allocating a large number of objects. Allocating lots of them causes the thread to fetch more TLABs and eventually triggers the garbage collection earlier than it would run otherwise. That's why we have classes like IntStream, which allow processing a large number of elements without allocating objects, as would happen with a Stream<Integer>, while there is no problem in providing the result as a single OptionalInt instance. As we now know, a single temporary object has only a tiny impact on the GC, if any.
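Side by side, using only standard JDK APIs (the element count is arbitrary):

    import java.util.Optional;
    import java.util.OptionalInt;
    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    class BoxedVsPrimitive {
        public static void main(String[] args) {
            // Boxed: may allocate up to a million temporary Integer objects
            // (only small values come from the Integer cache).
            Optional<Integer> boxedMax = Stream.iterate(0, i -> i + 1)
                .limit(1_000_000)
                .max(Integer::compare);

            // Primitive: the pipeline works on plain int values; the only
            // wrapper is the single OptionalInt holding the result.
            OptionalInt primitiveMax = IntStream.range(0, 1_000_000).max();

            System.out.println(boxedMax.orElse(-1) + " " + primitiveMax.orElse(-1));
        }
    }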

All of this does not even touch on the JVM's optimizer, which may eliminate object allocations in hot code paths entirely when Escape Analysis proves that an object is purely local.
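For example, once ofNullable, map, and orElse have been inlined into a hot caller, the JIT may prove that neither Optional below escapes and scalarize them away, so that no heap allocation happens at all. Whether that actually occurs depends on inlining decisions, so this is a sketch of the possibility, not a guarantee; for comparison, HotSpot's analysis can be disabled with -XX:-DoEscapeAnalysis.

    import java.util.Optional;

    class EscapeAnalysisDemo {
        // After inlining, the JIT can see that neither Optional created here
        // escapes this method and may scalarize them, so the fast path
        // allocates nothing on the heap.
        static String displayName(String raw) {
            return Optional.ofNullable(raw)
                           .map(String::trim)
                           .orElse("<anonymous>");
        }
    }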

Trainbearer answered 10/2, 2019 at 17:35 Comment(8)
Does it mean that in some cases Escape Analysis can decide to allocate some object on the stack instead of the heap, imposing no overhead on garbage collection? If so, is there an option that can help to control such stack allocation?Jersey
@Jersey "Escape Analysis" is just the process of identifying purely local objects. It enables subsequent optimizations, like Lock Elimination and Scalarization. The latter will eliminate the allocation completely, not just redirect it to the stack. Think of converting the field accesses of the local objects into local variables, followed by the usual optimizations, eliminating unused variables or folding variables holding constants or the same value as other variables (often, fields are initialized with values from other variables in scope), then moving the most used vars into CPU registers.Trainbearer
As an end result, some of the object’s fields may end up on the stack, but not following the object’s memory layout at all. As a simple example, new Rectangle(0, 0, a, b).area() may get optimized to a * b (assuming a typical Rectangle implementation), without allocating anything. Escape Analysis is on by default, so there’s not much to do, but its efficiency might get influenced by general settings like -XX:MaxInlineLevel, -XX:MaxInlineSize, and -XX:FreqInlineSize which influence the optimizer’s horizon and in turn, how long an object’s lifetime can be to still be “purely local”.Trainbearer
GC takes some time to run, and stressing it with a large amount of such temporary objects can really affect performance, but in most cases this is not important in typical scenarios, like web applications. There are other areas, though, like game development, where in garbage-collected languages people often avoid any allocation during a frame, as a GC run in the middle of a frame can be visible and annoying to the user. A great example of how bad this can get is the Minecraft game ;) Stressing the GC with temporary immutable objects so much that you can feel it.Tabathatabb
@Tabathatabb that's the difference between "some temporary objects", like a single Iterator of a loop or an Optional return type, and "stressing it with a large amount of such temporary objects", like when using a stream of boxed elements rather than primitive values. Further, there's a significant difference between "throughput" (overall performance) and "latency", two goals at different ends of the available options. You could get away with allocations in a game loop with the right tuning, but it's much easier to just avoid allocations.Trainbearer
In the example above of streams of boxed objects, will the optimizer ever improve those values to be contiguous?Mendy
For a stream of temporary objects, the optimizer can eliminate the allocations when the entire pipeline has been inlined. For the commonly used Hotspot JVM, the biggest obstacle is the -XX:MaxInlineLevel default of nine, which is quite low for the Stream API. You can easily double that value to get better performance with code using streams. The other options named in the comment above can be tuned too, but the inlining level is the biggest lever.Trainbearer
Wow @Trainbearer is the King of the Party. Thanks for such deep insights into the GC and optimizer internals.Carthage
