Disclaimer: I am by far no GC expert, but lately I've been getting into these details for fun.

As I said in the comments, you are using a collector that is deprecated; no one supports it and no one wants to use it. Switch to G1, or even better (IMHO) to Shenandoah: start from this simple thing first.
I can only assume that you increased ParGCCardsPerStrideChunk from its default value and that it probably helped by a few ms (though we have no proof of that). We also have no GC logs, no CPU activity, etc., so this is pretty complicated to answer. If you indeed have a big heap (tens of GB), a big young space, and enough GC threads, setting that parameter to a bigger value might indeed help, and it might even be related to the card table you are mentioning. Read further to see why.
CMS splits the heap into an old space and a young space. It could have chosen any other discriminator, but it chose age (just like G1). Why is that needed? To be able to scan and collect only partial regions of the heap (scanning it entirely is very expensive). The young space is collected with a stop-the-world pause, so it had better be small, otherwise you will not be happy; that is also why you usually see many more young collections compared to old ones.
The only problem when you scan the young space is: what happens if there are references from the old space to objects in the young space? Collecting those objects would obviously be wrong, but scanning the entire old space to find the answer would defeat the purpose of generational collection entirely. Thus: the card table.
This keeps track of old-space-to-young-space references, so the collector knows exactly what is garbage and what is not. G1 uses a card table too, but also adds a RememberedSet (not going into the details here). In practice, RememberedSets turned out to be HUGE; that is why G1 became generational. (FYI: Shenandoah uses a matrix instead of a card table, which is part of why it is not generational.)
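To make the idea concrete, here is a toy sketch of a card table plus write barrier; this is my own illustration (the class, constants, and addresses are made up), not the JVM's actual data structures. The point is that a young collection only needs to visit the dirty cards, not the whole old space:

```java
// Toy card-table sketch: the old generation is divided into 512-byte cards,
// and a "write barrier" marks a card dirty whenever a reference is stored
// into it. A young collection then scans only the dirty cards.
public class CardTableSketch {
    static final int CARD_SIZE = 512;            // bytes covered by one card, as in HotSpot
    static final long OLD_GEN_BYTES = 1L << 20;  // pretend 1 MB old generation
    static final byte[] cards = new byte[(int) (OLD_GEN_BYTES / CARD_SIZE)];

    // Called on every reference store into the old generation.
    static void writeBarrier(long oldGenOffset) {
        cards[(int) (oldGenOffset / CARD_SIZE)] = 1; // mark the card dirty
    }

    public static void main(String[] args) {
        writeBarrier(100);     // store at offset 100    -> card 0 dirty
        writeBarrier(70_000);  // store at offset 70 000 -> card 136 dirty

        int dirty = 0;
        for (byte c : cards) if (c == 1) dirty++;
        // A young GC only needs to scan 2 cards out of 2048.
        System.out.println("dirty cards: " + dirty + " / " + cards.length);
    }
}
```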
So this huge intro was to show that increasing ParGCCardsPerStrideChunk might indeed have helped: you are giving each GC thread more space to work on. The default value is 256, and each card covers 512 bytes, which means:

256 * 512 = 128 KB per stride of old generation

If you have, for example, a heap of 32 GB, how many hundreds of thousands of strides is that? Probably too many.
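You can check that arithmetic yourself; a tiny sketch (the 32 GB figure is just the example number from above, and I'm pretending the entire heap is old generation to get an upper bound):

```java
// Stride arithmetic for the default ParGCCardsPerStrideChunk value.
public class StrideMath {
    public static void main(String[] args) {
        final long cardSize = 512;        // bytes covered by one card in HotSpot
        final long cardsPerStride = 256;  // default ParGCCardsPerStrideChunk
        final long strideBytes = cardSize * cardsPerStride;

        // Pretend the whole 32 GB heap were old generation (upper bound):
        final long heapBytes = 32L * 1024 * 1024 * 1024;
        final long strides = heapBytes / strideBytes;

        System.out.println("bytes per stride: " + strideBytes); // 131072 (128 KB)
        System.out.println("number of strides: " + strides);    // 262144
    }
}
```

Hundreds of thousands of tiny work units is a lot of coordination overhead for the GC threads, which is why a bigger stride can help on big heaps.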
Now, why do you also bring reference counting into the discussion here? I have no idea. The examples that you have shown have different semantics and as such are kind of difficult to reason about; I'll still try, though. You have to understand that reachability of objects is just a graph that starts from some roots (called GC roots). Let's take this example first:
public void b() {
    new ShortLivedObject().doSomething(new Object()); // actually now is short-lived
}
The ShortLivedObject instance is "forgotten" as soon as the doSomething invocation is done; its scope is within the method only, so no one can reach it. The remaining part is the parameter of doSomething: new Object(). If doSomething does not do anything fishy with the parameter it got (making it reachable via a GC root graph), then after doSomething is done, it becomes eligible for GC too. But even if doSomething makes new Object() reachable, the ShortLivedObject instance is still eligible for GC.
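As a concrete (hypothetical) illustration of the "fishy" case: the class and field names below are made up, but storing the parameter into a static field makes it reachable from a GC root, while the ShortLivedObject instance itself still is not:

```java
// Hypothetical sketch: doSomething stores its parameter into a static field,
// which is reachable via a GC root, so the parameter "escapes" the call.
// The ShortLivedObject instance itself still becomes unreachable.
public class Escape {
    static Object saved; // static fields are reachable from a GC root

    static class ShortLivedObject {
        void doSomething(Object param) {
            saved = param; // the parameter escapes; it stays reachable
            // nothing keeps a reference to this ShortLivedObject instance,
            // so it is still eligible for GC after the call returns
        }
    }

    public static void main(String[] args) {
        new ShortLivedObject().doSomething(new Object());
        System.out.println(saved != null); // true: the parameter is still reachable
    }
}
```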
As such, even if Example is reachable (meaning it can't be collected), ShortLivedObject and new Object() can potentially be collected. It can look like this:

ShortLivedObject
       |
      \ /
 new Object()

GC Root -> ... -> Example
You can see that when the GC scans the Example instance, it might not scan ShortLivedObject at all (that is why garbage is identified as the opposite of live objects): the algorithm simply never reaches that part of the graph, so the entire graph is discarded without being scanned.
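That "discard the entire graph" point can be sketched as a toy mark phase (my own simplification, not a real collector): reachability is just graph traversal from the roots, and whatever the traversal never visits is garbage by definition:

```java
import java.util.*;

// Toy mark phase: trace references from the GC roots; anything never
// visited is garbage and is never even looked at by the collector.
public class MarkSketch {
    static Map<String, List<String>> refs = new HashMap<>();

    static Set<String> mark(List<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (live.add(obj)) {                      // first visit only
                stack.addAll(refs.getOrDefault(obj, List.of()));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // Graph from the example: Example hangs off a GC root, but nothing
        // reachable points at ShortLivedObject or at new Object().
        refs.put("GC Root", List.of("Example"));
        refs.put("Example", List.of());
        refs.put("ShortLivedObject", List.of("new Object()"));

        Set<String> live = mark(List.of("GC Root"));
        System.out.println(live);                              // GC Root and Example only
        System.out.println(live.contains("ShortLivedObject")); // false -> garbage
    }
}
```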
The second example is different:
public void a() {
    var shortLived = new ShortLivedObject(longLived);
    shortLived.doSomething();
}
The difference is that longLived here is an instance field and, as such, the graph looks a bit different:

ShortLivedObject
       |
      \ /
   longLived
      / \
       |
GC Root -> ... -> Example

It's obvious that ShortLivedObject can be collected in this case, but longLived cannot.
What you have to understand is that none of this matters at all if the Example instance itself can be collected; in that case this graph will not be traversed and everything that Example uses can be collected.
You should now be able to see that using method a can retain a bit more garbage, can potentially move it to the old space (when the objects become old enough), and can potentially make your young pauses longer, so increasing ParGCCardsPerStrideChunk might indeed help a bit. But this is highly speculative, and you would need a pretty pathological allocation pattern for all of this to happen. Without logs, I highly doubt it.
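Which brings me back to: get logs first. Something like the following would do; the jar name is a placeholder, and note that unified `-Xlog` logging is JDK 9+ (on JDK 8 with CMS you would use `-XX:+PrintGCDetails` and friends instead):

```shell
# Turn on detailed GC logging so pauses can be measured instead of guessed.
# "app.jar" is a placeholder for your application.
java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar app.jar
```

With an actual gc.log in hand, you can see pause times, promotion rates, and whether young collections are really the problem before touching any tuning flags.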
Comments:

- "…CMS. You need to switch to G1 (or even better Shenandoah) and see what happens there. The second problem is that I doubt you know for a fact that LongLivedObject is actually a long-lived object - is it referenced by a GC root? Third is that you are confusing terminology a lot: CMS has a STW pause for the young generation and two short pauses in the old generation, and a lot of other things that you confuse." – Eyelid
- "…b() is more performant than using a(). Perhaps I'm not clear and I have to improve the question." – Harbinger
- "…a and b are truly viable alternatives in your application, it's an indicator that LongLivedObject is entirely obsolete. It's not holding state that deserves to be held and apparently can be reconstructed from nothing without any impact." – Navada