What is remembered set in G1 algorithms used for?
Asked Answered
I

1

5

I just read some blogs about G1 algorithm.

The usage of remembered-set is confused to me.

Here is what I think:

Since we can use DFS to walk through every reference from GC-Roots, why do we need remembered-set?

Cause all the blogs to say the reason why we use remembered-set is we don't need to check every region to see if there is an object that is referenced by GC-Roots

Italicize answered 21/5, 2020 at 14:19 Comment(4)
"Since in generational GC we may need to find all the live objects in a young generation without traversing the older generation..." gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/storage/gc/…Childers
Does that mean every time g1 want to recycle a region, it need to check his remembered-set and every pointer in remembered-set's remembered-set, and if end with a gc-roots, that mean it is not recyclableItalicize
@Childers you are correct, but remembered sets were "born" before G1 became generational, as a matter of fact the size of these sets triggered G1 to become generational.Flyte
@Italicize you have it backwards. References are not traversed to the GC roots. References are pointing into the other direction and hence, you can only traverse them from the roots to the objects. Otherwise, garbage collection wouldn't be such an expensive operation.Drewdrewett
F
12

You need to understand what Card Table is first, IMO. How do you scan only young generation region and clean it, if there are references from old generation back to young? You need to "track" exactly where these connections are present - so while scanning young generation you could clean it without breaking the heap.

Think about it: you can't mark for removal an Object A that it is in young generation now, if there is a reference B to it, from old generation. But remember that right now - you are in the young collection only. So to track these "connections" a Card Table is implemented. Each bit from this card table says that a certain portion of the old generation is "dirty", meaning also scan that portion from the old generation while scanning young.

Why do you need that? The entire point of scanning young is to scan a little piece of the heap, not all. This card table achieves that.

G1 has regions. What if you are scanning regionA and you see that it has pointers to some other regionB? Simply putting this information in the Card Table is not enough. Your card table will only know about regionA, and next time you scan regionB - how do you know you are supposed to scan regionA also? If you don't do that, obviously the heap integrity is broken.

As such : remembered sets. These sets are populated by an asynchronous thread: it scans the card table and according to that information it also scans where these "dirty" regions have pointers to. It keeps track of that regionA -> regionB connection. Each region has it's own remembered set.

So when you reach the point that GC needs to happen, when scanning regionB you also look at it's remembered set and find out that you also need to scan regionA.


In practice, this is why G1 became generational : these remembered sets turned out to be huge. If you divide the heap in young and old, there is no need to keep the connections between young generations, you scan them all at once anyway, thus taking away the burned on the size of these sets. G1 wants to keep that 200ms (default) promise - to do that, you need to scan young generation all at once (because there is no connection between regions in remembered sets and otherwise heap integrity is gone), but at the same time if you make young generation small - the size of remembered sets will be big.

As such, touching these settings is an engineering miracle, IMHO.

Flyte answered 21/5, 2020 at 20:51 Comment(2)
Can i think this way: remembered set is more exact version of card table , since card table only know which region and remembered set knows exact addressItalicize
@Italicize to be frank, no. These are not "versions" of each other.G1 has them bothFlyte

© 2022 - 2024 — McMap. All rights reserved.