How does Java Garbage Collection work with Circular References?
Asked Answered
M

9

183

From my understanding, garbage collection in Java cleans up some objects if nothing else is 'pointing' to that object.

My question is, what happens if we have something like this:

class Node {
    public object value;
    public Node next;
    public Node(object o, Node n) { value = 0; next = n;}
}

//...some code
{
    Node a = new Node("a", null), 
         b = new Node("b", a), 
         c = new Node("c", b);
    a.next = c;
} //end of scope
//...other code

a, b, and c should be garbage collected, but they are all being referenced by other objects.

How does the Java garbage collection deal with this? (or is it simply a memory drain?)

Metapsychology answered 15/12, 2009 at 20:33 Comment(1)
See: #408355, specifically the second answer from @gnud.Factoring
A
182

Java's GC considers objects "garbage" if they aren't reachable through a chain starting at a garbage collection root, so these objects will be collected. Even though objects may point to each other to form a cycle, they're still garbage if they're cut off from the root.

See the section on unreachable objects in Appendix A: The Truth About Garbage Collection in Java Platform Performance: Strategies and Tactics for the gory details.

Aircrew answered 15/12, 2009 at 20:35 Comment(10)
I figured as much :). I'm more curious (intellectually) how exactly that happens. Does the GC generate some sort of closure from the objects?Metapsychology
Do you have a reference for that? It's hard to test it.Laval
I added a reference. You can also override an object's finalize() method to find out when it gets collected (although that's about the only thing I'd recommend using finalize() for).Aircrew
Just to clarify that last comment... put a debug print statement in the finalize method that prints out a unique id for the object. You'll be able to see all the objects that reference each other get collected.Aircrew
"...smart enough to recognize..." sounds confusing. GC don't have to recognize cycles - they are just unreachable, hence garbageFouquiertinville
@Alexander: You're right. The way I phrased it originally made it sound like the GC was analyzing all objects in memory and explicitly tagging cycles. It's not that complicated, so I rephrased my answer.Aircrew
also, here is direct link to section which talks about cycles (btw, thanks for linking, didn't know about that book)Fouquiertinville
@Laval "Do you have a reference for that?" in a discussion about garbage collection. Best. Pun. Ever.Piranesi
@Holger Thanks for the update. It's a shame that book is no longer available for free online. :(Aircrew
Note that there are other good texbooks about Garbage Collection. While they may not specifically cover current generation Java collectors, they cover the techniques used that Java uses ... at least at a high level.Ascendant
M
159

yes Java Garbage collector handles circular-reference!

How?

There are special objects called called garbage-collection roots (GC roots). These are always reachable and so is any object that has them at its own root.

A simple Java application has the following GC roots:

  1. Local variables in the main method
  2. The main thread
  3. Static variables of the main class

enter image description here

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows

  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.

So if any object is not reachable from the GC roots(even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.

Ofcourse sometimes this may led to memory leak if programmer forgets to dereference an object.

enter image description here

Source : Java Memory Management

Md answered 21/8, 2013 at 6:57 Comment(3)
Thanks for linking that book. It is full of great info about this and other Java development topics!Armes
In the last picture, there's a non reachable object but its in reachable objects section.Yapon
There are still three objects that are unreachable in the "reachable section" of the last diagram.Motherofpearl
T
17

You are correct. The specific form of garbage collection you describe is called "reference counting". The way it works (conceptually, at least, most modern implementations of reference counting are actually implemented quite differently) in the simplest case, looks like this:

  • whenever a reference to an object is added (e.g. it is assigned to a variable or a field, passed to method, and so on), its reference count is increased by 1
  • whenever a reference to an object is removed (the method returns, the variable goes out of scope, the field is re-assigned to a different object or the object which contains the field gets itself garbage collected), the reference count is decreased by 1
  • as soon as the reference count hits 0, there is no more reference to the object, which means nobody can use it anymore, therefore it is garbage and can be collected

And this simple strategy has exactly the problem you decribe: if A references B and B references A, then both of their reference counts can never be less than 1, which means they will never get collected.

There are four ways to deal with this problem:

  1. Ignore it. If you have enough memory, your cycles are small and infrequent and your runtime is short, maybe you can get away with simply not collecting cycles. Think of a shell script interpreter: shell scripts typically only run for a few seconds and don't allocate much memory.
  2. Combine your reference counting garbage collector with another garbage collector which doesn't have problems with cycles. CPython does this, for example: the main garbage collector in CPython is a reference counting collector, but from time to time a tracing garbage collector is run to collect the cycles.
  3. Detect the cycles. Unfortunately, detecting cycles in a graph is a rather expensive operation. In particular, it requires pretty much the same overhead that a tracing collector would, so you could just as well use one of those.
  4. Don't implement the algorithm in the naive way you and I would: since the 1970s, there have been multiple quite interesting algorithms developed that combine cycle detection and reference counting in a single operation in a clever way that is significantly cheaper than either doing them both seperately or doing a tracing collector.

By the way, the other major way to implement a garbage collector (and I have already hinted at that a couple of times above), is tracing. A tracing collector is based on the concept of reachability. You start out with some root set that you know is always reachable (global constants, for example, or the Object class, the current lexical scope, the current stack frame) and from there you trace all objects that are reachable from the root set, then all objects that are reachable from the objects reachable from the root set and so on, until you have the transitive closure. Everything that is not in that closure is garbage.

Since a cycle is only reachable within itself, but not reachable from the root set, it will be collected.

Tearjerker answered 16/12, 2009 at 0:4 Comment(2)
Since question is Java-specific, I think it's worth mentioning that Java doesn't use ref counting and hence problem non-existent. Also link to wikipedia would be helpful as "further reading". Otherwise great overview!Fouquiertinville
I've just read your comments to Jerry Coffin's post, so now I'm not that sure :)Fouquiertinville
O
14

A garbage collector starts from some "root" set of places that are always considered "reachable", such as the CPU registers, stack, and global variables. It works by finding any pointers in those areas, and recursively finding everything they point at. Once it's found all that, everything else is garbage.

There are, of course, quite a few variations, mostly for the sake of speed. For example, most modern garbage collectors are "generational", meaning that they divide objects into generations, and as an object gets older, the garbage collector goes longer and longer between times that it tries to figure out whether that object is still valid or not -- it just starts to assume that if it has lived a long time, chances are pretty good that it'll continue to live even longer.

Nonetheless, the basic idea remains the same: it's all based on starting from some root set of things that it takes for granted could still be used, and then chasing all the pointers to find what else could be in use.

Interesting aside: may people are often surprised by the degree of similarity between this part of a garbage collector and code for marshaling objects for things like remote procedure calls. In each case, you're starting from some root set of objects, and chasing pointers to find all the other objects those refer to...

Oilcloth answered 15/12, 2009 at 20:51 Comment(5)
What you are describing is a tracing collector. There are other kinds of collectors. Of particular interest for this discussion are reference counting collectors, which do tend to have trouble with cycles.Tributary
@Jörg W Mittag: Certainly true -- though I don't know of a (reasonably current) JVM that uses reference counting, so it seems unlikely (at least to me) that it makes much difference to the original question.Oilcloth
@Jörg W Mittag:At least by default I believe Jikes RVM currently uses the Immix collector, which is a region-based tracing collector (though it does also use reference counting). I'm not sure whether you're referring to that reference counting, or another collector that uses reference counting without tracing (I'd guess the latter, since I've never heard of Immix being calling "recycler").Oilcloth
I got mixed up a bit: the Recycler is (was?) implemented in Jalapeno, the algorithm I was thinking about, which is (was?) implemented in Jikes is Ulterior Reference Counting. Atlhough, of course, saying that Jikes uses this or that garbage collector is quite futile, given that Jikes and especially MMtk are specifically designed to rapidly develop and test different garbage collectors within the same JVM.Tributary
Ulterior Reference Counting was designed in 2003 by the same people who designed Immix in 2007, so I guess that the latter probably superseded the former. URC was specifically designed so that it can be combined with other strategies, and in fact the URC paper explicitly mentions that URC is only a stepping stone towards a collector that combines the advantages of tracing and reference counting. I guess Immix is that collector. Anyway, the Recycler is a pure reference counting collector, which nonetheless can detect and collect cycles: WWW.Research.IBM.Com/people/d/dfb/recycler.htmlTributary
P
9

The Java GCs don't actually behave as you describe. It's more accurate to say that they start from a base set of objects, frequently called "GC roots", and will collect any object that can not be reached from a root.
GC roots include things like:

  • static variables
  • local variables (including all applicable 'this' references) currently in the stack of a running thread

So, in your case, once the local variables a, b, and c go out of scope at the end of your method, there are no more GC roots that contain, directly or indirectly, a reference to any of your three nodes, and they'll be eligible for garbage collection.

TofuBeer's link has more detail if you want it.

Pease answered 15/12, 2009 at 20:41 Comment(1)
"...currently in the stack of a running thread..." isn't it scanning stacks of all threads in order to not corrupt other thread's data ?Fouquiertinville
U
7

This article (no longer available) goes into depth about the garbage collector (conceptually... there are several implementations). The relevant part to your post is "A.3.4 Unreachable":

A.3.4 Unreachable An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object.

Unthinking answered 15/12, 2009 at 20:39 Comment(2)
direct link to that sectionFouquiertinville
the links is no longer availableMcatee
G
2

Garbage collection doesn't usually mean "clean some object iff nothing else is 'pointing' to that object" (that's reference counting). Garbage collection roughly means finding objects that can't be reached from the program.

So in your example, after a,b, and c go out of scope, they can be collected by the GC, since you can't access these objects anymore.

Gallnut answered 15/12, 2009 at 20:39 Comment(3)
"Garbage collection roughly means finding objects that can't be reached from the program". In most GC algorithms it is actually the other way around. You start with the GC roots and see what you can find, the rest is considered unreferenced garbage.Rollet
Reference counting is one of the two main implementation strategies for garbage collection. (The other is tracing.)Tributary
@Jörg: Most of the time today, when people talk about garbage collectors they are referring to collectors based on some kind of mark'n'sweep algorithm. Ref counting is typically what you are stuck with if you don't have a garbage collector. It is true that ref counting is in a sense a garbage collection strategy but hardly no gc existing today that is built on top of it so saying that it is a gc strategy is just going to confuse people because in practice it is no longer a gc strategy but an alternative way to manage memory.Rollet
U
1

Bill answered your question directly. As Amnon said, your definition of garbage collection is just reference counting. I just wanted to add that even very simple algorithms like mark and sweep and copy collection easily handle circular references. So, nothing magic about it!

Unwilled answered 15/12, 2009 at 20:43 Comment(0)
T
0

[JVM Memory model diagram]

GC starts marking by is used flag from roots(seeds) which are stack or permanent. All others references are not taking into account.

Retain cycle[About](Circular References) for iOS is a situation where you are(as a developer) is responsible for counting references and deallocating them

Thrower answered 25/1, 2021 at 20:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.