First off, we need to be sure we're on the same page as to what a tracing garbage collection algorithm does in its mark phase.
At any given moment, a tracing GC has a number of objects that are known to be alive, in the sense that they are reachable by the running program as it stands right now. The main step of mark phrase involves following the non-static fields of those objects to find more objects, and those new objects will now also be known to be alive. This step is repeated recursively until no new alive objects are found by traversing the existing live objects. All objects in memory not proved live are considered dead. (The GC then moves to the next phase, which is called the sweep phase. We don't care about that phase for this answer.)
Now this alone is not enough to execute the algorithm. In the beginning, the algorithm has no objects that it knows to be alive, so it can't start following anyone's non-static fields. We need to specify a set of objects that are considered known to be alive from the start. We choose those objects axiomatically, in the sense that they don't come from a previous step of the algorithm -- they come from outside. Specifically, they come from the semantics of the language. Those objects are called roots.
In a language like Java, there are two sets of objects that are definite GC roots. Anything that is accessible by a local variable that's still in scope is obviously reachable (within its method, which still hasn't returned), therefore it's alive, therefore it's a root. Anything that is accessible through a static field of a class is also obviously reachable (from anywhere), therefore it's alive, therefore it's a root.
But if non-static fields were considered roots as well, what would happen?
Say you instantiate an ArrayList<E>
. Inside, that object has a non-static field that points to an Object[]
(the backing array that represents the storage of the list). At some point, a GC cycle starts. In the mark phase, the Object[]
is marked as alive because it is pointed to by the ArrayList<E>
private non-static field. The ArrayList<E>
is not pointed to by anything, so it fails to be considered alive. Thus, in this cycle, the ArrayList<E>
is destroyed while the backing Object[]
survives. Of course, at the next cycle, the Object[]
also dies, because it is not reachable by any root. But why do this in two cycles? If the ArrayList<E>
was dead in the first cycle and if Object[]
is used only by a dead object, shouldn't the Object[]
also be considered dead in the same move, to save time and space?
That's the point here. If we want to be maximally efficient (in the context of a tracing GC), we need to get rid of as many dead objects as possible in a single GC.
To do that, a non-static field should keep an object alive only if the enclosing object (the object that contains the field) has been proved to be alive. By contrast, roots are objects we call alive axiomatically (without proof) in order to kick-start the algorithm's marking phase. It is in our best interest to limit the latter category to the bare minimum that doesn't break the running program.
For example, say you have this code:
class Foo {
Bar bar = new Bar();
public static void main(String[] args) {
Foo foo = new Foo();
System.gc();
}
public void test() {
Integer a = 1;
bar.counter++; //access to the non-static field
}
}
class Bar {
int counter = 0;
}
- When the garbage collection starts, we get one root that's the local variable
Foo foo
. That's it, that's our only root.
- We follow the root to find the instance of
Foo
, which is marked as alive and then we attempt to find its non-static fields. We find one of them, the Bar bar
field.
- We follow the fields to find the instance of
Bar
, which is marked as alive and then we attempt to find its non-static fields. We find that it contains no more fields that are reference types, so the GC doesn't need to bother for that object anymore.
- Since we can't get find new alive objects in this round of recursion, the mark phase can end.
Alternatively:
class Foo {
Bar bar = new Bar();
public static void main(String[] args) {
Foo foo = new Foo();
foo.test();
}
public void test() {
Integer a = 1;
bar.counter++; //access to the non-static field
System.gc();
}
}
class Bar {
int counter = 0;
}
- When the garbage collection starts, the local variable
Integer a
is a root and the Foo this
reference (the implicit reference that all non-static methods get) is also a root. The local variable Foo foo
from main
is also a root because main
hasn't gone out of scope yet.
- We follow the root to find the instance of
Integer
and instance of Foo
(we find one of these objects twice, but this doesn't matter for the algorithm), which are marked as alive and then we attempt to follow their non-static fields. Let's say the instance of Integer
has no more fields to class instances. The instance of Foo
gives us one Bar
field.
- We follow the field to find the instance of
Bar
, which is marked as alive and then we attempt to find its non-static fields. We find that it contains no more fields that are reference types, so the GC doesn't need to bother for that object anymore.
- Since we can't get find new alive objects in this round of recursion, the mark phase can end.