Understanding garbage collection in .NET
Asked Answered
F

2

202

Consider the below code:

public class Class1
{
    public static int c;
    ~Class1()
    {
        c++;
    }
}

public class Class2
{
    public static void Main()
    {
        {
            var c1=new Class1();
            //c1=null; // If this line is not commented out, at the Console.WriteLine call, it prints 1.
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine(Class1.c); // prints 0
        Console.Read();
    }
}

Now, even though the variable c1 in the main method is out of scope and not referenced further by any other object when GC.Collect() is called, why is it not finalized there?

Foolproof answered 16/6, 2013 at 5:9 Comment(3)
The GC does not immediately free instances when they're out of scope. It does so when it thinks it's necessary. You can read everything about the GC here: msdn.microsoft.com/en-US/library/vstudio/0xy59wtx.aspxFullbodied
@Fullbodied (Pssst. Your link is broken.)Britneybritni
Some articles: GC | GC | GC | GC | GC | GCDavid
E
399

You are being tripped up here and drawing very wrong conclusions because you are using a debugger. You'll need to run your code the way it runs on your user's machine. Switch to the Release build first with Build + Configuration manager, change the "Active solution configuration" combo in the upper left corner to "Release". Next, go into Tools + Options, Debugging, General and untick the "Suppress JIT optimization" option.

Now run your program again and tinker with the source code. Note how the extra braces have no effect at all. And note how setting the variable to null makes no difference at all. It will always print "1". It now works the way you hope and expected it would work.

Which does leave with the task of explaining why it works so differently when you run the Debug build. That requires explaining how the garbage collector discovers local variables and how that's affected by having a debugger present.

First off, the jitter performs two important duties when it compiles the IL for a method into machine code. The first one is very visible in the debugger, you can see the machine code with the Debug + Windows + Disassembly window. The second duty is however completely invisible. It also generates a table that describes how the local variables inside the method body are used. That table has an entry for each method argument and local variable with two addresses. The address where the variable will first store an object reference. And the address of the machine code instruction where that variable is no longer used. Also whether that variable is stored on the stack frame or a cpu register.

This table is essential to the garbage collector, it needs to know where to look for object references when it performs a collection. Pretty easy to do when the reference is part of an object on the GC heap. Definitely not easy to do when the object reference is stored in a CPU register. The table says where to look.

The "no longer used" address in the table is very important. It makes the garbage collector very efficient. It can collect an object reference, even if it is used inside a method and that method hasn't finished executing yet. Which is very common, your Main() method for example will only ever stop executing just before your program terminates. Clearly you would not want any object references used inside that Main() method to live for the duration of the program, that would amount to a leak. The jitter can use the table to discover that such a local variable is no longer useful, depending on how far the program has progressed inside that Main() method before it made a call.

An almost magic method that is related to that table is GC.KeepAlive(). It is a very special method, it doesn't generate any code at all. Its only duty is to modify that table. It extends the lifetime of the local variable, preventing the reference it stores from getting garbage collected. The only time you need to use it is to stop the GC from being to over-eager with collecting a reference, that can happen in interop scenarios where a reference is passed to unmanaged code. The garbage collector cannot see such references being used by such code since it wasn't compiled by the jitter so doesn't have the table that says where to look for the reference. Passing a delegate object to an unmanaged function like EnumWindows() is the boilerplate example of when you need to use GC.KeepAlive().

So, as you can tell from your sample snippet after running it in the Release build, local variables can get collected early, before the method finished executing. Even more powerfully, an object can get collected while one of its methods runs if that method no longer refers to this. There is a problem with that, it is very awkward to debug such a method. Since you may well put the variable in the Watch window or inspect it. And it would disappear while you are debugging if a GC occurs. That would be very unpleasant, so the jitter is aware of there being a debugger attached. It then modifies the table and alters the "last used" address. And changes it from its normal value to the address of the last instruction in the method. Which keeps the variable alive as long as the method hasn't returned. Which allows you to keep watching it until the method returns.

This now also explains what you saw earlier and why you asked the question. It prints "0" because the GC.Collect call cannot collect the reference. The table says that the variable is in use past the GC.Collect() call, all the way up to the end of the method. Forced to say so by having the debugger attached and by running the Debug build.

Setting the variable to null does have an effect now because the GC will inspect the variable and will no longer see a reference. But make sure you don't fall in the trap that many C# programmers have fallen into, actually writing that code was pointless. It makes no difference whatsoever whether or not that statement is present when you run the code in the Release build. In fact, the jitter optimizer will remove that statement since it has no effect whatsoever. So be sure to not write code like that, even though it seemed to have an effect.


One final note about this topic, this is what gets programmers in trouble that write small programs to do something with an Office app. The debugger usually gets them on the Wrong Path, they want the Office program to exit on demand. The appropriate way to do that is by calling GC.Collect(). But they'll discover that it doesn't work when they debug their app, leading them into never-never land by calling Marshal.ReleaseComObject(). Manual memory management, it rarely works properly because they'll easily overlook an invisible interface reference. GC.Collect() actually works, just not when you debug the app.

Ethbinium answered 16/6, 2013 at 8:16 Comment(5)
See also my question that Hans answered nicely for me. #15561525Pentapody
@HansPassant I just found this awesome explanation, which also answers part of my question here: #30529879 about GC and thread synchronization. One question that I still have: I'm wondering if the GC actually compacts & updates addresses that are used in a register (stored in memory while suspended), or just skips them? A process that's updating registers after suspending the thread (before the resume) feels to me like a serious security thread that's blocked by the OS.Atomicity
Indirectly, yes. The thread is suspended, the GC updates the backing store for the CPU registers. Once the thread resumes running, it now uses the updated register values.Ethbinium
@HansPassant, I would appreciate if you add references for some of the non-obvious details of CLR garbage collector that you described here?Fenestration
It seems that configuration wise, an important point is that "Optimize code" (<Optimize>true</Optimize> in .csproj) is enabled. This is the default in the "Release" configuration. But in case one uses custom configurations, it is relevant to know that this setting is important.Unexceptional
C
44

[ Just wanted to add further on the Internals of Finalization process ]

You create an object and when the object is garbage collected, the object's Finalize method should be called. But there is more to finalization than this very simple assumption.

CONCEPTS:

  1. Objects not implementing Finalize methods: their memory is reclaimed immediately, unless of course, they are not reachable by application code any more.

  2. Objects implementing Finalize method: the concepts of Application Roots, Finalization Queue, Freachable Queue need to be understood since they are involved in the reclamation process.

  3. Any object is considered garbage if it is not reachable by application code.

Assume: classes/objects A, B, D, G, H do not implement the Finalize method and C, E, F, I, J do implement the Finalize method.

When an application creates a new object, the new operator allocates memory from the heap. If the object's type contains a Finalize method, then a pointer to the object is placed on the finalization queue. Therefore pointers to objects C, E, F, I, J get added to the finalization queue.

The finalization queue is an internal data structure controlled by the garbage collector. Each entry in the queue points to an object that should have its Finalize method called before the object's memory can be reclaimed.

The figure below shows a heap containing several objects. Some of these objects are reachable from the application roots, and some are not. When objects C, E, F, I, and J are created, the .NET framework detects that these objects have Finalize methods and pointers to these objects are added to the finalization queue.

enter image description here

When a GC occurs (1st Collection), objects B, E, G, H, I, and J are determined to be garbage. A,C,D,F are still reachable by application code depicted as arrows from the yellow box above.

The garbage collector scans the finalization queue looking for pointers to these objects. When a pointer is found, the pointer is removed from the finalization queue and appended to the freachable queue ("F-reachable", i.e. finalizer reachable). The freachable queue is another internal data structure controlled by the garbage collector. Each pointer in the freachable queue identifies an object that is ready to have its Finalize method called.

After the 1st GC, the managed heap looks something similar to figure below. Explanation given below:

  1. The memory occupied by objects B, G, and H has been reclaimed immediately because these objects did not have a finalize method that needed to be called.

  2. However, the memory occupied by objects E, I, and J could not be reclaimed because their Finalize method has not been called yet. Calling the Finalize method is done by freachable queue.

  3. A, C, D, F are still reachable by application code depicted as arrows from yellow box above, so they will not be collected in any case.

enter image description here

There is a special runtime thread dedicated to calling Finalize methods. When the freachable queue is empty (which is usually the case), this thread sleeps. But when entries appear, this thread wakes, removes each entry from the queue, and calls each object's Finalize method. The garbage collector compacts the reclaimable memory and the special runtime thread empties the freachable queue, executing each object's Finalize method. So here finally is when your Finalize method gets executed.

The next time the garbage collector is invoked (2nd GC), it sees that the finalized objects are truly garbage, since the application's roots don't point to it and the freachable queue no longer points to it (it's EMPTY too), therefore the memory for the objects E, I, J may be reclaimed from the heap. See figure below and compare it with figure just above.

enter image description here

The important thing to understand here is that two GCs are required to reclaim memory used by objects that require finalization. In reality, more than two collections cab be even required since these objects may get promoted to an older generation.

NOTE: The freachable queue is considered to be a root just like global and static variables are roots. Therefore, if an object is on the freachable queue, then the object is reachable and is not garbage.

As a last note, remember that debugging application is one thing, garbage collection is another thing and works differently. So far you can't feel garbage collection just by debugging applications. If you wish to further investigate memory get started here.

Crucify answered 10/4, 2015 at 10:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.