Is this closure combination behaviour a C# compiler bug?

Asked 24/11, 2015 at 13:10 Answered 26/11, 2015 at 15:51

I was investigating some strange object lifetime issues, and came across this very puzzling behaviour of the C# compiler:

Consider the following test class:

class Test
{
    delegate Stream CreateStream();

    CreateStream TestMethod( IEnumerable<string> data )
    {
        string file = "dummy.txt";
        var hashSet = new HashSet<string>();

        var count = data.Count( s => hashSet.Add( s ) );

        CreateStream createStream = () => File.OpenRead( file );

        return createStream;
    }
}

The compiler generates the following:

internal class Test
{
  public Test()
  {
    base..ctor();
  }

  private Test.CreateStream TestMethod(IEnumerable<string> data)
  {
    Test.<>c__DisplayClass1_0 cDisplayClass10 = new Test.<>c__DisplayClass1_0();
    cDisplayClass10.file = "dummy.txt";
    cDisplayClass10.hashSet = new HashSet<string>();
    Enumerable.Count<string>(data, new Func<string, bool>((object) cDisplayClass10, __methodptr(<TestMethod>b__0)));
    return new Test.CreateStream((object) cDisplayClass10, __methodptr(<TestMethod>b__1));
  }

  private delegate Stream CreateStream();

  [CompilerGenerated]
  private sealed class <>c__DisplayClass1_0
  {
    public HashSet<string> hashSet;
    public string file;

    public <>c__DisplayClass1_0()
    {
      base..ctor();
    }

    internal bool <TestMethod>b__0(string s)
    {
      return this.hashSet.Add(s);
    }

    internal Stream <TestMethod>b__1()
    {
      return (Stream) File.OpenRead(this.file);
    }
  }
}

The original class contains two lambdas: s => hashSet.Add( s ) and () => File.OpenRead( file ). The first closes over local variable hashSet, the second closes over local variable file. However, the compiler generates a single closure implementation class <>c__DisplayClass1_0 that contains both hashSet and file. As a consequence, the returned CreateStream delegate contains and keeps alive a reference to the hashSet object that should have been available for GC once TestMethod returned.

In the actual scenario where I've encountered this issue, a very substantial (ie, >100mb) object is incorrectly enclosed.

My specific questions are:

Is this a bug? If not, why is this behaviour considered desirable?

Update:

The C# 5 spec 7.15.5.1 says:

When an outer variable is referenced by an anonymous function, the outer variable is said to have been captured by the anonymous function. Ordinarily, the lifetime of a local variable is limited to execution of the block or statement with which it is associated (§5.1.7). However, the lifetime of a captured outer variable is extended at least until the delegate or expression tree created from the anonymous function becomes eligible for garbage collection.

This would appear to be open to some degree of interpretation, and does not explicitly prohibit a lambda from capturing variables that it does not reference. However, this question covers a related scenario, which @eric-lippert considered to be a bug. IMHO, I see the combined closure implementation provided by the compiler as a good optimization, but that the optimization should not be used for lambdas which the compiler can reasonably detect may have lifetime beyond the current stack frame.

How do I code against this without abandoning the use of lambdas all together? Notably how do I code against this defensively, so that future code changes don't suddenly cause some other unchanged lambda in the same method to start enclosing something that it shouldn't?

Update:

The code example I've provided is by necessity contrived. Clearly, refactoring lambda creation out to a separate method works around the problem. My question is not intended to be about design best practices (as well covered by @peter-duniho). Rather, given the content of the TestMethod as it stands, I'd like to know if there is any way to coerce the compiler to exclude the createStream lambda from the combined closure implementation.

For the record, I'm targeting .NET 4.6 with VS 2015.

Enoch answered 24/11, 2015 at 13:10 Comment(5)

They share the same lexical scope. maybe because of that. – Aggy 24/11, 2015 at 13:14

Possible duplicate of Discrete Anonymous methods sharing a class?. As an added bonus, this example is pretty simple, but was not contrived. – Corinecorinna 27/11, 2015 at 13:52

Is this the reason for "implicitly captured closure"? I think I understand that warning a lot better now. I always wondered why in some cases a lambda was capturing something it had nothing to do with. – Jefferey 27/11, 2015 at 16:52

"the optimization should not be used for lambdas which the compiler can reasonably detect may have lifetime beyond the current stack frame" -- What is that supposed to mean? By definition, all closures "have lifetime beyond the current stack frame". It's not clear to me that the behavior is a "good optimization" (surely it doesn't cost that much extra in the context of closure classes to just make a separate class for each independent lambda). But if it's an optimization at all, by what logic will a compiler forego it conditionally? – Elizabethelizabethan 27/11, 2015 at 18:22

"given the content of the TestMethod as it stands, I'd like to know if there is any way to coerce the compiler to exclude the createStream lambda from the combined closure implementation" -- you seem to be asking "can I change this code to avoid the problem, but do so without changing the code?". I already provided a clear example of how you would avoid the problem in the example you posted; a similar approach can be done in any such example. But any work-around will necessarily involve changing the method; how could it be otherwise? – Elizabethelizabethan 27/11, 2015 at 18:25

Is this a bug?

No. The compiler is compliant with the specification here.

Why is this behaviour considered desirable?

It's not desirable. It's deeply unfortunate, as you discovered here, and as I described back in 2007:

http://blogs.msdn.com/b/ericlippert/archive/2007/06/06/fyi-c-and-vb-closures-are-per-scope.aspx

The C# compiler team has considered fixing this in every version since C# 3.0 and it has never been high enough priority. Consider entering an issue on the Roslyn github site (if there isn't one already; there may well be).

I personally would like to see this fixed; as it stands it is a big "gotcha".

How do I code against this without abandoning the use of lambdas all together?

The variable is the thing that is captured. You could set the hashset variable to null when you're done with it. Then the only memory being consumed is the memory for the variable, four bytes, and not the memory for the thing it is referring to, which will be collected.

Giguere answered 26/11, 2015 at 15:51 Comment(0)

I'm not aware of anything in the C# language specification that would dictate exactly how a compiler is to implement anonymous methods and variable capturing. This is an implementation detail.

What the specification does do is set some rules for how the anonymous methods and their capture variables are required to behave. I don't have a copy of the C# 6 specification, but here is relevant text from the C# 5 specification, under "7.15.5.1 Captured outer variables":

…the lifetime of a captured outer variable is extended at least until the delegate or expression tree created from the anonymous function becomes eligible for garbage collection. [emphasis mine]

There is nothing in the specification that limits the lifetime of the variable. The compiler is simply required to make sure the variable lives long enough to remain valid if needed by the anonymous method.

So…

1.Is this a bug? If not, why is this behaviour considered desirable?

Not a bug. The compiler is complying with the specification.

As for whether it's considered "desirable", that's a loaded term. What's "desirable" depends on your priorities. That said, one priority of a compiler author is to simplify the task of the compiler (and in doing so, making it run faster and reduce the chances of bugs). This particular implementation might be considered "desirable" in that context.

On the other hand, language designers and compiler authors both also have a shared goal of helping programmers produce working code. Inasmuch as an implementation detail can interfere with this, such an implementation detail might be considered "undesirable". Ultimately, it's a matter of how each of those priorities are ranked, according to their potentially competing goals.

2.How do I code against this without abandoning the use of lambdas all together? Notably how do I code against this defensively, so that future code changes don't suddenly cause some other unchanged lambda in the same method to start enclosing something that it shouldn't?

Hard to say without a less-contrived example. In general, I'd say the obvious answer is "don't mix your lambdas like that". In your particular (admittedly contrived) example, you have one method that appears to be doing two completely different things. This is generally frowned upon for a variety of reasons, and it seems to me that this example just adds to that list.

I don't know what the best way to fix the "two different things" would be, but an obvious alternative would be to at least refactor the method so that the "two different things" method is delegating the work to another two methods, each named descriptively (which has the bonus benefit of helping the code to be self-documenting).

For example:

CreateStream TestMethod( IEnumerable<string> data )
{
    string file = "dummy.txt";
    var hashSet = new HashSet<string>();

    var count = AddAndCountNewItems(data, hashSet);

    CreateStream createStream = GetCreateStreamCallback(file);

    return createStream;
}

int AddAndCountNewItems(IEnumerable<string> data, HashSet<string> hashSet)
{
    return data.Count( s => hashSet.Add( s ) );
}

CreateStream GetCreateStreamCallback(string file)
{
    return () => File.OpenRead( file );
}

In this way, the captured variables remain independent. Even if the compiler does for some bizarre reason still put them both into the same closure type, it still should not result in the same instance of that type used between the two closures.

Your TestMethod() still does two different things, but at least it doesn't itself contain those two unrelated implementations. The code is more readable and better compartmentalized, which is a good thing even besides the fact that it fixes the variable lifetime issue.

Elizabethelizabethan answered 24/11, 2015 at 18:36 Comment(3)

regarding the C# spec, 7.15.5.1, the first paragraph starts "When an outer variable is referenced by an anonymous function, the outer variable is said to have been captured by the anonymous function". However, the lambda () => File.OpenRead( file ) does not reference the outer variable hashSet, so the lifetime of hashSet should not be extended by the lifetime of this lambda. Regarding the two different things - as you note, this is indeed a contrived example. The issue appears to affect any method which uses capturing lambdas do do some work and creates a long-lived capturing lambda. – Enoch 25/11, 2015 at 9:27

@tg73: "so the lifetime of hashSet should not be extended by the lifetime of this lambda" -- IMHO, you are not reading the specification carefully enough. The hashSet variable is captured by the other lambda expression, and nothing in the specification puts an upper limit on the lifetime of such captured variables. If the compiler had wanted to, it could implement the capturing by making the variable a static variable and never discarding it. While I understand the behavior is inconvenient for your purposes, it is completely within the requirements set by the specification. – Elizabethelizabethan 25/11, 2015 at 17:13

@tg73: "The issue appears to affect any method which...creates a long-lived capturing lambda" -- but only when you have two unrelated anonymous methods in the method in which each captures a different local variable. Methods should be simple; a method that is large enough to contain two independent bits of logic with unrelated variable lifetimes is due for refactoring anyway. IMHO it should be easy enough to work around this issue in any case, by breaking the method into smaller pieces. I can't comment on examples I haven't seen, but it would be unusual to not be able to do this easily. – Elizabethelizabethan 25/11, 2015 at 17:17

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags