Why are lambda expressions not "interned"?

Strings are reference types, but they are immutable. This allows for them to be interned by the compiler; everywhere the same string literal appears, the same object may be referenced.

Delegates are also immutable reference types. (Adding a method to a multicast delegate using the += operator constitutes assignment; that's not mutability.) And, like strings, there is a "literal" way to represent a delegate in code, using a lambda expression, e.g.:

Func<int> func = () => 5;

The right-hand side of that statement is an expression whose type is Func<int>; but nowhere am I explicitly invoking the Func<int> constructor (nor is an implicit conversion happening). So I view this as essentially a literal. Am I mistaken about my definition of "literal" here?

Regardless, here's my question. If I have two variables for, say, the Func<int> type, and I assign identical lambda expressions to both:

Func<int> x = () => 5;
Func<int> y = () => 5;

...what's preventing the compiler from treating these as the same Func<int> object?

I ask because section 6.5.1 of the C# 4.0 language specification clearly states:

Conversions of semantically identical anonymous functions with the same (possibly empty) set of captured outer variable instances to the same delegate types are permitted (but not required) to return the same delegate instance. The term semantically identical is used here to mean that execution of the anonymous functions will, in all cases, produce the same effects given the same arguments.
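
A quick way to see what a particular compiler does with the two assignments above is to compare the resulting delegates directly. This is only a minimal sketch: the class and method names (LambdaIdentityCheck, Compare) are made up for illustration, and the result is implementation-specific, since the spec passage above permits either outcome.

```csharp
using System;

static class LambdaIdentityCheck
{
    // Compares two textually identical lambdas converted to the same delegate type.
    // SameReference: are they one object? EqualByValue: does Delegate's == consider them equal?
    public static (bool SameReference, bool EqualByValue) Compare()
    {
        Func<int> x = () => 5;
        Func<int> y = () => 5;
        return (ReferenceEquals(x, y), x == y);
    }

    static void Main()
    {
        var result = Compare();
        Console.WriteLine("Same instance: " + result.SameReference);
        Console.WriteLine("Equal (==):    " + result.EqualByValue);
    }
}
```

Consistent with the behavior described in this question, both values are expected to be False on the Microsoft compiler, although a conforming compiler would be free to make them True.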

This surprised me when I read it; if this behavior is explicitly allowed, I would have expected it to be implemented. But it appears not to be. This has in fact gotten a lot of developers into trouble, especially when a lambda expression is used to attach an event handler that then cannot be removed by writing the same lambda expression again. For example:

using System;

class EventSender
{
    public event EventHandler Event;
    public void Send()
    {
        EventHandler handler = this.Event;
        if (handler != null) { handler(this, EventArgs.Empty); }
    }
}

class Program
{
    static string _message = "Hello, world!";

    static void Main()
    {
        var sender = new EventSender();
        sender.Event += (obj, args) => Console.WriteLine(_message);
        sender.Send();

        // Unless I'm mistaken, this lambda expression is semantically identical
        // to the one above. However, the handler is not removed, indicating
        // that a different delegate instance is constructed.
        sender.Event -= (obj, args) => Console.WriteLine(_message);

        // This prints "Hello, world!" again.
        sender.Send();
    }
}
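
For contrast, here is a sketch of the usual way around the problem shown above: keep the delegate in a variable and use that same instance for both += and -=. The UnsubscribeDemo class and its Run method are hypothetical names for this illustration, and a counter replaces Console.WriteLine so the effect is easy to check.

```csharp
using System;

class EventSender
{
    public event EventHandler Event;
    public void Send()
    {
        EventHandler handler = Event;
        if (handler != null) { handler(this, EventArgs.Empty); }
    }
}

static class UnsubscribeDemo
{
    // Returns how many times the handler ran across two Send() calls.
    public static int Run()
    {
        int calls = 0;
        var sender = new EventSender();

        // Keep a reference to the delegate so the very same instance can be removed later.
        EventHandler handler = (obj, args) => calls++;

        sender.Event += handler;
        sender.Send();            // handler runs once

        sender.Event -= handler;  // removes the instance added above
        sender.Send();            // no subscribers left, nothing runs

        return calls;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 1
    }
}
```

Because removal is done with the stored instance, it does not depend on whether the compiler happens to reuse delegates for identical lambdas.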

Is there any reason why this behavior—one delegate instance for semantically identical anonymous methods—is not implemented?

Splendent answered 26/1, 2011 at 17:33 Comment(6)
Even if it worked, I still think it a bad idea to detach an event handler by duplicating the code in the lambda. - Immateriality
I suspect that this will be another of those "because the budget wasn't available to design, implement, test, document, ship and maintain" nice-to-haves. Maybe Eric Lippert can chime in with some insider info. - Laughingstock
Good question. I would posit that a lambda as seen in code may look identical to another one, but because of referential differences like external closures it would not compile to the same MSIL in both places. The compiler has to compile it to MSIL in order to spot the difference, versus being able to spot a string literal "on the fly" from source code. As this is an extra step that would be required for any lambda, and only provides a small size savings and little if any performance gain, they probably just skipped it. - Luing
@LukeH: I share your suspicion; I think the main thing that's throwing me off is that they explicitly called out this possibility in the spec. Maybe they just figured it'd be nice to have the option in case they ever had time for it. But I feel like leaving out that section entirely would've been fine if they weren't interested in doing it. - Splendent
@Martinho: I actually agree with you. I just felt it was curious to see that in the spec and so I figured I'd throw the obvious why not? out there. - Splendent
Wow all the big guns chiming in for this one. This is one of my SO favourites, though I understand little :) - Montherlant
L
11

You're mistaken to call it a literal, IMO. It's just an expression which is convertible to a delegate type.

Now as for the "interning" part - some lambda expressions are cached, in that for one single lambda expression, sometimes a single instance can be created and reused however often that line of code is encountered. Some are not treated that way: it usually depends on whether the lambda expression captures any non-static variables (whether that's via "this" or local to the method).

Here's an example of this caching:

using System;

class Program
{
    static void Main()
    {
        Action first = GetFirstAction();
        first -= GetFirstAction();
        Console.WriteLine(first == null); // Prints True

        Action second = GetSecondAction();
        second -= GetSecondAction();
        Console.WriteLine(second == null); // Prints False
    }

    static Action GetFirstAction()
    {
        return () => Console.WriteLine("First");
    }

    static Action GetSecondAction()
    {
        int i = 0;
        return () => Console.WriteLine("Second " + i);
    }
}

In this case we can see that the first action was cached (or at least, two equal delegates were produced, and in fact Reflector shows that it really is cached in a static field). The second action created two unequal instances of Action for the two calls to GetSecondAction, which is why "second" is non-null at the end.
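
To make the difference concrete, here is a hand-written approximation of what the compiler might generate for the two methods above. It is only a sketch: the real generated names and the exact shape differ between compiler versions, and everything here (LoweringSketch, s_cachedFirst, SecondClosure and so on) is a made-up stand-in.

```csharp
using System;

static class LoweringSketch
{
    // Hypothetical stand-in for the compiler-generated cache field.
    private static Action s_cachedFirst;

    // Hypothetical stand-in for the method the lambda body is moved into.
    private static void FirstBody()
    {
        Console.WriteLine("First");
    }

    // Nothing is captured, so the delegate can be created once and reused.
    public static Action GetFirstAction()
    {
        if (s_cachedFirst == null)
        {
            s_cachedFirst = new Action(FirstBody);
        }
        return s_cachedFirst;
    }

    // Hypothetical stand-in for the compiler-generated closure class.
    private sealed class SecondClosure
    {
        public int i;

        public void Body()
        {
            Console.WriteLine("Second " + i);
        }
    }

    // A local is captured, so each call needs its own closure object and delegate.
    public static Action GetSecondAction()
    {
        var closure = new SecondClosure();
        closure.i = 0;
        return new Action(closure.Body);
    }

    static void Main()
    {
        Console.WriteLine(ReferenceEquals(GetFirstAction(), GetFirstAction())); // True
        Console.WriteLine(GetSecondAction() == GetSecondAction());              // False
    }
}
```

The second comparison is False because the two delegates have different closure objects as their targets, which is the same reason "second" ends up non-null in the example above.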

Interning lambdas which appear in different places in the code but with the same source code is a different matter. I suspect it would be quite complex to do this properly (after all, the same source code can mean different things in different places) and I would certainly not want to rely on it taking place. If it's not going to be worth relying on, and it's a lot of work to get right for the compiler team, I don't think it's the best way they could be spending their time.

Lengel answered 26/1, 2011 at 17:36 Comment(13)
"the same source code" is not "semantically identical" as defined in the spec. (x,y) => x + y could be "semantically identical" to (a,b) => a + b under certain circumstances (no local vars, no this, same types, etc). I understand that this semantic equivalence could be full of corner cases and be hellish to detect, and thus lead to the implementation taking the simple approach. - Immateriality
()=>5 at one spot in the source code is not equivalent to ()=>5 at some other spot. The delegates yielded by each are required to compare unequal, which wouldn't happen if they were interned. - Hime
@supercat: No, I don't think so - assuming the delegate types are the same, I believe the piece of the spec called out in the question is saying that they wouldn't be required to compare unequal. - Lengel
@supercat: They're required to compare unequal? If that's the case then that basically ends the discussion right there. Where is that specified? - Splendent
@supercat: No they're not. Look at the excerpt from the spec Dan Tao posted. If they're semantically equivalent, implementations are allowed to return the same instance. - Immateriality
@Martinho: Indeed, "semantically identical" is harder than just "the same source code" - my point was that even with just "the same source code" as a starting point, it's really tricky. - Lengel
@Jon, @Martinho: I think you guys are being totally reasonable. Any thoughts on why that particular option would be included in the spec to begin with, then? In particular, Jon, it seems quite believable that you're correct in guessing it would be a lot of work to implement with little benefit. So why explicitly mention it at all? - Splendent
@Dan: If it's not prohibited, it can always be done later if it turns out that there is a significant benefit. If that door is closed by the spec to start with, there's no wiggle room in the future. Just a guess though. - Lengel
@Dan: Mentioning it effectively delays the decision to implement it indefinitely (or until it is implemented :). Not mentioning it would make the decision right there. Also, it lets them do it in some simple cases right now, and only implement the difficult ones if they are really worth it. - Immateriality
@Martinho: Depending upon the circumstances under which delegates may or may not compare equal, code which uses lambdas for event hooking may behave in unexpected ways. For example, the most useful form of "interning" would be to turn lambdas that never use "this" into static methods, so lambdas created by different object instances would all refer to the same delegate, but that could break some event-subscription code. - Hime
The quote given in the question only applies to evaluation of lambdas into delegate types. The next numbered section in the C# spec about evaluating lambdas into expression tree types does not say much about whether the compiler is allowed to re-use the same expression tree instance ("interning") or not. Do you happen to know if the C# compiler ever interns expression tree instances in this way? If I convert your example to Expression<Action> it does not intern (but of course that is not a logical proof that it never interns). Related to a Moq thread now linking here. - Kirstinkirstyn
@JeppeStigNielsen: I don't believe it does, at the moment. - Lengel
Back to delegate types. There are two ways in which delegates can be "equal". The first one is reference equality in which the two references point to the same instance. The second one is equality according to the special == overload which takes in two Delegate and which is equivalent to the override of the Equals virtual method. According to the latter, two instances are "equal" if their MethodInfo are the same, and (for non-static methods) the Target object is the same. It would be legal for an implementation of the C# compiler to make two delegate instances that were "equal". - Kirstinkirstyn
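
The two notions of equality described in the last comment can be seen side by side with a small sketch. It deliberately uses an instance method rather than a lambda so the outcome does not hinge on lambda caching; Greeter and DelegateEqualityDemo are hypothetical names used only for this illustration.

```csharp
using System;

class Greeter
{
    public void Greet()
    {
        Console.WriteLine("Hi");
    }
}

static class DelegateEqualityDemo
{
    // Builds two separate delegate objects over the same method and the same target.
    public static (bool SameReference, bool EqualByValue) Compare()
    {
        var greeter = new Greeter();
        Action a = new Action(greeter.Greet);
        Action b = new Action(greeter.Greet);

        // ReferenceEquals checks object identity; == checks method and target.
        return (ReferenceEquals(a, b), a == b);
    }

    static void Main()
    {
        var result = Compare();
        Console.WriteLine(result.SameReference); // False: two distinct delegate objects
        Console.WriteLine(result.EqualByValue);  // True: same method, same target
    }
}
```

Since the -= operator on events and delegates matches by this second kind of equality, an implementation would only need to produce "equal" delegates, not identical instances, for the removal in the question to work.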
J
6

Is there any reason why this behavior—one delegate instance for semantically identical anonymous methods—is not implemented?

Yes. Because spending time on a tricky optimization that benefits almost no one takes time away from designing, implementing, testing and maintaining features that do benefit people.

Johanson answered 26/1, 2011 at 22:10 Comment(1)
This basically confirms what had already been suggested by Jon, LukeH, and possibly others; what had caused me to question this was simply that the possibility of the feature was mentioned in the spec at all. But as Jon (and others) pointed out, this could've been simply to avoid closing a door unnecessarily. Hans also gave a pretty persuasive explanation. Anyway, thanks as always for chiming in with what I guess can only be called the authoritative answer! - Splendent
Q
1

Lambdas can't be interned because they use an object to contain the captured local variables. And this instance is different every time you construct the delegate.

Quag answered 26/1, 2011 at 17:37 Comment(5)
Not all lambdas capture local variables. Dan Tao's example is one of these. - Immateriality
Not all, but enough of them that the optimization is probably not worth it. And code should never rely on them being interned for correctness since IMO it would only be an optimization and not by contract. In particular if the optimization were implemented it would make the OPs code work in some implementations, but not in others. And it'd suddenly stop working once variables are captured. So it'd probably even trick more developers than the current behavior. - Quag
Also, the part I quoted from the spec indicates that if two lambda expressions in the same method capture the same local variables, then they are semantically identical and can be supported by the same delegate instance. I'm not talking about between method calls, mind you—but within a single method call, the two expressions could use the same object. - Splendent
@Dan since it's an optional feature of the specification you'd rely on implementation defined behavior to do so. And I'd rather avoid that. And supercat's example of subscribing both to the same event reveals even more semantic differences created by that optimization. - Quag
You're right. I didn't mean to suggest that it would be reasonable for developers to rely on this implementation; I only included the example code to demonstrate that the current implementation doesn't do this. But yeah, I do agree with you. - Splendent
H
1

In general, there is no distinction between string variables which refer to the same String instance, versus two variables which refer to different strings that happen to contain the same sequence of characters. The same is not true of delegates.

Suppose I have two delegates, assigned to two different lambda expressions. I then subscribe both delegates to an event handler, and unsubscribe one. What should be the result?

It would be useful if there were a way in VB or C# to designate that an anonymous method or lambda which does not make reference to Me/this should be regarded as a static method, producing a single delegate which could be reused throughout the life of the application. There is no syntax to indicate that, however, and for the compiler to decide to have different lambda expressions return the same instance would be a potentially breaking change.

EDIT I guess the spec allows it, even though it could be a potentially breaking change if any code were to rely upon the instances being distinct.

Hime answered 26/1, 2011 at 17:40 Comment(2)
public static readonly Func<int> TheDeepThoughtLambda = () => 42; :P - Immateriality
Code relying on distinct instances is relying on current implementation details and disregarding the spec. If it was implemented now, without any change to the spec, would you call that a breaking change, because non-conforming code broke? - Immateriality
F
1

The other answers bring up good points. Mine really doesn't have to do with anything technical: every feature starts out with -100 points.

Fixture answered 26/1, 2011 at 17:42 Comment(0)
S
1

This is allowed because the C# team cannot control this. They heavily rely on the implementation details of delegates (CLR + BCL) and the JIT compiler's optimizer. There already is quite a proliferation of CLR and jitter implementations right now and there is little reason to assume that's going to end. The CLI spec is very light on rules about delegates, not nearly strong enough to ensure all these different teams will end up with an implementation that guarantees that delegate object equality is consistent. Not least because that would hamper future innovation. There's lots to optimize here.

Sialkot answered 26/1, 2011 at 18:12 Comment(0)