Eric Lippert's answer really hits the point. However it would be nice to build a picture of how stack frames and captures work in general. To do this it helps to look at a slightly more complex example.
Here is the capturing code:
public class Scorekeeper {
int swish = 7;
public Action Counter(int start)
{
int count = 0;
Action counter = () => { count += start + swish; }
return counter;
}
}
And here is what I think the equivalent would be (if we are lucky Eric Lippert will comment on whether this is actually correct or not):
private class Locals
{
public Locals( Scorekeeper sk, int st)
{
this.scorekeeper = sk;
this.start = st;
}
private Scorekeeper scorekeeper;
private int start;
public int count;
public void Anonymous()
{
this.count += start + scorekeeper.swish;
}
}
public class Scorekeeper {
int swish = 7;
public Action Counter(int start)
{
Locals locals = new Locals(this, start);
locals.count = 0;
Action counter = new Action(locals.Anonymous);
return counter;
}
}
The point is that the local class substitutes for the entire stack frame and is initialized accordingly each time the Counter method is invoked. Typically the stack frame includes a reference to 'this', plus method arguments, plus local variables. (The stack frame is also in effect extended when a control block is entered.)
Consequently we do not have just one object corresponding to the captured context, instead we actually have one object per captured stack frame.
Based on this, we can use the following mental model: stack frames are kept on the heap (instead of on the stack), while the stack itself just contains pointers to the stack frames that are on the heap. Lambda methods contain a pointer to the stack frame. This is done using managed memory, so the frame sticks around on the heap until it is no longer needed.
Obviously the compiler can implement this by only using the heap when the heap object is required to support a lambda closure.
What I like about this model is it provides an integrated picture for 'yield return'. We can think of an iterator method (using yield return) as if it's stack frame were created on the heap and the referencing pointer stored in a local variable in the caller, for use during the iteration.