Very interesting question. You are attempting to use the IL execution stack for the purpose of storing a queue of arbitrary data items. This introduces an unusual condition, versus conventional IL code, where the correct balance of the stack critically depends on the number runtime loop iterations exactly matching the number of ILAsm-time data items that were burned-into the IL. As you noted, the program (repeated here) does not work.
(in fact, in my build which uses link.exe
with /LTCG
, the linker fails to even produce the assembly, giving "fatal error C1352: Invalid or corrupt MSIL in function
.")
.method public static void ExecStackResidual() // !!! FAILS - BAD EXAMPLE - NO !!!
{
.locals init (int32 i, int32 cur)
ldc.i4.6 // enqueue item -- NO!
ldc.i4.5 // enqueue item -- NO!
ldc.i4.4 // enqueue item -- NO!
ldc.i4.0
stloc i
br _next
_more:
stloc cur // de-queue item -- NO!
ldloc cur
box int32
call void Debug::WriteLine(object)
ldloc i
ldc.i4.1
add
stloc i
_next:
ldloc i
ldc.i4.3
blt _more
ret
}
The problem is n̲o̲t̲ due a simple logical flaw or off-by-one bug in your code. This can be shown by the fact that commenting out the controversial parts as follows works fine, printing 3 zeros.
//-- ldc.i4.6 // commented out
//-- ldc.i4.5 // commented out
//-- ldc.i4.4 // commented out
ldc.i4.0
stloc i
br _next
_more:
//-- stloc cur // commented out
ldloc cur
box int32
call void Debug::WriteLine(object)
ldloc i
ldc.i4.1
add
stloc i
_next:
ldloc i
ldc.i4.3
blt _more
ret
The OP did some sleuthing and discovered ECMA-335, III.1.7.5, which seems like it could be relevant here, since the main difference between the working and failing example is that the latter existentially requires having a non-empty evaluation stack (a.k.a. "sequence point") at location _more
, and that location does indeed, to quote the spec...
"...immediately follow an unconditional branch [here, br _next
], and where [_more
] is not the target of an earlier branch instruction."
Unfortunately however, this doesn't seem to be quite the full explanation, because as long as you remove the queued items in an balanced way that can be discerned statically, the evaluation stack apparently does not have to be empty at location _more
. This is demonstrated by the following code, which also works fine, printing the 3 zeros, despite having several items on the execution stack at the ECMA-335, III.1.7.5-vulnerable location _more
.
ldc.i4.6 // enqueue item -- ok
ldc.i4.5 // enqueue item -- ok
ldc.i4.4 // enqueue item -- ok
ldc.i4.0
stloc i
br _next
_more:
//-- stloc cur // de-queue item -- still commented out
ldloc cur
box int32
call void Debug::WriteLine(object)
ldloc i
ldc.i4.1
add
stloc i
_next:
ldloc i
ldc.i4.3
blt _more
pop // de-queue item -- required
pop // de-queue item -- required
pop // de-queue item -- required
ret
The OP also uses the terminology "backward branch constraint," but it's unclear whether the phrase was found in a spec or perhaps an original contribution. It seems possibly entailed by the appearance of the phrase "...earlier branch instruction" in the spec. Either way it raises a tantalizing question of whether the error can be avoided by rearranging the code so that there are no locations that (technically) match the "earlier" (technicality) in the ECMA-335, III.1.7.5 constraint.
A related idea is that "unconditional branch" in the spec means only br
family instructions. To get around br
, we can instead embed a ret
instruction within the method body as shown next. As you likely guessed, this is to no avail. Although the spec doesn't explicitly say so, it obviously intends ret
to be included as an "unconditional branch." This is common sense, and accordingly the following example still does not work:
// !!! FAILS - BAD EXAMPLE - NO
ldc.i4.6 // enqueue item -- NO!
ldc.i4.5 // enqueue item -- NO!
ldc.i4.4 // enqueue item -- NO!
ldc.i4.0
stloc i
_next:
ldloc i
ldc.i4.3
blt _more
ret
_more:
stloc cur // de-queue item -- NO! -- still follows an "unconditional branch"
ldloc cur
box int32
call void Debug::WriteLine(object)
ldloc i
ldc.i4.1
add
stloc i
br _next
All-in-all, I don't think you're going to get this technique to work, due to its fundamental requirement that (a.) facts that get hard-coded into the IL (i.e., number of enqueued data items) must correspond perfectly with (b.) facts that require runtime interpretation (i.e., number of loop iterations).
I think a more basic summary of the problem, as opposed to the ECMA description, is that all of the failure examples entail that the number of items on the execution stack experienced by one (or more) of the method's instructions is not fixed, instead obtaining different values at different times as the method executes, and this is the underlying situation that will always be strictly disallowed, no matter how you achieve it. To me, that seems like the more general inviolable constraint.
For example, at instruction _more
in the demo method—and all within the scope of a single invocation—there would first be 2, then 1, then 0 "excess" items on the execution stack (notice that I subtracted one for each iteration from what you might have expected, and this is because I used the word "excess" just prior as an attempt to highlight the fact that within each individual loop iteration, one item is properly expected—and required—at location _more
, namely for the putative dequeuing operation stloc cur
).
I'd speculate that for an IL method to be valid, there must be no way for any of its instructions to experience variation in the prevailing depth of the execution stack. In other words, there must be one-and-exactly-only-one stack depth value which can be statically determined and assigned to each of the method's instructions. Intuitively, the situation otherwise would likely render the JIT task either wildly intractable or perhaps even provably impossible.