.NET CIL manipulation of evaluation stack

Asked 17/10, 2012 at 3:3 Answered 11/2, 2019 at 2:38

I have this sequence of CIL codes which I injected through the use of Mono.Cecil. However, the modified .NET C# application will not run.

Objective: Manually load and pop values from stack to display in Console.WriteLine

 for (int i = 0; i < 3; i++)
        {
            int z = some value popped manually from stack;                 
            Console.WriteLine(z);
        }

This is the simple main() program I modified:

.method private hidebysig static void Main(string[] args) cil managed
{

    .entrypoint
    .maxstack 5
    .locals init (
        [0] int32 num,
        [1] int32 num2)
    L_0000: ldc.i4.6 //manually push value 6 to stack
    L_0001: ldc.i4.5 //manually push value 5 to stack
    L_0002: ldc.i4.4 //manually push value 4 to stack
    L_0003: ldc.i4.0 //push int i initial value 0 to stack 
    L_0004: stloc.0 //pop and store to int i variable to variable num
    L_0005: br.s L_0013
    L_0007: nop 
    L_0008: stloc.1 //pop the pushed values 6,5 and 4 to variable num2
    L_0009: ldloc.1 //load value of num2 to stack
    L_000a: call void [mscorlib]System.Console::WriteLine(int32) //pop value of num2 and print
    L_000f: ldloc.0 //load previous value in variable num to stack
    L_0010: ldc.i4.1 //load incremental value 1 to stack
    L_0011: add //pop and add the top 2 values, result is pushed to stack
    L_0012: stloc.0 //store the new result to variable num. (int i)
    L_0013: ldloc.0 //push int i variable value to stack
    L_0014: ldc.i4.3 //push value 3 to stack as number of times to loop
    L_0015: blt.s L_0007 //branch less than (pop and cmp the top 2 values in stack)
    L_0017: ret 
}

However, the above code cannnot run. I tried changing blt.s to clt and br_true.s but it doesn't work either. Does anyone know if it is possible to attain my objective? Thanks.

EDIT: According to ECMA-335, III.1.7.5, there might be a backward branch constraint. Not sure if this is the case.

In particular, if that single-pass analysis arrives at an instruction, call it location X, that immediately follows an unconditional branch, and where X is not the target of an earlier branch instruction, then the state of the evaluation stack at X, clearly, cannot be derived from existing information. In this case, the CLI demands that the evaluation stack at X be empty.

Chattel answered 17/10, 2012 at 3:3 Comment(2)

What's the actual error you get running the program? Have you tried running peverify on your modified program? – Achromatize 17/10, 2012 at 5:15

You are giving the verifier way too much of a hard time to check that the stack is balanced. It would have to look deep enough into the code to analyze how often the loop runs. It does not do that. – Stonewort 17/10, 2012 at 9:31

You IL-Code looks ok, but i think the CLR might not be able to check if the stack is corrupted after the method completes. When something is pushed onto the stack, the CLR checks if the value are also popped from the stack.

So if you push 3 values onto the stack the CLR not might be able to check if your loop is running three times, so the CLR doesn't know if there are still values onto the stack when the method is returning.

Suntan answered 17/10, 2012 at 9:20 Comment(1)

i suspected so since the stack is volatile within a procedure and have to be emptied before a 'call'. was too used to the flexibility provided in binary assembly. – Chattel 18/10, 2012 at 0:5

Very interesting question. You are attempting to use the IL execution stack for the purpose of storing a queue of arbitrary data items. This introduces an unusual condition, versus conventional IL code, where the correct balance of the stack critically depends on the number runtime loop iterations exactly matching the number of ILAsm-time data items that were burned-into the IL. As you noted, the program (repeated here) does not work.

(in fact, in my build which uses link.exe with /LTCG, the linker fails to even produce the assembly, giving "fatal error C1352: Invalid or corrupt MSIL in function.")

.method public static void ExecStackResidual()      // !!! FAILS - BAD EXAMPLE - NO !!!
{
    .locals init (int32 i, int32 cur)

    ldc.i4.6        // enqueue item  -- NO!
    ldc.i4.5        // enqueue item  -- NO!
    ldc.i4.4        // enqueue item  -- NO!

    ldc.i4.0
    stloc i
    br _next

_more:
    stloc cur       // de-queue item  -- NO!

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more
    ret 
}

The problem is n̲o̲t̲ due a simple logical flaw or off-by-one bug in your code. This can be shown by the fact that commenting out the controversial parts as follows works fine, printing 3 zeros.

    //-- ldc.i4.6       // commented out
    //-- ldc.i4.5       // commented out
    //-- ldc.i4.4       // commented out

    ldc.i4.0
    stloc i
    br _next

_more:
    //-- stloc cur      // commented out

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more
    ret

The OP did some sleuthing and discovered ECMA-335, III.1.7.5, which seems like it could be relevant here, since the main difference between the working and failing example is that the latter existentially requires having a non-empty evaluation stack (a.k.a. "sequence point") at location _more, and that location does indeed, to quote the spec...

"...immediately follow an unconditional branch [here, br _next], and where [_more] is not the target of an earlier branch instruction."

Unfortunately however, this doesn't seem to be quite the full explanation, because as long as you remove the queued items in an balanced way that can be discerned statically, the evaluation stack apparently does not have to be empty at location _more. This is demonstrated by the following code, which also works fine, printing the 3 zeros, despite having several items on the execution stack at the ECMA-335, III.1.7.5-vulnerable location _more.

    ldc.i4.6       // enqueue item  -- ok
    ldc.i4.5       // enqueue item  -- ok
    ldc.i4.4       // enqueue item  -- ok

    ldc.i4.0
    stloc i
    br _next

_more:
    //-- stloc cur      // de-queue item  -- still commented out

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more

    pop         // de-queue item  -- required
    pop         // de-queue item  -- required
    pop         // de-queue item  -- required
    ret

The OP also uses the terminology "backward branch constraint," but it's unclear whether the phrase was found in a spec or perhaps an original contribution. It seems possibly entailed by the appearance of the phrase "...earlier branch instruction" in the spec. Either way it raises a tantalizing question of whether the error can be avoided by rearranging the code so that there are no locations that (technically) match the "earlier" (technicality) in the ECMA-335, III.1.7.5 constraint.

A related idea is that "unconditional branch" in the spec means only br family instructions. To get around br, we can instead embed a ret instruction within the method body as shown next. As you likely guessed, this is to no avail. Although the spec doesn't explicitly say so, it obviously intends ret to be included as an "unconditional branch." This is common sense, and accordingly the following example still does not work:

    // !!! FAILS - BAD EXAMPLE - NO
    ldc.i4.6        // enqueue item  -- NO!
    ldc.i4.5        // enqueue item  -- NO!
    ldc.i4.4        // enqueue item  -- NO!

    ldc.i4.0
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more
    ret 

_more:
    stloc cur       // de-queue item  -- NO! -- still follows an "unconditional branch"

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i
    br _next

All-in-all, I don't think you're going to get this technique to work, due to its fundamental requirement that (a.) facts that get hard-coded into the IL (i.e., number of enqueued data items) must correspond perfectly with (b.) facts that require runtime interpretation (i.e., number of loop iterations).

I think a more basic summary of the problem, as opposed to the ECMA description, is that all of the failure examples entail that the number of items on the execution stack experienced by one (or more) of the method's instructions is not fixed, instead obtaining different values at different times as the method executes, and this is the underlying situation that will always be strictly disallowed, no matter how you achieve it. To me, that seems like the more general inviolable constraint.

For example, at instruction _more in the demo method—and all within the scope of a single invocation—there would first be 2, then 1, then 0 "excess" items on the execution stack (notice that I subtracted one for each iteration from what you might have expected, and this is because I used the word "excess" just prior as an attempt to highlight the fact that within each individual loop iteration, one item is properly expected—and required—at location _more, namely for the putative dequeuing operation stloc cur).

I'd speculate that for an IL method to be valid, there must be no way for any of its instructions to experience variation in the prevailing depth of the execution stack. In other words, there must be one-and-exactly-only-one stack depth value which can be statically determined and assigned to each of the method's instructions. Intuitively, the situation otherwise would likely render the JIT task either wildly intractable or perhaps even provably impossible.

Alinealinna answered 11/2, 2019 at 2:38 Comment(0)

Recommended topics

Hot tags