Can a C# statement generate non connected MSIL

The question is about the C# language specification and the CIL language specification, as well as the behavior of Microsoft's and Mono's C# compilers.

I'm building some code analysis tools (the details don't matter here) that operate on CIL.

Looking at a few code samples, I notice that C# statements (try/catch, if/else, if/then, loops, ...) generate connected blocks of MSIL.

But I'd like to be sure that I can't write a C# construct that yields non-connected MSIL. More specifically, can I write any C# statement that translates to something similar to:

IL_0000: 
IL_0001: 
IL_0002: 

// hole

IL_001a: 
IL_001b:

I already tried some weird stuff using goto and nested loops, but maybe I'm not as mad as some users would be.

Lindsaylindsey answered 23/4, 2019 at 14:33 Comment(2)
@Hans From a comment on my (now-deleted) answer, he said that the // hole referred to other IL instructions, not related to the C# statement in question. I asked him to edit the question to clarify this. – Superaltar
The only thing about IL statements you need to worry about is that each individual statement is emitted correctly and that, when control leaves a method, the stack state is valid. Other than that, you can do whatever you want in terms of ordering instructions. (This implies that when an instruction pulls something off the stack, it is the expected type.) – Vitalism

Sure, that's trivially possible. Something like:

static void M(bool x)
{
    if (x)
        return;
    else
        M(x);
    return;
}

If you compile that in debug mode you get

    IL_0000: nop
    IL_0001: ldarg.0
    IL_0002: stloc.0
    IL_0003: ldloc.0
    IL_0004: brfalse.s IL_0008
    IL_0006: br.s IL_0011
    IL_0008: ldarg.0
    IL_0009: call void A::M(bool)
    IL_000e: nop
    IL_000f: br.s IL_0011
    IL_0011: ret

The if statement goes from IL_0001 to IL_0009, and the consequence of the if is a branch to IL_0011; both return statements share the same ret instruction, so there is a "hole" containing a nop and an unconditional branch (IL_000e and IL_000f) between the main body of the if and the consequence.

More generally, you should never assume anything whatsoever about the layout of the IL produced by the C# compiler. The compiler makes no guarantees whatsoever other than that the IL produced will be legal and, if safe, verifiable.


You say you are writing some code analysis tools; as the author of significant portions of the C# analyzer, and someone who worked on third-party analysis tools at Coverity, a word of advice: for the majority of questions you typically want answered about C# programs, the parse tree produced by Roslyn is the entity you wish to analyze, not the IL. The parse tree is a concrete syntax tree; it is one-to-one with every character in the source code. It can be very difficult to map optimized IL back to the original source code, and it can be very easy to produce false positives in an IL analysis.

Put another way: source-to-IL is semantics-preserving but also information-losing; you typically want to analyze the artifact that has the most information in it.
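As a rough illustration of analyzing the syntax tree rather than the IL, here is a minimal sketch. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; the class name SyntaxDemo and the wrapper class A around the method are made up for the example.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class SyntaxDemo
{
    // The method from the example above, wrapped in a hypothetical class A.
    public const string Source = @"
static class A
{
    static void M(bool x)
    {
        if (x)
            return;
        else
            M(x);
        return;
    }
}";

    public static void Main()
    {
        var root = CSharpSyntaxTree.ParseText(Source).GetRoot();

        // The tree keeps every character, so it can be printed back verbatim.
        Console.WriteLine(root.ToFullString() == Source);

        foreach (var ifStmt in root.DescendantNodes().OfType<IfStatementSyntax>())
        {
            // Each statement maps to one contiguous range of source text.
            var lines = ifStmt.GetLocation().GetLineSpan();
            Console.WriteLine(
                $"if statement: span {ifStmt.Span}, " +
                $"lines {lines.StartLinePosition.Line + 1}-{lines.EndLinePosition.Line + 1}");
        }
    }
}
```

At this level every statement occupies a single contiguous span of the source, which is exactly the property that cannot be relied on in the emitted IL.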

If you must, for whatever reason, operate your analyzer at the IL level, your first task should probably be to find the boundaries of the basic blocks, particularly if you are analyzing reachability properties.

A "basic block" is a contiguous chunk of IL where the end point of the block does not "carry on" to the following instruction -- because it is a branch, return or throw, for instance -- and there are no branches into the block to anywhere except the first instruction.

You can then form a graph of basic blocks for each method, indicating which ones can possibly transfer control to which other blocks. This "raises the level" of your analysis; instead of analyzing the effects of a sequence of IL instructions, now you're analyzing the effects of a graph of basic blocks.
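The block-splitting step described above might be sketched as follows. This is only an illustration under simplifying assumptions: the Instr, FlowKind and Block types are hypothetical stand-ins for a real IL decoder, each instruction has at most one branch target (so switch is not handled), and exception-handling regions are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified instruction model; decoding real IL bytes is omitted.
public enum FlowKind { Next, Branch, CondBranch, Exit }
public record Instr(int Offset, string Text, FlowKind Kind, int? Target = null);
public record Block(int Start, List<Instr> Body, List<int> Successors);

public static class BasicBlocks
{
    public static List<Block> Build(IReadOnlyList<Instr> code)
    {
        // Leaders: the first instruction, every branch target, and every
        // instruction that follows a branch, return or throw.
        var leaders = new SortedSet<int> { code[0].Offset };
        for (int i = 0; i < code.Count; i++)
        {
            if (code[i].Target.HasValue) leaders.Add(code[i].Target.Value);
            if (code[i].Kind != FlowKind.Next && i + 1 < code.Count)
                leaders.Add(code[i + 1].Offset);
        }

        // Cut the instruction stream at each leader.
        var blocks = new List<Block>();
        foreach (var instr in code)
        {
            if (leaders.Contains(instr.Offset))
                blocks.Add(new Block(instr.Offset, new List<Instr>(), new List<int>()));
            blocks[blocks.Count - 1].Body.Add(instr);
        }

        // Edges: the explicit target first, then fall-through where possible.
        for (int b = 0; b < blocks.Count; b++)
        {
            var last = blocks[b].Body[blocks[b].Body.Count - 1];
            if (last.Target.HasValue) blocks[b].Successors.Add(last.Target.Value);
            bool fallsThrough = last.Kind == FlowKind.Next || last.Kind == FlowKind.CondBranch;
            if (fallsThrough && b + 1 < blocks.Count)
                blocks[b].Successors.Add(blocks[b + 1].Start);
        }
        return blocks;
    }
}

public static class BlockDemo
{
    // The debug-mode IL listing from above, transcribed by hand.
    public static List<Instr> Sample() => new List<Instr>
    {
        new Instr(0x00, "nop", FlowKind.Next),
        new Instr(0x01, "ldarg.0", FlowKind.Next),
        new Instr(0x02, "stloc.0", FlowKind.Next),
        new Instr(0x03, "ldloc.0", FlowKind.Next),
        new Instr(0x04, "brfalse.s", FlowKind.CondBranch, 0x08),
        new Instr(0x06, "br.s", FlowKind.Branch, 0x11),
        new Instr(0x08, "ldarg.0", FlowKind.Next),
        new Instr(0x09, "call", FlowKind.Next),
        new Instr(0x0e, "nop", FlowKind.Next),
        new Instr(0x0f, "br.s", FlowKind.Branch, 0x11),
        new Instr(0x11, "ret", FlowKind.Exit),
    };

    public static void Main()
    {
        foreach (var b in BasicBlocks.Build(Sample()))
            Console.WriteLine($"IL_{b.Start:x4} -> " +
                string.Join(", ", b.Successors.Select(s => $"IL_{s:x4}")));
    }
}
```

For the listing above this yields four blocks, starting at IL_0000, IL_0006, IL_0008 and IL_0011. A real tool would also need to treat try/catch/finally regions and multi-target instructions as block boundaries and edges.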

If you say more about what sorts of analysis you're doing I can advise further.

Japanese answered 23/4, 2019 at 17:4 Comment(0)

In theory, yes (this comes from my experience). Your analysis tool does not deal with C# directly; it works on IL code only. IL can be produced by anybody: not only by Visual Studio, but also by other language compilers such as Visual Basic or Python.NET... and obfuscators! Obfuscators are the real culprit: while other compilers try to adhere to the specs, obfuscators do their best to exploit the specs and the target runtime.

Obfuscated code might violate certain common-sense patterns. Consider this case: certain smart obfuscators produce illegal MSIL, but the jitter digests it because the invalid portions happen never to be executed.

When building an analysis tool, you can't handle these cases unless your target is to build a deobfuscator.

Sigismundo answered 23/4, 2019 at 14:56 Comment(2)
An obfuscator that produces illegal IL is skating on thin ice; the jitter is permitted to run an IL verifier before jitting the method, and reject the method if it fails verification. The jitter does so if the method is in a low-trust context because low-trust code is required to be verifiable. – Japanese
Thank you Eric! You pointed out a critical bit: low trust. That explains why an obfuscator we used in the past worked only in particular cases when all the "optimizations" were set. – Sigismundo
