Avoiding branches in managed languages
In C, when compiling for an x86 machine, I usually replace branches with logical expressions when speed is the most important aspect, even if the conditions are complex. For example, instead of

char isSomething() {
    if (complexExpression01) {
        if (complexExpression02) {
            if(!complexExpression03) {
                return 1;
            }
        }
    }
    return 0;
}

I will write:

char isSomething() {
    return complexExpression01 &&
           complexExpression02 &&
           !complexExpression03 ;
}

Now clearly, this code might be harder to maintain and less readable, but it might actually be faster.

Is there any reason to act the same way when working with managed code, such as C#? Are "jumps" expensive in managed code as they are in unmanaged code (at least on x86)?

Russon answered 8/11, 2011 at 7:49 Comment(2)
Won't a && actually compile into a branch too?Petras
@HenkHolterman Personally I'd be asking if, with optimization, would the compiler output identical code? In which case, you can have your readability and performance, too.Abjure
The two expressions will result in the same number of tests, as the logical and operator (&&) has short-circuit semantics in both C and C#. The premise of your question (that the second way of expressing the program results in less branching) is therefore incorrect.

Steelhead answered 8/11, 2011 at 7:59 Comment(3)
I didn't fully understand your answer to begin with (which was my mistake in the question as well). But it makes much sense.Russon
This does not answer the question. A branch here should be interpreted as a conditional jump instruction that disrupts the CPU's pipeline and can cause penalties. Remember CS class and the halting problem? This is a semantic property of the program; the optimizer may detect it, but it cannot detect all cases and optimize your code. I don't know the answer to the question myself, though...Helminthology
I am not sure I understand the problem. There is no answer to the question, as the rewrite does not eliminate any branches in the first place. Unless your compiler has very strange internals, exactly the same set of optimizations will be applicable to each of the programs.Steelhead

General

In a typical compiler, the generated code will most often be the same, at least assuming that you are using the usual flags such as

csc.exe /optimize+
cl.exe /O2
g++ -O2

and related default optimization modes.

The general mantra is: profile, profile, profile (and don't micro-optimize until your profiler tells you to). You can always look at the generated code2 to see whether there is room for improvement.

Think of it this way, e.g. the C# code:

C#/.NET

Each of your complexExpressions is de facto a function call (call, calli, or callvirt opcode3) that requires its arguments to be pushed onto the stack. On exit, the return value is left on the stack in place of the parameters.

Now, the CLR being a stack-based virtual machine (i.e. register-less), this amounts to exactly the same thing as an anonymous temporary variable on the stack. The only difference is the number of identifiers used in the code.

Now what the JIT engine does with that is another matter: the JIT engine has to translate these calls into native assembly and may optimize by tweaking register allocation, instruction ordering, branch prediction and the like1

1 (though in practice, for this sample, it won't be allowed to do the more interesting optimizations, because the complex function calls may have side effects and the C# spec is very clear about evaluation order and so-called sequence points). Note, however, that the JIT engine is allowed to inline function calls to reduce call overhead.

Not only when they are non-virtual, but (IIRC) also when the runtime type can be known statically at compile time for certain .NET Framework internals. I'd have to look up a reference for this, but in fact I think there are attributes introduced in .NET Framework 4.0 to explicitly prevent inlining of framework functions; this is so that Microsoft can patch library code in service packs/updates, even when user assemblies have been ahead-of-time compiled (ngen.exe) into native images.

C/C++

In C/C++ the memory model is much more lax (at least until C++11), and the code is usually compiled down to native instructions directly at compile time. Given that C/C++ compilers usually do aggressive inlining, the code in such compilers will usually be the same as well, unless you compile without optimization enabled.


2 I use

  • ildasm or monodis to see the IL code generated
  • mono -aot=full,static or mkbundle to generate native object modules and objdump -CdS to see the annotated native assembly instructions for that.

Note this is pure curiosity, because I rarely find interesting bottlenecks that way. See, however, Jon Skeet's blog posts on performance-optimizing Noda Time for good examples of surprises that may lurk in the generated IL for generic classes.

3 Edit: not accurate for operators or compiler intrinsics, though even those just leave their result on the stack.

Expiatory answered 8/11, 2011 at 7:51 Comment(1)
Thanks for the detailed answer. You taught me more than one thing here. Yet, @Ulrik pointed out my basic confusion. I really appreciate your answer.Russon

This is up to the implementation of the CLR and the compiler of the managed language. In the case of C#, the following test case shows that there is no difference in instructions between nested if statements and a combined if statement:

            // case 1
            if (value1 < value2)
00000089  mov         eax,dword ptr [ebp-0Ch] 
0000008c  cmp         eax,dword ptr [ebp-10h] 
0000008f  jge         000000A6 
            {
                if (value2 < value3)
00000091  mov         eax,dword ptr [ebp-10h] 
00000094  cmp         eax,dword ptr [ebp-14h] 
00000097  jge         000000A6 
                {
                    result1 = true;
00000099  mov         eax,1 
0000009e  and         eax,0FFh 
000000a3  mov         dword ptr [ebp-4],eax 
                }
            }

            // case 2
            if (value1 < value2 && value2 < value3)
000000a6  mov         eax,dword ptr [ebp-0Ch] 
000000a9  cmp         eax,dword ptr [ebp-10h] 
000000ac  jge         000000C3 
000000ae  mov         eax,dword ptr [ebp-10h] 
000000b1  cmp         eax,dword ptr [ebp-14h] 
000000b4  jge         000000C3 
            {
                result2 = true;
000000b6  mov         eax,1 
000000bb  and         eax,0FFh 
000000c0  mov         dword ptr [ebp-8],eax 
            }
Buchmanism answered 8/11, 2011 at 8:12 Comment(0)

The only way to know is to measure.

True and false are represented as 1 and 0 by the CLR, so it wouldn't surprise me if using logical expressions had some advantage. Let's see:

static void BenchBranch() {
    Stopwatch sw = new Stopwatch();

    const int NMAX = 1000000000;
    bool a = true;
    bool b = false;
    bool c = true;

    sw.Restart();
    int sum = 0;
    for (int i = 0; i < NMAX; i++) {
        if (a)
            if (b)
                if (c)
                    sum++;
        a = !a;
        b = a ^ b;
        c = b;
    }
    sw.Stop();
    Console.WriteLine("1: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);

    sw.Restart();
    sum = 0;
    for (int i = 0; i < NMAX; i++) {
        if (a && b && c) 
            sum++;
        a = !a;
        b = a ^ b;
        c = b;
    }
    sw.Stop();
    Console.WriteLine("2: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);

    sw.Restart();
    sum = 0;
    for (int i = 0; i < NMAX; i++) {
        sum += (a && b && c) ? 1 : 0;
        a = !a;
        b = a ^ b;
        c = b;
    }
    sw.Stop();
    Console.WriteLine("3: {0:F3} ms ({1})", sw.Elapsed.TotalMilliseconds, sum);
}

The result:

1:  2713.396 ms (250000000)
2:  2477.912 ms (250000000)
3:  2324.916 ms (250000000)

So, from this it seems there is a slight advantage to using logical operators instead of nested conditional statements. However, any specific instance may give somewhat different results.

In the end, whether a micro-optimization such as this is worth it depends on how performance-critical the code is.

Alarick answered 8/11, 2011 at 8:13 Comment(3)
These tests can be misleading since many external factors can influence the counter. It's more consistent to check the resulting assembly built using each methodBuchmanism
Measuring anything other than the actual production code can be misleading. But looking at the assembly can be equally misleading, since not all factors that influence the JIT compiler's instruction choices are known to us. In other words: "many external factors can influence the resulting assembly."Alarick
That is true but when comparing a number of cases it gives the most accurate comparison between the cases.Buchmanism
