Does the branch predictor kick in with this?
Asked Answered
Y

2

9

Most, if not all modern processors utilize a technique called "branch prediction", with which it guesses what way to go in an if-then-else branch.

I have a question considering the scheme. Let's say we have this piece of code, in no specific language:

if(someCondition)
{
    // some action
    return someValue;
}
// some other action
return someOtherValue;

Logically speaking, that code is equivalent to this code:

if(someCondition)
{
    // some action
    return someValue;
}
else
{
    // some other action
    return someOtherValue;
}

The branch predictor would 'predict' the branch in the second example, but what about the first example? Will it guess? What will be loaded into the pipeline? Is there any speed to be gained with either of the examples disregarding the impact of actual code in the blocks?

My guess, it is dependent on the compiler: If statements are implemented (in assembly) using jumps which are only undertaken if the compare flag in the register is set. Now what exactly the assembly instructions will look like depends on the compiler. Unless there is a common way of handling it that every compiler does, which I doubt there is, then this is compiler dependent. In that case, what would happen on the latest Visual Studio C++ and GC++ compilers?

As hexafraction pointed out, the relation between the return values as well as how someCondition is determined... the branch predictor might not kick in. Let us consider only true and false as return values. For the condition, let us assume both that it is a field which has been predetermined, either inside or outside the function, a local variable, and some arithmetic statement.

To be honest, I do not suspect there is much difference between the case that the condition is a local variable and the case that the field has been predetermined in the same function.

Yingyingkow answered 11/8, 2015 at 23:13 Comment(3)
Remember that there are sometimes numerical optimizations that a compiler can take that involve no branching. Depending on how your someCondition is calculated and the relation between the two return values, it is theoretically possible that in some cases branch-less logic/bit twiddling/arithmetic may be possible. Additionally, architectures such as ARM have conditional execution, meaning that a lot of logic that would involve branching can be done branchlessly. Conditional instructions carry a condition as part of the opcode, and if the condition isn't met the inst. is turned into a nop.Gaylegayleen
It's highly unlikely that those two pieces of code wouldn't compile to the exact same machine code. If you want to talk about CPU behavior, compare assembly/machine code.Atmospherics
I think you can omit the "Logically speaking". Those two snippets are exactly equivalent, and I'd expect the compiler to output the same bytecode/assembly for them. So the branch predictor cannot see any difference, and will treat them the same…Volleyball
E
4

Most likely gcc -O3 would optimise that to a branchless sequence, using a conditional-move instructions. e.g. on x86

# generate someValue in %rax, the x86-64 ABI's return value register
# generate someOtherValue in %rdi, to pick one at random
    test someCondition   # probably actually test or cmp a register
    cmovz  %rdi, %rax    # copy %rdi to %rax, if the zero flag is set.
    ret

cmov has a data dependency on both its inputs and on flags. A conditional branch is a control dependency. Using cmov is often good, unless it's part of one long dependency chain and the branch is fairly predictable.

If there was more work inside the if blocks, gcc would generate a conditional jump instruction.

# generate someValue in %rax
    test someCondition
    jz  .zero
    ret
.zero:
    # compute someOtherValue.  This work doesn't need to happen at all
    # if we don't end up needing it, unlike in the cmov case
    mov  someOtherValue, %rax
    ret

Branch prediction operates on conditional-jump instructions, not on high-level constructs. The same instructions are used to jump back to the top of a loop if the loop condition is true. According to http://agner.org/optimize/ , recent Intel CPUs remember patterns of up to 64 iterations for loops. So loops that run the same number of iterations every time don't have a branch mispredict on the last iteration, if the number of iterations is 64 or less.

So it's not the sequence of instructions that the branch predictor looks at to guess whether a jump will be taken or not-taken. Every separate branch instruction gets an entry in the branch-history-buffer when it's taken. And yes, every compiler has no choice but to use jcc (Jump on Condition Code) instructions to implement branches/loops.

The default is predict not-taken. If that prediction is correct, the CPU doesn't evict potentially-still-useful info from the cache to make room. See Agner Fog's microarch doc for more low-level details.


On Linux, to see the branch predictor in action, you can use perf stat:

perf stat /bin/ls  # in some big directory
    ... normal ls output

 Performance counter stats for '/bin/ls':

     10.403069      task-clock (msec)         #    0.094 CPUs utilized
         2,255      context-switches          #    0.217 M/sec
             0      cpu-migrations            #    0.000 K/sec
           190      page-faults               #    0.018 M/sec
    16,612,260      cycles                    #    1.597 GHz
     7,843,399      stalled-cycles-frontend   #   47.21% frontend cycles idle
     5,205,565      stalled-cycles-backend    #   31.34% backend  cycles idle
    20,227,093      instructions              #    1.22  insns per cycle
                                              #    0.39  stalled cycles per insn
     3,975,777      branches                  #  382.173 M/sec
########### These two lines ######
        55,785      branch-misses             #    1.40% of all branches

   0.110765717 seconds time elapsed

Intel Sandybridge (i5 2500k), at low-power clock speed, with the default cpufreq governor, doesn't ramp up in clock speed before ls is done.

Ecliptic answered 12/8, 2015 at 1:6 Comment(0)
T
2

There is no difference between those two code samples. The else is irrelevant because there is no need to branch at the end of the true clause. Even were that not true, the branch at the end of the true clause would not be conditional.

In other words, the code must compile to something like:

  Compute test expression
  Branch if false to false_label
  True action
  Return some value
False_label;
  False action
  Return some other value
Turncoat answered 12/8, 2015 at 0:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.