Logical operators and branch prediction failure
Asked Answered
F

0

7

Consider the following loops:

while((expressionA) & (expressionB)){
    // do something
}
while((expressionA) && (expressionB)){
    // do something
}

where expressionA and expressionB are expressions of type bool and expressionB has no side-effects. Under these conditions, the two cases are as-if-equivalent (right?).

A (hypothetical) compiler that naively takes its cue from the source code would put a branch in the && version and we would end up paying for branch prediction failures.

With a modern compiler (such as current GCC), can there ever be any conditions under which the & version gives a substantial performance gain over of the && version?


My guess is no, because:

  • If expressionB is sufficiently cheap, the compiler will recognize this and avoid creating the short-circuiting branch.
  • If expressionB is sufficiently expensive, the compiler will create the short-circuit because:
    • if probability of expressionA is not close to 1.0, we get a substantial average performance gain from short-circuiting.
    • if probability of expressionA is close to 1.0, we won't pay much because branch prediction will tend to succeed.
Falda answered 3/12, 2015 at 8:0 Comment(7)
If expressionA has side-effects , they cannot be avoidedParoxysm
AFAIK there's no way of telling the optimizer which cases are the most likely, which makes it difficult for cases like if ( a && b && c && d && e ) where there are no side-effects but computationally expensive. Theoretically the C standard would allow the tests to be done in any order so even if you put the most expensive first, the optimizer might not honour that.Paroxysm
What I'm finding just playing around in GCC 5.2 on some simplistic tests is that operator& can actually cause a slight difference in register allocation, but identical to operator&& in terms of branching (at least with operands that evaluate to bool). One thing, as M.M. alluded to, is that a compiler can't truly figure out what is "cheap" here, since the probability/predictability of a coin flip expression is often going to be based around inputs only provided at runtime. It goes beyond the basic evaluation. What I'm seeing is short-circuiting being applied for both operators here.Apian
@Paroxysm Could you please explain why you brought up the side-effects of expressionA?Falda
@Ike Thanks. By "cheap expression" I just mean one that can be evaluated quickly, so in that sense the compiler can figure out what is cheap. Probability matters of course, but as a separate factor. Take the case where expressionB is just a bool variable whose value is unknown at compile-time but is already evaluated (at this point in run-time) because its value was already needed elsewhere in the code. That value is available "cheaply" and making the short-circuit in that case (for any of the two operators) would just seem wasteful, especially because branch prediction may fail.Falda
gcc usually makes a separate branch instruction for each piece of a short-circuit-evaluated conditional. If (A&B) predicts well, but neither A nor B alone predict well, I think you could see a benefit to using A&B over A&&B. Turning a comparison into a value that can actually be bitwise-ANDed with something else usually isn't cheap, though. It's a lot easier in the || vs. | case, because usually you can just compute things an OR them together. e.g. 2 | 1 is true, but 2 & 1 is false.Rustyrut
@tennenrishin Understood what you meant there by cheap/expensive as relating to the cost of evaluation. What I was seeing fooling around in GCC was that, in cases where both operators produce logically equivalent results, the optimizer was still short-circuiting them (yielding two branch instructions instead of an and instruction + branch). My guess for why it does that is that it still can't assume anything on the part of the predictability of an expression to be able to easily deduce that an and instruction would not degrade branch predictability.Apian

© 2022 - 2024 — McMap. All rights reserved.