Logical operators and branch prediction failure - McMap

About

Logical operators and branch prediction failure

Asked 3/12, 2015 at 8:0 Answered 3/12, 2015 at 8:0

c++performance compiler-optimization short-circuiting branch-prediction

F

0

7

Consider the following loops:

while((expressionA) & (expressionB)){
    // do something
}

while((expressionA) && (expressionB)){
    // do something
}

where expressionA and expressionB are expressions of type bool and expressionB has no side-effects. Under these conditions, the two cases are as-if-equivalent (right?).

A (hypothetical) compiler that naively takes its cue from the source code would put a branch in the && version and we would end up paying for branch prediction failures.

With a modern compiler (such as current GCC), can there ever be any conditions under which the & version gives a substantial performance gain over of the && version?

My guess is no, because:

If expressionB is sufficiently cheap, the compiler will recognize this and avoid creating the short-circuiting branch.
If expressionB is sufficiently expensive, the compiler will create the short-circuit because:
- if probability of expressionA is not close to 1.0, we get a substantial average performance gain from short-circuiting.
- if probability of expressionA is close to 1.0, we won't pay much because branch prediction will tend to succeed.

Falda answered 3/12, 2015 at 8:0 Comment(7)

If expressionA has side-effects , they cannot be avoided – Paroxysm 3/12, 2015 at 8:36

AFAIK there's no way of telling the optimizer which cases are the most likely, which makes it difficult for cases like if ( a && b && c && d && e ) where there are no side-effects but computationally expensive. Theoretically the C standard would allow the tests to be done in any order so even if you put the most expensive first, the optimizer might not honour that. – Paroxysm 3/12, 2015 at 8:38

What I'm finding just playing around in GCC 5.2 on some simplistic tests is that operator& can actually cause a slight difference in register allocation, but identical to operator&& in terms of branching (at least with operands that evaluate to bool). One thing, as M.M. alluded to, is that a compiler can't truly figure out what is "cheap" here, since the probability/predictability of a coin flip expression is often going to be based around inputs only provided at runtime. It goes beyond the basic evaluation. What I'm seeing is short-circuiting being applied for both operators here. – Apian 3/12, 2015 at 8:43

@Paroxysm Could you please explain why you brought up the side-effects of expressionA? – Falda 3/12, 2015 at 10:21

@Ike Thanks. By "cheap expression" I just mean one that can be evaluated quickly, so in that sense the compiler can figure out what is cheap. Probability matters of course, but as a separate factor. Take the case where expressionB is just a bool variable whose value is unknown at compile-time but is already evaluated (at this point in run-time) because its value was already needed elsewhere in the code. That value is available "cheaply" and making the short-circuit in that case (for any of the two operators) would just seem wasteful, especially because branch prediction may fail. – Falda 3/12, 2015 at 10:28

gcc usually makes a separate branch instruction for each piece of a short-circuit-evaluated conditional. If (A&B) predicts well, but neither A nor B alone predict well, I think you could see a benefit to using A&B over A&&B. Turning a comparison into a value that can actually be bitwise-ANDed with something else usually isn't cheap, though. It's a lot easier in the || vs. | case, because usually you can just compute things an OR them together. e.g. 2 | 1 is true, but 2 & 1 is false. – Rustyrut 3/12, 2015 at 13:3

@tennenrishin Understood what you meant there by cheap/expensive as relating to the cost of evaluation. What I was seeing fooling around in GCC was that, in cases where both operators produce logically equivalent results, the optimizer was still short-circuiting them (yielding two branch instructions instead of an and instruction + branch). My guess for why it does that is that it still can't assume anything on the part of the predictability of an expression to be able to easily deduce that an and instruction would not degrade branch predictability. – Apian 3/12, 2015 at 17:40

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.