Consider the following loops:
while((expressionA) & (expressionB)){
// do something
}
while((expressionA) && (expressionB)){
// do something
}
where expressionA
and expressionB
are expressions of type bool
and expressionB
has no side-effects. Under these conditions, the two cases are as-if-equivalent (right?).
A (hypothetical) compiler that naively takes its cue from the source code would put a branch in the &&
version and we would end up paying for branch prediction failures.
With a modern compiler (such as current GCC), can there ever be any conditions under which the &
version gives a substantial performance gain over of the &&
version?
My guess is no, because:
- If
expressionB
is sufficiently cheap, the compiler will recognize this and avoid creating the short-circuiting branch. - If
expressionB
is sufficiently expensive, the compiler will create the short-circuit because:- if probability of
expressionA
is not close to 1.0, we get a substantial average performance gain from short-circuiting. - if probability of
expressionA
is close to 1.0, we won't pay much because branch prediction will tend to succeed.
- if probability of
if ( a && b && c && d && e )
where there are no side-effects but computationally expensive. Theoretically the C standard would allow the tests to be done in any order so even if you put the most expensive first, the optimizer might not honour that. – Paroxysmoperator&
can actually cause a slight difference in register allocation, but identical tooperator&&
in terms of branching (at least with operands that evaluate tobool
). One thing, as M.M. alluded to, is that a compiler can't truly figure out what is "cheap" here, since the probability/predictability of a coin flip expression is often going to be based around inputs only provided at runtime. It goes beyond the basic evaluation. What I'm seeing is short-circuiting being applied for both operators here. – ApianexpressionA
? – FaldaexpressionB
is just abool
variable whose value is unknown at compile-time but is already evaluated (at this point in run-time) because its value was already needed elsewhere in the code. That value is available "cheaply" and making the short-circuit in that case (for any of the two operators) would just seem wasteful, especially because branch prediction may fail. – FaldaA&B
overA&&B
. Turning a comparison into a value that can actually be bitwise-ANDed with something else usually isn't cheap, though. It's a lot easier in the||
vs.|
case, because usually you can just compute things anOR
them together. e.g.2 | 1
is true, but2 & 1
is false. – Rustyrutand
instruction + branch). My guess for why it does that is that it still can't assume anything on the part of the predictability of an expression to be able to easily deduce that anand
instruction would not degrade branch predictability. – Apian