Branch prediction: Does avoiding "else" branch for simple operations makes code faster (Java example)?

Asked 4/2, 2015 at 9:16 Answered 7/3, 2021 at 21:43

java assembly compiler-construction branch-prediction vm-implementation

Options 1:

  boolean isFirst = true;
  for (CardType cardType : cardTypes) {
    if (!isFirst) {
      descriptionBuilder.append(" or ");
    } else {
      isFirst = false;
    }
    //other code not relevant to this theoretical question
  }

Option 2:

boolean isFirst = true;
for (CardType cardType : cardTypes) {
  if (!isFirst) {
    descriptionBuilder.append(" or ");
  } 
  isFirst = false;
  //other code not relevant to this theoretical question
}

My analysis: Both code has same semantic.

1st code) I'm not sure if this code has two branches (in terms of branch predictor) or one branch. I was looking into http://en.wikipedia.org/wiki/X86_instruction_listings but couldn't figure out that there is a X86 instruction something like "if previous condition value was false jump there", to avoid two branch predictions (very bad)

2nd code) most likely to always perform simple MOV (to register or element most likely already in the cache), which is relatively inexpensive (few cycles at most)

So, my opinion is that unless processor decode into microcode instructions can do something smart or X86 instruction exist to avoid necessary branch predictions, 2nd code is faster.

I understand that this is purely theoretical question, since in practice, this branch can make an application 0.000000002% faster or something like that.

Did I miss something?

EDIT: I added a loop for giving more "weight" to branch in question

EDIT2: The question is about Intel architecture for branch prediction (Pentium and newer processors).

Inelegancy answered 4/2, 2015 at 9:16 Comment(0)

The code has the same effect but won't produce the same byte code or assembly (probably).

How much difference this makes in terms of performance, is unclear and likely to be trivial.

What is far, far more important is the clarity of the code. I have seen more bugs and performance issues due to code being harder to reason about in simple cases like this.

In short, what is clearest and simplest to you is also likely to be fast enough, or the easiest to fix.

Autobiographical answered 4/2, 2015 at 9:23 Comment(6)

this is theoretical question about code performance, I understand that the difference will be in few CPU cycles per iteration at most, but I would like to know which one is faster, not which one is more readable – Inelegancy 4/2, 2015 at 16:24

@MladenAdamovic Theoretically, anything is possible which is why I avoid giving a theoretical answer. ;) – Autobiographical 4/2, 2015 at 16:27

I've added EDIT2: The question is about Intel architecture for branch prediction (Pentium and newer processors). So not "anything" is possible. There is a certain way Java compiler works and certain way branch predictor works. – Inelegancy 4/2, 2015 at 16:30

@MladenAdamovic The amount of branch predicting it needs to make is the same. The only time there is a difference is when a simple operation can be turned into a conditional-move. e.g. a > b ? a : b In this case, there is no branch as a CMOV instruction is used. – Autobiographical 4/2, 2015 at 16:40

why is the same? I was looking into CMOV instruction and that instruction seems to be added to avoid two branches for that kind of examples a > b ? a : b rcollins.org/p6/opcodes/CMOV.html My Example 1) has additional "else" which is potentially one more branch than my example 2) – Inelegancy 4/2, 2015 at 17:32

@MladenAdamovic Your second example is the same as if (!isFirst) { descriptionBuilder.append(" or "); } else { } i.e. the is the same number of possible missed predictions. – Autobiographical 4/2, 2015 at 17:35

Different hardware has different costs for each of the assembler instruction, and on modern hardware even the cost of an instruction is difficult to predict due to the effects of pipelining and caches.

The difference between an if and an if/else on pipelining and caches is not clear from your isolated example. If you ran that code once, it is unlikely that you will see any difference at all. Repeatedly run it in a tight loop, and the performance of the if itself will become dominated by a) the cost of the check and b) the predictability of the result of the check. In other words, branch prediction will become the dominating factor and that will not be affected by having an if or an if/else block of code.

An excellent discussion on the effects of branch prediction can be read here Why is it faster to process a sorted array than an unsorted array? (see the top scoring answer).

Callie answered 4/2, 2015 at 9:58 Comment(7)

I've added outer for loop in example, so that branch prediction could work (at least theoretical). In this code does "else" brings additional branch prediction for CPU? – Inelegancy 4/2, 2015 at 16:25

@MladenAdamovic the quick answer is no, the addition of else adds an extra unconditional jump (which by definition is very easy to predict). See eventhelix.com/realtimemantra/basics/… for an example of how an if/else is translated to assembler. – Callie 4/2, 2015 at 17:39

@MladenAdamovic but also see my other answer; hotspot likes to unroll the first iteration of a for loop specifically for this case. That way it avoids repeating the check and the conditional branch entirely for subsequent iterations of the loop thus bypassing your question by providing a faster solution. – Callie 4/2, 2015 at 17:42

Thanks @Chris K , thank link you provided brought to my attention that "else" is unconditional jump. I now wonder, for nearby locations (which are already in L1 cache, how many cycles unconditional jump ("else") takes, comparing to MOV between registers or L1/L2. But as you noticed, compiler could figure out that nothing in this loop changes isFirst to true, so it could perform simple compiler optimization. I didn't realize it is related to unrolling loops, I thought unrolling loops is mostly for better branch prediction and paralelism – Inelegancy 4/2, 2015 at 18:23

@MladenAdamovic the cost of a predictable jump will be measured in clock cycles (probably 1 or less - that is, many many orders of magnitude faster than a human can perceive). All of the gory details can be found at the following link, be warned though.. it is quite a read: intel.com/content/www/us/en/architecture-and-technology/… – Callie 5/2, 2015 at 9:18

that link seems to be a great reading for people who construct compilers, but since I'm full stack generalist, for me it is a little bit too heavy :-) thanks, I'll put it into Reading list if one day I have time for it – Inelegancy 5/2, 2015 at 10:19

@mladenadamovic. I did warn you :). The quickest take away is the appendixes where it lists latency and throughput times for assembler instructions. That and 1 clock cycle on a 3ghz chip is .3ns, or 0.0000000003 seconds. So if the unconditional jump is invoked only once because of the loop unrolling, then it is practically negligible in cost. – Callie 5/2, 2015 at 10:23

Using JMH gives the following numbers with cardTypes array of size 10 and integer increment as the logic (Java 15 / AMD 3950X / Windows 10):

Benchmark          Mode  Cnt          Score         Error  Units
Benchmark.option1  thrpt   25  273369417.720 ± 1618952.179  ops/s
Benchmark.option2  thrpt   25  273415784.192 ±  852618.585  ops/s

Average performance of "Option 2" is about 0.017% faster (YMMV).

Hoberthobey answered 7/3, 2021 at 21:43 Comment(1)

Note that the difference isn't statistically significant, given the error ranges. But yeah, that's what I'd expect; xor-zeroing a register is at least as cheap as an unconditional jmp over it. Of course even better is the compiler hoisting / peeling the first iteration out of the asm loop, or jumping into it after the not-first-iteration work, so there's no actual if branch in the loop. But if the compiler was smart enough to do that, it would probably happen for both so this benchmark wouldn't capture it (and is compatible with that since 273... +- 1 numbers are within error) – Heterogenesis 8/3, 2021 at 2:17

Assuming that your code snippet is an if block from within a for loop. Hotspot has the ability to unroll for loops, this includes taking the common 'is first iteration of a loop' check and inlining it outside of the loop. Thus avoiding the costs of rechecking the condition on every iteration of the loop. Thus avoiding the concern of which is faster, if or if/else.

Oracle documents this behavior here

Callie answered 4/2, 2015 at 10:16 Comment(0)

Both code has same semantic.

No Both code are different,

first code app isFirst = false; set flag to false if your condition if (!isFirst) not match.

Second code each time change your flag to false even condition satisfied or not.

Cornea answered 4/2, 2015 at 9:21 Comment(1)

But the flag would have to be false for the condition to be satisfied. So in both cases, the flag has the value false at the end of the snippet. – Inconveniency 4/2, 2015 at 9:24

There are two branches in an if/else construct: the conditional branch at the top, and the branch around the else part at the end of the if part. There are no branches in the else part, at least not in any even moderately competently implemented compiler.

Against that you have to balance the cost of always executing the isFirst = false; line.

In the specific case you mention, it isn't likely to make the slightest difference, compared to the cost of the method call.

Tinkle answered 4/2, 2015 at 9:29 Comment(1)

I don't understand "There are no branches in the else part, at least not in any even moderately competently implemented compiler.". Could you please explain why? (assembly code example would be fine). – Inelegancy 4/2, 2015 at 16:22

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags