How to cancel branch prediction? [closed]
From reading this I came across the next two quotes:

First quote:

A typical case of unpredictable branch behavior is when the comparison result is dependent on data.

Second quote:

No Branches Means No Mispredicts

For my project, I work on data-dependent values and use many if and switch statements. My project is related to Big Data, so it has to be as efficient as possible. I wanted to test it on the data provided by the user, to see whether branch prediction actually slows down my program or helps it. From reading here:

misprediction delay is between 10 and 20 clock cycles.

What shocked me most was:

Removing the branches not only improves runtime performance of the code, it also helps the compiler to optimize the code.

Why use branch prediction then ?

Is there a way to force the compiler to generate assembly code without branches? Or to disable branch prediction in the CPU, so I can compare both results?

Husserl answered 10/2, 2017 at 20:24 Comment(10)
Your question is answered with the answer to the linked question.Amphibole
Note that, if the CPU supports them, not every condition causes a branch. Compilers are pretty good at this. BTW any decent CPU AFAIK won't slow down your code in case of wrong predictions (in a measurable way and on average!), it just won't speed it up. To measure? Prepare an ad-hoc set of dataRainproof
@GeorgeStocker Thanks! used this to understand what branch prediction is but didn't notice at the bottom he provided some useful information.Husserl
@AdrianoRepetti thanks for the additional information.Husserl
I think that answering this question "in general" is a hard task. Maybe narrowing the question to a specific case will make it answerableRainproof
We need one more vote for reopen... @4386427Husserl
"Removing branches" doesn't mean turning off branch prediction. It means literally removing branches from your code, e.g. by finding "if" statements that can be reduced to arithmetic expressions.Endbrain
@4386427 This question is too broad. I can't even tell what the OP is asking about: is branch prediction a bad, useless, thing? (Absolutely not) Why not avoiding branch entirely? (Because we can't always do that and predicated instructions are not equivalent to branchless code) How to disable the BPU? (You can't, what the CPU would do with, say, 823 uOPs in flight, when it sees a jump? Stall?). Then, for which architecture? All of them? And where to start, considering that the OP seems to have little knowledge of modern superscalar, out of order CPUs? GeorgeStocker was right.Michelinamicheline
Finally, there is a very upvoted answer on the right panel. It explains very well the different performance between a well predicted program and a poorly predicted one. The BP is something that exists and you don't work against it but along with it. See the x86 tag wiki for the Agner Fog optimisation manual and the Intel one. Asking about disabling the BPU is like asking about disabling the cache "because of cache misses"...Michelinamicheline
@4386427 Where a good answer could come if not from a decennial experienced designer? Please, experts first :)Michelinamicheline

to see if branch prediction actually slows down my program or helps

Branch prediction doesn't slow down programs. When people talk about the cost of missed predictions, they're talking about how much more expensive a mispredicted branch is compared to a correctly predicted branch.

If branch prediction didn't exist, all branches would be as expensive as a mispredicted one.

So what "misprediction delay is between 10 and 20 clock cycles" really means is that successful branch prediction saves you 10 to 20 cycles.

Removing the branches not only improves runtime performance of the code, it also helps the compiler to optimize the code.

Why use branch prediction then ?

Why use branch prediction over removing branches? You shouldn't. If a compiler can remove branches, it will (assuming optimizations are enabled), and if programmers can remove branches (assuming it doesn't harm readability, or it's a performance-critical piece of code), they should.
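As an illustration of "removing a branch" (a sketch; the function names are mine, and on x86 an optimizing compiler typically turns the branchy form into a branchless `cmov` on its own at `-O2`):

```c
#include <stdint.h>

/* Branchy version: the comparison outcome depends on the data,
   so the CPU has to predict it. */
int max_branchy(int a, int b) {
    if (a > b)
        return a;
    return b;
}

/* Branchless version: the same result via a sign-bit mask.
   Caveats: a - b may overflow for extreme inputs, and right-shifting
   a negative int is implementation-defined (arithmetic shift on all
   mainstream compilers). */
int max_branchless(int a, int b) {
    int diff = a - b;
    int mask = diff >> (sizeof(int) * 8 - 1); /* all 1s if a < b, else 0 */
    return a - (diff & mask);                 /* a - diff == b when a < b */
}
```

The branchless form trades a predictable instruction count for the risk of misprediction, which is why it can win on unpredictable data even though it always does slightly more work.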

That hardly makes branch prediction useless, though. Even if you remove as many branches as possible from a program, it will still contain many, many branches. So because of this, and because of how expensive mispredicted branches are, branch prediction is essential for good performance.

Is there a way to force the compiler to generate assembly code without branches ?

An optimizing compiler will already remove branches from a program when it can (without changing the semantics of the program), but, unless we're talking about a very simple int main() {return 0;}-type program, it's impossible to remove all branches. Loops require branches (unless they're unrolled, but that only works if you know the number of iterations ahead of time) and so do most if- and switch-statements. If you can minimize the number of ifs, switches and loops in your program, great, but you won't be able to remove all of them.
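The unrolling point can be sketched like this (function names are mine; compilers perform this transformation themselves when the trip count is a compile-time constant):

```c
/* Branchy: the loop's back-edge is a conditional branch,
   evaluated on every iteration. */
int sum4_loop(const int *v) {
    int s = 0;
    for (int i = 0; i < 4; ++i)
        s += v[i];
    return s;
}

/* Fully unrolled: possible only because the iteration count (4)
   is known ahead of time; no branch remains. */
int sum4_unrolled(const int *v) {
    return v[0] + v[1] + v[2] + v[3];
}
```

With an unknown or large trip count, full unrolling is impossible or counterproductive, which is why loop branches can't be eliminated in general.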

or to disable branch prediction in the CPU, so I can compare both results?

To the best of my knowledge it is impossible to disable branch prediction on x86 or x86-64 CPUs. And as I said, this would never improve performance (though it might make it predictable, but that's not usually a requirement in the contexts where these CPUs are used).

Hewet answered 10/2, 2017 at 21:53 Comment(4)
In some cases a small-body, low but variable count loop can be profitably fully (max count) unrolled with predication/conditional move "masking off" the effects of the unneeded computation.Eulalie
Thanks for the answer. Is flushing the pipeline free ? If a misprediction then it has be pulled out. I understand the gain incase of a hit. But if the CPU misses then won't that mean we still pay some cycle to flush the pipeline ?Husserl
@TonyTannous That depends on the exact CPU architecture (as do most things we're talking about here, really), but my understanding is that on modern CPUs there's no wait time involved in flushing the pipeline. A quick google search brings up this article, which seems to confirm this ("Once a branch misprediction is discovered, the core is able to restart decoding as soon as the correct path is known, at the same time that the out-of-order machine is clearing out uops from the wrongly speculated path. ...Hewet
... Previously, the decoding would not resume until the pipeline was fully flushed."). But even on earlier architectures where that wasn't the case, the cost of flushing the pipeline was much smaller than the gains of successful branch prediction, so even there branch prediction would still be a net gain unless somehow all predictions failed.Hewet

Modern processors have pipelines which allow the CPU to work a lot faster than it would be able to otherwise. This is a form of parallelism where the CPU starts processing an instruction a few clock cycles before the instruction is actually needed. See here for more details.

This works great until we hit a branch. If we end up jumping, the work in the pipeline is no longer relevant. The CPU then needs to flush the pipeline and restart. This causes a delay of a few clock cycles until the pipeline is full again. This is known as a pipeline stall.

Modern CPUs are clever enough, when it comes to unconditional jumps, to follow the jump when filling the pipeline, thus preventing the stall. This does not work for conditional branching, since the CPU does not know ahead of time where the jump is going to go.

Branch Prediction tries to solve this problem by making a guess as to which branch the CPU will follow before fully evaluating the jump. This (when it works) prevents the stall.

Since almost all programming involves making decisions, branching is unavoidable. But one certainly can write code with fewer branches and thus lessen the delays caused by misprediction. Once we are branching, branch prediction at least allows us a chance of getting things right and not having a CPU pipeline stall.
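One way to see the misprediction cost on your own data is to time the same branchy loop over the data in random order and then sorted (a sketch under my own naming; sizes and the threshold are arbitrary, and the result is identical in both runs, so only branch predictability differs):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sums the elements above a threshold; the branch outcome depends on data. */
long long sum_above(const int *v, size_t n, int threshold) {
    long long s = 0;
    for (size_t i = 0; i < n; ++i)
        if (v[i] > threshold)   /* well predicted if v is sorted, poorly if random */
            s += v[i];
    return s;
}

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Times the same loop over random-order data, then over sorted data. */
void compare_orderings(size_t n) {
    int *v = malloc(n * sizeof *v);
    if (!v) return;
    for (size_t i = 0; i < n; ++i)
        v[i] = rand() % 256;     /* threshold 128 -> roughly 50% taken */

    clock_t t0 = clock();
    long long r_random = sum_above(v, n, 128);
    clock_t t1 = clock();

    qsort(v, n, sizeof *v, cmp_int);
    clock_t t2 = clock();
    long long r_sorted = sum_above(v, n, 128);
    clock_t t3 = clock();

    printf("random: %lld (%.3fs)  sorted: %lld (%.3fs)\n",
           r_random, (double)(t1 - t0) / CLOCKS_PER_SEC,
           r_sorted, (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(v);
}
```

On typical x86 hardware the sorted run tends to be noticeably faster at large n, precisely because the branch becomes trivially predictable; the exact ratio depends on the CPU and compiler flags.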

Broadspectrum answered 10/2, 2017 at 21:39 Comment(1)
One minor nitpick: even unconditional jumps can be problematic as they can be register-indirect (think of MIPS jr) or memory-indirect (e.g. x86 jmp [rbx]).Michelinamicheline

© 2022 - 2024 — McMap. All rights reserved.