Why is a CPU branch instruction slow?

Asked 22/3, 2012 at 10:21 Answered 6/7, 2020 at 10:9

Solved optimization architecture language-agnostic compiler-construction cpu

Since I started programming, I have read everywhere that I should avoid wasteful branches at all costs.

That's fine, although none of the articles explained why I should do this. What exactly happens when the CPU decodes a branch instruction and decides to do a jump? And what is the "thing" that makes it slower than other instructions (like addition)?

Concha answered 22/3, 2012 at 10:21 Comment(0)

A branch instruction is not inherently slower than any other instruction.

However, the reason you have heard that branches should be avoided is that modern CPUs use a pipelined architecture. This means that multiple sequential instructions are being executed simultaneously, each at a different stage. But the pipeline can only be fully utilised if it's able to read the next instruction from memory on every cycle, which in turn means it needs to know which instruction to read.

On a conditional branch, it usually doesn't know ahead of time which path will be taken. So when this happens, the CPU either has to stall until the decision has been resolved, or guess and then throw away everything in the pipeline behind the branch instruction if the guess turns out to be wrong. Either way this lowers utilisation, and therefore performance.

This is the reason that things like branch prediction and branch delay slots exist.

Googly answered 22/3, 2012 at 10:26 Comment(4)

+1: Also important for this issue: modern CPUs usually have branch predictors to minimize this performance loss. – Vasoinhibitor 22/3, 2012 at 10:29

But there are techniques that speed up this process called "branch prediction" (en.wikipedia.org/wiki/Branch_prediction). That doesn't always work but in general it works better. edit: I'm too slow. xD – Haphazardly 22/3, 2012 at 10:31

Just to make it clear: stalling the pipeline means that all the instructions that have been preloaded must be unloaded. Also, any possible side effects must be reverted (usually data that has changed due to a mispredicted branch). All of these operations cost time and energy. – Capsize 22/3, 2012 at 10:32

@Oliver can you elaborate on the branch delay slot? After reading the wiki page, I still can't get why the CPU doesn't simply stall but executes those delay slot instructions while waiting on the branch result. Thanks! – Aesthetics 10/7, 2021 at 8:14

A CPU uses a pipeline to execute instructions, which means that while one instruction is at some stage (for example, reading values from registers), the next instruction is being processed at the same time, but at another stage (for example, the decoding stage). That is fine for non-control instructions, but it makes things complex when control instructions like jmp or call are executed.

Since the CPU does not yet know what the next instruction will be when it executes a conditional jump, it uses branch prediction techniques to predict whether the branch will be taken or not (for example, a branch instruction at the end of a loop will probably take the instruction flow back to the loop head).

However, when such a prediction fails, which is called a branch misprediction, execution performance suffers, because the work in the pipeline after the branch has to be discarded and execution has to start over from the correct instruction.
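To make the loop example concrete, here is a minimal simulation of one classic textbook scheme, the two-bit saturating counter. It is only a sketch: real CPUs use far more elaborate predictors, and `TwoBitPredictor`, `count_mispredictions`, `loop_branch_outcomes` and the initial state are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Textbook two-bit saturating counter. States 0 and 1 predict "not taken",
// states 2 and 3 predict "taken".
class TwoBitPredictor {
public:
    bool predict() const { return state_ >= 2; }

    // Move one step toward the observed outcome, saturating at 0 and 3.
    void update(bool taken) {
        if (taken) {
            if (state_ < 3) ++state_;
        } else {
            if (state_ > 0) --state_;
        }
    }

private:
    int state_ = 2;  // assumption: start in "weakly taken"
};

// Counts mispredictions over a recorded sequence of branch outcomes.
std::size_t count_mispredictions(const std::vector<bool>& outcomes) {
    TwoBitPredictor predictor;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (predictor.predict() != taken) ++misses;
        predictor.update(taken);
    }
    return misses;
}

// Outcomes of the backward branch of a loop that runs `iterations` times,
// repeated `runs` times: taken on every iteration except the last one.
std::vector<bool> loop_branch_outcomes(std::size_t iterations, std::size_t runs) {
    std::vector<bool> outcomes;
    for (std::size_t r = 0; r < runs; ++r) {
        for (std::size_t i = 0; i < iterations; ++i) {
            outcomes.push_back(i + 1 < iterations);
        }
    }
    return outcomes;
}
```

In this model a loop branch is mispredicted only once per loop exit, while a branch that strictly alternates between taken and not taken is mispredicted about half of the time.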

Hope answered 22/3, 2012 at 10:32 Comment(0)

Oli gave a very good explanation of why branching is expensive: pipelining and branch prediction. I want to add, however, that you shouldn't be very concerned about the issue, as modern compilers will optimize the code, and one of those optimizations is reducing branching.

You can read more about C++ optimizations in the Microsoft compiler here - the Profile Guided Optimizer uses runtime information (i.e. which parts of the code are most used) to optimize your code. The speed-up is in the 20% range.

One of the operations is "Conditional Branch Optimization", for example - assuming most of the time i is 6 - this is faster:

if (i==6)
{
    //...
}
else
{
    switch (i)
    {
        case 1: //
        case 2: //
        //...
    }
}

than:

switch (i)
{
    case 1: //
    //...
    case 6: //
    case 7: //
}
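A compilable sketch of the same two shapes, written by hand rather than produced by the optimizer. The function names and the return values standing in for the case bodies are hypothetical; the assumption, as above, is that i is 6 most of the time.

```cpp
// Plain dispatch: every value goes through the switch.
int dispatch_plain(int i) {
    switch (i) {
        case 1: return 10;
        case 2: return 20;
        case 6: return 60;
        case 7: return 70;
        default: return -1;
    }
}

// Same behaviour, but the value assumed to be most common is tested first,
// so the hot path is a single comparison before the switch is reached.
int dispatch_hot_first(int i) {
    if (i == 6) {
        return 60;
    }
    switch (i) {
        case 1: return 10;
        case 2: return 20;
        case 7: return 70;
        default: return -1;
    }
}
```

Both functions return the same result for every input; only the order of the tests differs. Whether the second form is actually faster depends on how the compiler lowers the switch and on how often i really is 6.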

Here is a blog post on other optimizations: http://bogdangavril.wordpress.com/2011/11/02/optimizating-your-native-program/

Newby answered 22/3, 2012 at 10:40 Comment(0)

This is not completely related, but the advice to avoid branches at all costs is simply nonsense on modern speculative, out-of-order execution processors. Speculative execution is exactly the thing that gives your processor instructions to process while waiting for data from memory. And speculation on branch conditions is what speculative execution is all about. Replacing branches with arithmetic can actually slow down your program, so beware! More about this here.
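For illustration, this is the kind of rewrite being warned about. It is a sketch with hypothetical function names, and the arithmetic version assumes two's complement integers (so that -1 has all bits set). Neither form is claimed to be faster here; that depends on the compiler, the CPU and the data, so measure before choosing.

```cpp
#include <cstdint>

// Branching form: the compiler is free to emit a conditional jump or a
// conditional move for this.
std::int32_t max_branch(std::int32_t a, std::int32_t b) {
    return (a > b) ? a : b;
}

// Arithmetic form: turns the comparison into an all-ones or all-zeros mask
// and selects with bitwise operations, so the source has no conditional.
// Assumes two's complement, where -1 has every bit set.
std::int32_t max_arith(std::int32_t a, std::int32_t b) {
    std::int32_t mask = -static_cast<std::int32_t>(a > b);  // -1 if a > b, else 0
    return (a & mask) | (b & ~mask);
}
```

The arithmetic version always executes the compare, negate, two ANDs and an OR, whereas a correctly predicted branch lets the processor run ahead speculatively, which is why such rewrites can end up slower.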

Acciaccatura answered 6/7, 2020 at 10:9 Comment(0)
