Where the likely/unlikely statement should be placed for more performance?

Asked 6/7, 2023 at 11:47 Answered 27/7, 2023 at 11:23

c linux-kernel compiler-optimization dpdk likely-unlikely

Some software (often performance-oriented, e.g. Linux kernel, DPDK) has C helpers for influencing branch prediction.

I have an absolutely simple code snippet (suppose I know the percantage of a > b) to represent the problem of conditions nesting and applying likely/unlikely when some logic is nested:

bool foo()
{
    foo1(1);
    foo2(2);

    /* if (unlikely(a > b)) */
    /* if (a > b)*/
    {
        puts("Ohhh!!! Rare case");
        return true;
    }
    return false;
}

int main(void)
{
    /* if (unlikely(foo())) */
    /* if (foo()) */
    {
        puts("Azaza");
    }
}

So which 2 lines should be uncomented for more performance from a theoretical point of view?

Obviously there are 3 ways to assist compilier with branch prediction:

1. if (unlikely(a > b)) ... if (unlikely(foo()))

2. if (a > b) ... if (unlikely(foo()))

3. if (unlikely(a > b)) ... if (foo())

Which is theoretically the most efficient and why?

Reasonable answered 6/7, 2023 at 11:47 Comment(13)

@TedLyngmo Tsyvarev is absolutely right! The question is about how to do it in case of nesting. Updated question – Reasonable 6/7, 2023 at 12:51

@Tsyvarev You're absolutely right – Reasonable 6/7, 2023 at 12:56

Perfect. My question is removed. – Wampum 6/7, 2023 at 13:2

My gut feeling says that you should put it as close to the place where you have knowledge about the expected outcome. That is, inside foo() here. Then again, it would probably not hurt to add it to the if in main too. – Wampum 6/7, 2023 at 13:4

Updated question. Seems that close votes can be revoked.. ^_^ – Reasonable 6/7, 2023 at 13:5

In this case, foo should be written as simply return a > b;, without any branches. If you have more code than just return in the if/else then it's fine, but in that case of course the likely should be in foo. – Diaphoresis 6/7, 2023 at 13:11

@Diaphoresis "but in that case of course the likely should be in foo" - why should not it be in main() too? Btw - a bit edited the code :) – Reasonable 6/7, 2023 at 14:28

It can be in main too. The compiler might be smart enough to not need it if it inlines foo, but it won't hurt. – Diaphoresis 6/7, 2023 at 14:44

@Diaphoresis It is more logical to assume that first of all it should be in main(), IMHO it's better to cut off the wrong branch of execution earlier. – Reasonable 6/7, 2023 at 15:35

But foo happens earlier than the check in main – Diaphoresis 6/7, 2023 at 20:9

I don't think dpdk and linux kernel should be tagged here. Why not performance and/or optimisation? – Leonardoleoncavallo 7/7, 2023 at 5:6

It is more logical to assume that first of all it should be in main() - likely/unlikely is for predicting the outcome of a single branch. You have two branches here. If you add likely/unlikely to one, the other still remains unoptimized. In theory compilers might be able to make an inference of the connection or make the function inline, but unless you know that it does happen, I wouldn't rely on it. – Tret 7/7, 2023 at 6:28

The top answer on the Q&A you linked is mostly obsolete. See comments on it like mine and BeeOnRope's: How do the likely/unlikely macros in the Linux kernel work and what is their benefit? , and Ciro's answer: likely/unlikely influences branch layout, and the decision to make branchy vs. branchless asm if that's an option. But nothing can hint the actual CPU hardware branch prediction about which is more likely. – Beecham 27/7, 2023 at 17:11

To my knowledge, the likely/unlikely statements show the best effect if the conditional variable is not in the cache. Let me explain in more detail.

In your code, the processor has to execute foo anyway. So the likely won't have any strong effect here, since no code path can be skipped during speculative execution. The function must be executed. Let's assume you save the result of foo before in a variable and the code looks like this:

int x = foo();
if (likely(x))
{
    puts("Azaza");
}

In this case the likely(x) will probably only influence the prefetcher/decoder, since the processor just computed x and the value is most likely cached in L1 (except it got interrupted at that point). However, for a precise answer one has to know the microarchitecture extremely well, since it might be possible that current CPUs are so advanced that they can fetch and decode both branches at once.

Now let's assume you have a global variable volatile int c = 15 and we change your code:

if (likely(b == 15))
{
    puts("Azaza");
} else {
    puts("Bzaza");
}

When we execute the code and b is accessed for the first time it won't be in the cache and the processor has to fetch it from memory. This costs multiple hundred CPU-cycles and instead of stalling the CPU starts executing code speculatively without knowing the value of b. In that case, the likely keyword indicates that we should execute the first branch. Note that the executed instructions are not visible to the outside world at that time. Modern x86 processors can execute up to 400 micro-ops speculatively and only commit the result once the prediction turned out to be true.

So to answer your question I would put the likely/unlikely keyword around the a > b.

Rotten answered 27/7, 2023 at 11:23 Comment(9)

... or add the keyword around both branches (assuming no compiler optimization for the simple code) – Thermoscope 27/7, 2023 at 11:49

tldr: there is much more to it and the CPU will save the last branches to optimize the branch outcome on itself. I can recommend reading the initial spectre paper or the technical report from P0, specifically the section about branch prediction googleprojectzero.blogspot.com/2018/01 – Thermoscope 27/7, 2023 at 11:59

The Q&A linked by the OP has a mostly obsolete top answer. See comments on it like mine and BeeOnRope's: How do the likely/unlikely macros in the Linux kernel work and what is their benefit? , and Ciro's answer: likely/unlikely influences branch layout, and the decision to make branchy vs. branchless asm if that's an option. But nothing can hint the actual CPU hardware branch prediction about which is more likely. (On modern x86 and AArch64). – Beecham 27/7, 2023 at 17:15

This answer would make sense for PowerPC and some MIPS where machine code can contain branch hints for the CPU. And Pentium 4, the only x86 microarchitecture to support branch hints. (Is it possible to tell the branch predictor how likely it is to follow the branch?). Older x86 like P4 and P6 may also have static prediction influenced by branch layout, but since Sandybridge or Haswell (IT-TAGE predictors) no. Why did Intel change the static branch prediction mechanism over these years? – Beecham 27/7, 2023 at 17:19

Thanks for pointing it out. Indeed, there is no static branch prediction in x86 as the p0 blogpost points out. However, for infrequently used branches there is probably no branch information available (we found that around 80 branches are enough to confuse the predictor github.com/benschlueter/CVE-2021-33624/blob/…). If we hit such a branch it is up to the micro-arch to decide what to do. Either there is some type of 'hashing' for the address and we rely on branch information from another address or .... – Thermoscope 28/7, 2023 at 7:22

we can just fall back to a default case (i.e. always not take or always take the branch). I implicitly assumed that the latter one is the case for x86, which might/probably not be true. – Thermoscope 28/7, 2023 at 7:25

I just looked up the spec for the A-65 (developer.arm.com/documentation/100439/latest), p 78 indicates that static branch prediction is still used, so it might be possible that my assumption was correct in the case no dynamic information is available. – Thermoscope 28/7, 2023 at 7:37

Yes, true, some non-x86 CPUs may use the classic BTNF (backward taken, forward not-taken) as a fallback when a dynamic prediction isn't available. I think IT-TAGE (in x86, found in Haswell / Zen 2 and later) will always make a direction prediction for conditional direct branches, though; it doesn't try to figure out whether the indexed entry is "for that branch" or not: branches can alias each other, and one branch's state can be distributed over many entries if it's reached with many different branch histories leading up to it. (Making a target address prediction is harder, but separate.) – Beecham 28/7, 2023 at 7:49

Since your answer talks about the cold case for the data, it's not unreasonable that might happen when predictors are cold for the branch, too. Maybe worth emphasizing that's the only case when [[likely]] / [[unlikely]] will affect actual branch prediction by CPU hardware outside of a few ISAs with branch hints, not in hot cases like inside loops. (And even then, some modern microarchitectures like recent x86 don't have static BTFN at all.) – Beecham 28/7, 2023 at 7:50

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags