Where the likely/unlikely statement should be placed for more performance?
Asked Answered
R

1

6

Some software (often performance-oriented, e.g. Linux kernel, DPDK) has C helpers for influencing branch prediction.

I have an absolutely simple code snippet (suppose I know the percantage of a > b) to represent the problem of conditions nesting and applying likely/unlikely when some logic is nested:

bool foo()
{
    foo1(1);
    foo2(2);

    /* if (unlikely(a > b)) */
    /* if (a > b)*/
    {
        puts("Ohhh!!! Rare case");
        return true;
    }
    return false;
}

int main(void)
{
    /* if (unlikely(foo())) */
    /* if (foo()) */
    {
        puts("Azaza");
    }
}

So which 2 lines should be uncomented for more performance from a theoretical point of view?

Obviously there are 3 ways to assist compilier with branch prediction:

1. if (unlikely(a > b)) ... if (unlikely(foo()))

2. if (a > b) ... if (unlikely(foo()))

3. if (unlikely(a > b)) ... if (foo())

Which is theoretically the most efficient and why?

Reasonable answered 6/7, 2023 at 11:47 Comment(13)
@TedLyngmo Tsyvarev is absolutely right! The question is about how to do it in case of nesting. Updated questionReasonable
@Tsyvarev You're absolutely rightReasonable
Perfect. My question is removed.Wampum
My gut feeling says that you should put it as close to the place where you have knowledge about the expected outcome. That is, inside foo() here. Then again, it would probably not hurt to add it to the if in main too.Wampum
Updated question. Seems that close votes can be revoked.. ^_^Reasonable
In this case, foo should be written as simply return a > b;, without any branches. If you have more code than just return in the if/else then it's fine, but in that case of course the likely should be in foo.Diaphoresis
@Diaphoresis "but in that case of course the likely should be in foo" - why should not it be in main() too? Btw - a bit edited the code :)Reasonable
It can be in main too. The compiler might be smart enough to not need it if it inlines foo, but it won't hurt.Diaphoresis
@Diaphoresis It is more logical to assume that first of all it should be in main(), IMHO it's better to cut off the wrong branch of execution earlier.Reasonable
But foo happens earlier than the check in mainDiaphoresis
I don't think dpdk and linux kernel should be tagged here. Why not performance and/or optimisation?Leonardoleoncavallo
It is more logical to assume that first of all it should be in main() - likely/unlikely is for predicting the outcome of a single branch. You have two branches here. If you add likely/unlikely to one, the other still remains unoptimized. In theory compilers might be able to make an inference of the connection or make the function inline, but unless you know that it does happen, I wouldn't rely on it.Tret
The top answer on the Q&A you linked is mostly obsolete. See comments on it like mine and BeeOnRope's: How do the likely/unlikely macros in the Linux kernel work and what is their benefit? , and Ciro's answer: likely/unlikely influences branch layout, and the decision to make branchy vs. branchless asm if that's an option. But nothing can hint the actual CPU hardware branch prediction about which is more likely.Beecham
R
0

To my knowledge, the likely/unlikely statements show the best effect if the conditional variable is not in the cache. Let me explain in more detail.

In your code, the processor has to execute foo anyway. So the likely won't have any strong effect here, since no code path can be skipped during speculative execution. The function must be executed. Let's assume you save the result of foo before in a variable and the code looks like this:

int x = foo();
if (likely(x))
{
    puts("Azaza");
}

In this case the likely(x) will probably only influence the prefetcher/decoder, since the processor just computed x and the value is most likely cached in L1 (except it got interrupted at that point). However, for a precise answer one has to know the microarchitecture extremely well, since it might be possible that current CPUs are so advanced that they can fetch and decode both branches at once.

Now let's assume you have a global variable volatile int c = 15 and we change your code:

if (likely(b == 15))
{
    puts("Azaza");
} else {
    puts("Bzaza");
}

When we execute the code and b is accessed for the first time it won't be in the cache and the processor has to fetch it from memory. This costs multiple hundred CPU-cycles and instead of stalling the CPU starts executing code speculatively without knowing the value of b. In that case, the likely keyword indicates that we should execute the first branch. Note that the executed instructions are not visible to the outside world at that time. Modern x86 processors can execute up to 400 micro-ops speculatively and only commit the result once the prediction turned out to be true.

So to answer your question I would put the likely/unlikely keyword around the a > b.

Rotten answered 27/7, 2023 at 11:23 Comment(9)
... or add the keyword around both branches (assuming no compiler optimization for the simple code)Thermoscope
tldr: there is much more to it and the CPU will save the last branches to optimize the branch outcome on itself. I can recommend reading the initial spectre paper or the technical report from P0, specifically the section about branch prediction googleprojectzero.blogspot.com/2018/01Thermoscope
The Q&A linked by the OP has a mostly obsolete top answer. See comments on it like mine and BeeOnRope's: How do the likely/unlikely macros in the Linux kernel work and what is their benefit? , and Ciro's answer: likely/unlikely influences branch layout, and the decision to make branchy vs. branchless asm if that's an option. But nothing can hint the actual CPU hardware branch prediction about which is more likely. (On modern x86 and AArch64).Beecham
This answer would make sense for PowerPC and some MIPS where machine code can contain branch hints for the CPU. And Pentium 4, the only x86 microarchitecture to support branch hints. (Is it possible to tell the branch predictor how likely it is to follow the branch?). Older x86 like P4 and P6 may also have static prediction influenced by branch layout, but since Sandybridge or Haswell (IT-TAGE predictors) no. Why did Intel change the static branch prediction mechanism over these years?Beecham
Thanks for pointing it out. Indeed, there is no static branch prediction in x86 as the p0 blogpost points out. However, for infrequently used branches there is probably no branch information available (we found that around 80 branches are enough to confuse the predictor github.com/benschlueter/CVE-2021-33624/blob/…). If we hit such a branch it is up to the micro-arch to decide what to do. Either there is some type of 'hashing' for the address and we rely on branch information from another address or ....Thermoscope
we can just fall back to a default case (i.e. always not take or always take the branch). I implicitly assumed that the latter one is the case for x86, which might/probably not be true.Thermoscope
I just looked up the spec for the A-65 (developer.arm.com/documentation/100439/latest), p 78 indicates that static branch prediction is still used, so it might be possible that my assumption was correct in the case no dynamic information is available.Thermoscope
Yes, true, some non-x86 CPUs may use the classic BTNF (backward taken, forward not-taken) as a fallback when a dynamic prediction isn't available. I think IT-TAGE (in x86, found in Haswell / Zen 2 and later) will always make a direction prediction for conditional direct branches, though; it doesn't try to figure out whether the indexed entry is "for that branch" or not: branches can alias each other, and one branch's state can be distributed over many entries if it's reached with many different branch histories leading up to it. (Making a target address prediction is harder, but separate.)Beecham
Since your answer talks about the cold case for the data, it's not unreasonable that might happen when predictors are cold for the branch, too. Maybe worth emphasizing that's the only case when [[likely]] / [[unlikely]] will affect actual branch prediction by CPU hardware outside of a few ISAs with branch hints, not in hot cases like inside loops. (And even then, some modern microarchitectures like recent x86 don't have static BTFN at all.)Beecham

© 2022 - 2024 — McMap. All rights reserved.