Simple example where [[likely]] and [[unlikely]] affect program assembly?

Asked 5/6, 2020 at 18:45 Answered 5/6, 2020 at 20:9

Solved c++assembly gcc c++20 likely-unlikely

C++20 introduced the attributes [[likely]] and [[unlikely]] to the language, which can be used to allow the compiler to optimize for the case where one execution path is either much more likely or much less likely than the other ones.

Given the cost of an incorrect branch prediction, this seems like a feature that's potentially extremely useful in performance-critical sections of code, but I don't know what it would actually cause the compiler to do.

Is there a simple piece of code for which adding [[likely]] and [[unlikely]] attributes changes the assembly output by the compiler? And perhaps more importantly, what do these changes do?

I created a simple example for my own understanding to see if there was any difference in the assembly, but it appears that this example is too simple to actually show any changes to the assembly:

void true_path();
void false_path();

void foo(int i) {
    if(i) {
        true_path();
    } else {
        false_path();
    }
}
void bar(int i) {
    if(i) [[likely]] {
        true_path();
    } else [[unlikely]] {
        false_path();
    }
}

View the compiled assembly here.

Hallagan answered 5/6, 2020 at 18:45 Comment(12)

Aside: the attributes are a terrible misnomer, they’re not for optimising a more likely code path. In fact, the branch predictor already does that, no need for attributes or intrinsics. Instead, the attributes are for telling the branch predictor to disregard the more likely branch, and that the performance of the less likely branch is more important. – Berthold 5/6, 2020 at 18:47

I'd recommend watching this video: youtube.com/watch?v=ew3wt0g99kg. – Ictinus 5/6, 2020 at 18:49

I guess that on the x86 platform they refer to the "Branch hint" prefixes 2Eh (Branch not taken (used only with Jcc instructions)) and 3Eh (—Branch taken (used only with Jcc instructions)) from Vol 2A, Chapter 2.1.1 of the current Intel manual. Intel's comment on these: "Some earlier microarchitectures used these as branch hints, but recent generations have not and they are reserved for future hint usage." – Cyanogen 5/6, 2020 at 18:59

@KonradRudolph: "Branch prediction" in the usual sense is a run-time thing. These hints are compile-time hints to the compiler on how it might want to lay out the branches, or decide whether to do if-conversion to something branchless like cmov. It's only overriding anything if the compiler was using profile-guided optimization based on actual runtime data. (And in that case IIRC gcc uses the PGO data, ignoring hints). – Marceline 5/6, 2020 at 19:11

Related: gcc optimization flag -O3 makes code slower than -O2 is a similar thing, where compiler options and/or PGO make a difference between branchless vs. branchy, in a case where branchy is actually better because of data with more of a pattern than GCC would guess by default. – Marceline 5/6, 2020 at 19:14

@PeterCordes: in this simple example, godbolt.org/z/iHWQdH, gcc compiles different code, if you switch likely to unlikely. The funny thing is, if you duplicate that function, and one of them has likely, and the other unlikely, the assembly will be the same for both functions (the attribute of the first function will affect the second function, weird). It's like quantummechanics :) – Meagher 5/6, 2020 at 19:21

@geza: heh, that's fun. GCC's identical-code-folding inter-procedural-analysis optimizations (ipa-icf) happen before generation of final asm, perhaps at a GIMPLE level. I remember finding a bug with different inline asm constraints in different functions being folded together, which was actually a correctness problem. – Marceline 5/6, 2020 at 19:24

@PeterCordes Are you saying that the C++ attributes are different from intrinsics (GCC’s __builtin_expect)? Or are you saying that the latter are also not used to override PGO? Because that was my understanding, based also on the (admittedly, somewhat vague) GCC documentation. – Berthold 5/6, 2020 at 19:27

@KonradRudolph: I assume they're identical, and that [[likely]] / [[unlikely]] were added to ISO C++ to expose that existing compiler behaviour in a portable way. And yes, I'd guess that GCC probably ignores either way of writing it when PGO data is available. – Marceline 5/6, 2020 at 19:31

@KonradRudolph: How can a branch predictor disregard a branch? What do you mean by this? How can the performance of the less likely branch be more important? – Meagher 5/6, 2020 at 19:33

@Meagher … disregard the frequency of taking a branch for the purpose of prefetch. But apparently I was wrong. — As for your second question, this is by all accounts pretty rare; I have never knowingly encountered such a situation. My point is that in the opposite case, where you want the likely branch to be faster, the branch predictor already does this for you, and a [[likely]] annotation is going to make virtually no difference (or it might actually make performance worse). – Berthold 5/6, 2020 at 21:35

@rustyx: yes, but most ISAs don't have static branch hints in asm. On an ISA that did have them, you'd probably expect [[likely]] to compile to an asm hint, like for Pentium 4 when Intel experimented with that for x86. And I think MIPS has some branch-likely instructions, and PowerPC has something. But on modern x86, and ARM/AArch64, all you can do at compile time is lay out branches so the fast path is the not-taken one. See Is it possible to tell the branch predictor how likely it is to follow the branch? – Marceline 6/6, 2020 at 4:29

As it seems, there is a bug in gcc. If you have two functions which are the same, besides [[likely]] attributes, gcc folds them incorrectly.

But if you use just one function, and switch between [[likely]]/[[unlikely]], assembly changes.

So, this function:

void bar(int i) {
    if(i) [[unlikely]] {
        true_path();
    } else [[likely]] {
        false_path();
    }
}

compiles to:

bar(int):
        test    edi, edi
        jne     .L4
        jmp     false_path()
.L4:
        jmp     true_path()

And this:

void bar(int i) {
    if(i) [[likely]] {
        true_path();
    } else [[unlikely]] {
        false_path();
    }
}

compiles to:

bar(int):
        test    edi, edi
        je      .L2
        jmp     true_path()
.L2:
        jmp     false_path()

Notice, that the condition has changed: the first version jumps, if i is non-zero, while the second one jumps if i is zero.

This is in agreement with the attributes: gcc generates code, where the conditional jump happens in the unlikely path.

Meagher answered 5/6, 2020 at 20:9 Comment(4)

Do you happen to know why jne is used here in place of je, and what the effect is? – Hallagan 5/6, 2020 at 21:4

@J.AntonioPerez: The compiler tries to optimize, so the CPU doesn't have to take a branch. I suppose that it does it, because a correctly-predicted-and-not-taken jump is cheaper than a correctly-predicted-and-taken branch. Or maybe it does this way, because of cache usage: cold path is put at the end of the function, so the hot path has a continuous path, so cache is a little bit better utilized. But I'm not an expert on this, if you're curious maybe you want to ask this specific question, tagged with assembly (and maybe x86) labels. – Meagher 5/6, 2020 at 21:58

@J.AntonioPerez: There is some information about this already: https://mcmap.net/q/14994/-how-do-the-likely-unlikely-macros-in-the-linux-kernel-work-and-what-is-their-benefit, read the comments under the answer (Peter Cordes and BeeOnRope's comments in particular). – Meagher 5/6, 2020 at 22:11

@J.AntonioPerez: Both the factors geza cited are advantages for making the expected fast-path the fall-through. With larger blocks of code, letting one line of I-cache maybe stay cold is more plausible. Also note that traditionally, (before IT-TAGE branch predictors), branches where a prediction couldn't be found were statically predicted as not-taken for forward jumps, taken for backward. (IT-TAGE (Haswell and later) always indexes a dynamic prediction so there is no static prediction. The indexed prediction might be mostly primed for a different branch, but that's the tradeoff with TAGE) – Marceline 6/6, 2020 at 1:3

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags