Why does this ternary generate more Assembly than an equivalent if?

Asked 14/6, 2024 at 13:3 Answered 15/6, 2024 at 10:4

c assembly x86-64 compiler-optimization conditional-operator

So someone on a forum asked why this C function (which I added const and restrict to, just in case):

void foo(int *const restrict dest, const int *const restrict source) {
    *dest = (*source != -1) ? *source : *dest;
}

generates this Assembly with the latest versions of both gcc and clang, using -Os to optimize for size on godbolt:

foo:
        mov     eax, dword ptr [rsi]
        cmp     eax, -1
        jne     .LBB0_2
        mov     eax, dword ptr [rdi]
.LBB0_2:
        mov     dword ptr [rdi], eax
        ret

This seemingly identical function, however, generates one fewer instruction on godbolt:

void foo(int *const restrict dest, const int *const restrict source) {
    if (*source != -1) {
        *dest = *source;
    }
}

foo:
        mov     eax, dword ptr [rsi]
        cmp     eax, -1
        je      .LBB0_2
        mov     dword ptr [rdi], eax
.LBB0_2:
        ret

Maybe I'm missing something, or maybe I'm underestimating how tricky it is for compilers to optimize this. I would have expected the compiler to detect that *dest = *dest can't have any side effects, since I didn't mark it volatile, and that it is thus allowed to optimize the ternary to the if-statement version. I've found other Stack Overflow answers to similar questions that often boil it down to GCC having an edge over Clang in one case, and vice versa, but it seems strange to me that neither optimizes this case.

Lange answered 14/6, 2024 at 13:3 Comment(19)

Looks like a missed optimisation to me. – Aphis 14/6, 2024 at 13:11

There are a few things you're missing: 1) More "compact" C code doesn't always translate into less assembly; 2) Less assembly doesn't mean faster or even smaller binary code, since x86-family instruction sizes can vary wildly even for instructions that seem similar; and 3) With the ternary expression you force an else clause into the code, something which the if statement is missing. Oh, and more "compact" code (like the ternary expression) too often makes the code harder to read, understand and maintain. – Syncytium 14/6, 2024 at 13:13

@Aphis Both gcc and clang behave identically though. Interesting question :) – Tarp 14/6, 2024 at 13:15

@Someprogrammerdude I appreciate you pointing it out for others, but I had considered all of those before writing my question, fwiw :) – Lange 14/6, 2024 at 13:15

@Someprogrammerdude: 1 is irrelevant to this question. Regarding 2, the longer assembly is not just different; it is a superset of the shorter code except for a different condition on a branch instruction, so it is clearly longer binary, and optimization for small code was requested. Regarding 3, yes, the conditional operator has an “else” case the if does not. The question is why the compiler does not optimize that. Your unnumbered comment following that is also irrelevant; the question is not about which C code is better; it is about the compiler optimization. – Chemarin 14/6, 2024 at 13:28

Some observations: ?: comes with a sequence point so it enforces the evaluation of *source != -1 before the rest. It does not however enforce a write to *dest if the compiler can deduce that it is a no-op. Adding an else { *dest = *dest; } to the if version adds nothing; both gcc and clang optimize that one out. And restrict doesn't change anything here, though in theory without it dest and source could alias. – Tarp 14/6, 2024 at 13:31

I'd hazard a guess that a lot more effort has been expended on optimising the ordinary form of if ( x ) A; else B; and that the cryptic ternary form is supported grudgingly as a relic. It has generated a literal translation of your ternary statement and your if statement. What happens if you explicitly add else *dest = *dest ; to it? Sometimes peephole optimisers miss that a value is already loaded in a register. – Childlike 14/6, 2024 at 13:37

(*source != -1 || (*dest = *dest,0)) && (*dest = *source); also generates the same identical assembly as the if version. Here *dest = *dest gets optimized out too - the compiler dropped the whole || (*dest = *dest,0) part. – Tarp 14/6, 2024 at 13:39

I guess the boring explanation is that both gcc and clang failed to optimize the ?: code and both just happened to sport that missed optimization at the same time. – Tarp 14/6, 2024 at 13:41

I guess an assignment with a complicated r.h.s. is harder for the optimizer to recognize than one with a simple r.h.s. – Scholem 14/6, 2024 at 13:59

(*source != -1) ? (*dest = *source) : (*dest = *dest); gets optimized. – Scholem 14/6, 2024 at 14:7
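For anyone who wants to paste the forms mentioned so far into godbolt side by side, here is a minimal sketch. The function names and the two helpers (`run_variant`, `variants_agree`) are made up for illustration, and the `(void)` casts are added only to silence unused-value warnings. It shows only that the four forms store the same value in a single-threaded call; it makes no claim about what any particular compiler version emits for each.

```c
/* Variant A: the ternary from the question. */
static void foo_ternary(int *restrict dest, const int *restrict source) {
    *dest = (*source != -1) ? *source : *dest;
}

/* Variant B: the plain if from the question. */
static void foo_if(int *restrict dest, const int *restrict source) {
    if (*source != -1) {
        *dest = *source;
    }
}

/* Variant C: the assignment moved into both arms of the ternary. */
static void foo_ternary_arms(int *restrict dest, const int *restrict source) {
    (void)((*source != -1) ? (*dest = *source) : (*dest = *dest));
}

/* Variant D: the short-circuit form. */
static void foo_short_circuit(int *restrict dest, const int *restrict source) {
    (void)((*source != -1 || (*dest = *dest, 0)) && (*dest = *source));
}

typedef void (*foo_fn)(int *restrict, const int *restrict);

/* Hypothetical helper: run one variant and return the resulting *dest. */
static int run_variant(foo_fn fn, int dest_init, int source_val) {
    int dest = dest_init;
    fn(&dest, &source_val);
    return dest;
}

/* Hypothetical helper: 1 if all four variants leave the same value in *dest. */
static int variants_agree(int dest_init, int source_val) {
    int expected = run_variant(foo_ternary, dest_init, source_val);
    return expected == run_variant(foo_if, dest_init, source_val)
        && expected == run_variant(foo_ternary_arms, dest_init, source_val)
        && expected == run_variant(foo_short_circuit, dest_init, source_val);
}
```

Calling `variants_agree(7, 42)` and `variants_agree(7, -1)` from a small `main` covers both sides of the condition.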

Note that the snippets aren't equivalent if dest is pointing to a peripheral register having side-effects on access. That is, in the first snippet you are forcing it to be read and written even in case the condition is true, while not in the second snippet. Sure, in such a case you would usually make it volatile, but still I guess the compiler is allowed to assume reads and writes have side-effects. – Photophobia 14/6, 2024 at 14:13

@EugeneSh.: If the compiler assumed reads and writes have side effects, it would destroy optimization of expression evaluation throughout programs. – Chemarin 14/6, 2024 at 14:18

@EricPostpischil I am not saying it is always assuming them, I am saying that it is not required to not assume them. Sure, if we say that it must consider non-volatile pointers to always point to a "normal" memory, then the snippets are equivalent and it is just a missed optimization. – Photophobia 14/6, 2024 at 14:21

One way in which the snippets certainly are inequivalent is that in version 2, the compiler must not load or store to *dest if *source == -1, while in version 1, it has the freedom to do so if it wishes. For instance, when *source == -1, version 2 can legally be called with dest a null or invalid pointer, or pointing to an object that's being concurrently accessed in another thread. But that freedom ought to allow it to emit better code for version 1 if anything. Maybe it gets confused by the extra freedom. – Brickey 14/6, 2024 at 14:45

For example, version 1 could in principle be emitted as branchless code, doing the load and store unconditionally and using cmovcc to select the value to be stored. Neither gcc nor clang actually does so at any optimization level, but maybe the possibility of doing so gets it started down an ultimately sub-optimal path. – Brickey 14/6, 2024 at 14:50
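To make the difference in freedom concrete, here is a rough C-level sketch. `foo_select` spells out the unconditional load / select / store shape that the ternary version permits, and `foo_if` is the if version, which must not touch *dest when *source == -1. The names `foo_select`, `if_form_accepts_null_dest` and `select_result` are made up for illustration, and whether a compiler actually uses cmov for the select is entirely up to the compiler.

```c
/* The if form: *dest is neither read nor written when *source == -1. */
static void foo_if(int *restrict dest, const int *restrict source) {
    if (*source != -1) {
        *dest = *source;
    }
}

/* The shape the ternary form allows: both loads and the store are unconditional. */
static void foo_select(int *restrict dest, const int *restrict source) {
    int s = *source;            /* unconditional load of *source */
    int d = *dest;              /* unconditional load of *dest */
    *dest = (s != -1) ? s : d;  /* select on locals, then unconditional store */
}

/* Hypothetical check: the if form may legally be called with a null dest
   when *source == -1, because it never accesses *dest on that path.
   Doing the same with foo_select would be undefined behavior. */
static int if_form_accepts_null_dest(void) {
    int minus_one = -1;
    foo_if((int *)0, &minus_one);
    return 1;
}

/* Hypothetical helper: run foo_select and return the resulting *dest. */
static int select_result(int dest_init, int source_val) {
    int dest = dest_init;
    foo_select(&dest, &source_val);
    return dest;
}
```

With a valid dest both functions store the same value, so `select_result(7, -1)` leaves 7 in place and `select_result(7, 42)` yields 42.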

The ternary version would have data-race UB if another thread was writing *dest simultaneously (which is why it could be done branchlessly with an unconditional store as @NateEldredge points out). But the if version wouldn't if the condition was false: it doesn't even read *dest in that case. – Inboard 15/6, 2024 at 16:4

@PeterCordes Do you know if there are IRL cases where compilers deliberately generate my question's poorer ternary output, solely so that some scuffed programs that rely on data-race UB still get the value of dest reloaded? I'd hope not, but since you brought up the UB case. Btw, I'm the #1 fan of your in-depth explanations, so thanks for commenting. :) – Lange 15/6, 2024 at 22:15

@MyNameIsTrez: I don't think GCC or clang intentionally avoid optimizing away *foo = *foo if they decide to branch, at least not because of programs that rely on UB. They can't invent stores when the abstract machine doesn't have any, though, so they can't if-convert if (cond) *dst = 1; into branchless *dst = cond ? 1 : *dst;, but the reverse transformation is legal. Writing the source to always assign something can help compilers auto-vectorize (more efficiently or at all) with a load / ALU blend / store, where an if might need a masked store and even masked load. – Inboard 16/6, 2024 at 1:16
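As a rough illustration of the two loop shapes being contrasted, here is a minimal sketch. The function names and the test data are invented for the example, and it makes no claim about whether a given compiler or target actually vectorizes either loop. It only shows the source-level difference: `copy_if` leaves some elements untouched, while `copy_select` loads and stores every element.

```c
#include <stddef.h>

/* Conditional-store form: dst[i] is not accessed at all when src[i] == -1,
   so a vectorized version would need some kind of masked store. */
static void copy_if(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] != -1) {
            dst[i] = src[i];
        }
    }
}

/* Always-store form: every dst[i] is loaded and stored, which permits a
   plain load / blend / store sequence. */
static void copy_select(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (src[i] != -1) ? src[i] : dst[i];
    }
}

/* Hypothetical check: both loops produce the same array contents. */
static int loops_agree(void) {
    const int src[6] = {3, -1, 0, -1, 9, -1};
    int a[6] = {10, 11, 12, 13, 14, 15};
    int b[6] = {10, 11, 12, 13, 14, 15};
    copy_if(a, src, 6);
    copy_select(b, src, 6);
    for (size_t i = 0; i < 6; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    /* Elements whose source was -1 keep their old values. */
    return a[0] == 3 && a[1] == 11 && a[2] == 0 && a[5] == 15;
}
```

The always-store form is only a valid substitute when every dst[i] is safe to read and write, and no other thread touches dst concurrently.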

-1

Code snippet 1:

void foo(int *const restrict dest, const int *const restrict source) {
    *dest = (*source != -1) ? *source : *dest;
}

Code snippet 2:

void foo(int *const restrict dest, const int *const restrict source) {
    if (*source != -1) {
        *dest = *source;
    }
}

Code snippet 1 and code snippet 2 don't actually match logically, and that's why the assembly code differs.

What code snippet 1 expresses is:

if *source != -1 then *dest = *source;
else *dest = *dest;

What code snippet 2 expresses is:

if *source != -1 then *dest = *source;

In order to match the logic, code snippet 2 can be changed to the following:

void foo(int *const restrict dest, const int *const restrict source) {
    if (*source != -1) {
        *dest = *source;
    } else {
        *dest = *dest; // useless self-assignment
    }
}

Usage of Ternary over if-else condition: https://www.tutorialspoint.com/ternary-operator-vs-if-else-in-c-cplusplus

Coriss answered 15/6, 2024 at 10:4 Comment(3)

The code is nominally different at the layer of the abstract machine model used by the C standard, but it is not different at the layer of observable behavior. A compiler is permitted to optimize them to the same code, and the question is why they do not. This post does not answer that question. – Chemarin 15/6, 2024 at 10:37

@EricPostpischil: The main difference is that if another thread is reading or writing *dest at the same time, version 2 has data-race UB only if the condition is true. The ternary version always conflicts with another thread touching *dest. If you mean "observable in a data-race-free program", then yeah they're equivalent, but not exactly in terms of what freedom that gives the optimizer. Anyway, agreed this answer doesn't really address any of that. – Inboard 15/6, 2024 at 16:7

@PeterCordes: Yes but they do not differ in any cases defined for both, so they can be optimized to the same code. – Chemarin 15/6, 2024 at 16:56
