So someone on a forum asked why this C function (to which I added const and restrict, just in case):
void foo(int *const restrict dest, const int *const restrict source) {
    *dest = (*source != -1) ? *source : *dest;
}
generates the following assembly with both the latest GCC and the latest Clang, using -Os to optimize for size on Godbolt:
foo:
        mov     eax, dword ptr [rsi]
        cmp     eax, -1
        jne     .LBB0_2
        mov     eax, dword ptr [rdi]
.LBB0_2:
        mov     dword ptr [rdi], eax
        ret
This seemingly equivalent function, however, compiles to one fewer instruction:
void foo(int *const restrict dest, const int *const restrict source) {
    if (*source != -1) {
        *dest = *source;
    }
}
foo:
        mov     eax, dword ptr [rsi]
        cmp     eax, -1
        je      .LBB0_2
        mov     dword ptr [rdi], eax
.LBB0_2:
        ret
Maybe I'm missing something, or maybe I'm underestimating how tricky this is for compilers to optimize. I would have expected the compiler to detect that *dest = *dest can't have any side effects, since I didn't mark it volatile, and that it is therefore allowed to turn the ternary into the if-statement version. I've found other Stack Overflow answers to similar questions that often boil it down to GCC having an edge over Clang in one case and vice versa, but it seems strange to me that neither of them optimizes this case.
… else clauses into the code, something which the if statement is missing. Oh, and more "compact" code (like the ternary expression) too often makes the code harder to read, understand and maintain. – Syncytium

… if does not. The question is why the compiler does not optimize that. Your unnumbered comment following that is also irrelevant; the question is not about which C code is better, it is about the compiler optimization. – Chemarin

… ?: comes with a sequence point, so it enforces the evaluation of *source != -1 before the rest. It does not, however, enforce a write to *dest if the compiler can deduce that it is a no-op. Adding an else { *dest = *dest; } to the if version adds nothing: both gcc and clang optimize that one out. And restrict doesn't change anything here, though in theory without it dest and source could alias. – Tarp

… (*source != -1 || (*dest = *dest,0)) && (*dest = *source);
also generates exactly the same assembly as the if version. Here *dest = *dest gets optimized out too: the compiler dropped the whole || (*dest = *dest,0) part. – Tarp

… ?: code and both just happened to sport that missed optimization at the same time. – Tarp

… (*source != -1) ? (*dest = *source) : (*dest = *dest);
gets optimized. – Scholem

… dest is pointing to a peripheral register that has side effects on access. That is, in the first snippet you are forcing it to be read and written even when *source == -1, while the second snippet does not touch it at all in that case. Sure, in such a case you would usually make it volatile, but still I guess the compiler is allowed to assume reads and writes have side effects. – Photophobia

… *dest if *source == -1, while in version 1 it has the freedom to do so if it wishes. For instance, when *source == -1, version 2 can legally be called with dest being a null or invalid pointer, or pointing to an object that's being concurrently accessed in another thread. But if anything, that freedom ought to allow it to emit better code for version 1. Maybe it gets confused by the extra freedom. – Brickey

… cmovcc to select the value to be stored. Neither gcc nor clang actually does so at any optimization level, but maybe the possibility of doing so starts it down an ultimately suboptimal path. – Brickey

… *dest simultaneously (which is why it could be done branchlessly with an unconditional store, as @NateEldredge points out). But the if version wouldn't if the condition was false: it doesn't even read *dest in that case. – Inboard

… *foo = *foo if they decide to branch, at least not because of programs that rely on UB. They can't invent stores when the abstract machine doesn't have any, though, so they can't if-convert if (cond) *dst = 1; into the branchless *dst = cond ? 1 : *dst; but the reverse transformation is legal. Writing the source to always assign something can help compilers auto-vectorize (more efficiently, or at all) with a load / ALU blend / store, where an if might need a masked store and even a masked load. – Inboard