I stumbled upon this SO question because I was asking me the same… turns out it’s not always. For instance, the following code…
const struct op {
const char *foo;
int bar;
int flags;
} ops[] = {
{ "foo", 5, 16 },
{ "bar", 9, 16 },
{ "baz", 13, 0 },
{ 0, 0, 0 }
};
extern int foo(const struct op *, int);
int
bar(void *a, void *b, int c, const struct op *d)
{
c |= (a == b) && (d->flags & 16);
return foo(d, c) + 1;
}
… compiles to branching code with both gcc 3.4.6 (i386) and 8.3.0 (amd64, i386) in all optimisation levels. The one from 3.4.6 is more manually legibe, I’ll demonstrate with gcc -O2 -S -masm=intel x.c; less x.s
:
[…]
.text
.p2align 2,,3
.globl bar
.type bar , @function
bar:
push %ebp
mov %ebp, %esp
push %ebx
push %eax
mov %eax, DWORD PTR [%ebp+12]
xor %ecx, %ecx
cmp DWORD PTR [%ebp+8], %eax
mov %edx, DWORD PTR [%ebp+16]
mov %ebx, DWORD PTR [%ebp+20]
je .L4
.L2:
sub %esp, 8
or %edx, %ecx
push %edx
push %ebx
call foo
inc %eax
mov %ebx, DWORD PTR [%ebp-4]
leave
ret
.p2align 2,,3
.L4:
test BYTE PTR [%ebx+8], 16
je .L2
mov %cl, 1
jmp .L2
.size bar , . - bar
Turns out the pointer comparison operation invokes a comparison and even a subroutine to insert 1.
Not even using !!(a == b)
makes a difference here.
tl;dr
Check the actual compiler output (assembly with -S
or disassembly with objdump -d -Mintel x.o
; drop the -Mintel
if not on x86, it merely makes the assembly more legible) of the actual compilation; compilers are unpredictable beasts.
const T v[] = {b, a}; return v[a>b]
, which, depending on the predictability of the code, may run even faster. However, only do this kind of microoptimization if you have measured relevance. – Kitty