How can I guarantee that a variable will never be zero without using a conditional statement in C?

Asked 27/10, 2022 at 5:51 Answered 29/10, 2022 at 9:25

Solved c compiler-optimization micro-optimization

For example, Let's say a variable x, x could be anything include 0.
Then we got code like:

if(x==0) {
    y = 1;
}
else {
    y = x;
}

Could I do this without producing branches in C/C++?

I'm trying to optimize a piece of code. I want to remove branches as much as possible. There are similar judgments, so I want to convert them into statements without branches to make the code as efficient as possible.

Maggs answered 27/10, 2022 at 5:51 Comment(8)

This kind of optimisation is usually attempted in often executed code parts, e.g. within nested loops. You might achieve enough, if you do the checking as far outside of nesting as possible. I.e. work on the x long before you get to the shown code. – Intertype 27/10, 2022 at 5:52

Otherwise please consider rephrasing and elaborating your problem, considerung whether this might be a meta.stackexchange.com/questions/66377/what-is-the-xy-problem Otherwise the answer is probably basically "Not.". – Intertype 27/10, 2022 at 5:54

If you were using C++ then I’d suggest using a refinement-type to represent state-invariants like your x != 0 guard. I dunno about C though, sorry. – Ixtle 27/10, 2022 at 5:55

Agree with the comments above. Having said that, you can try the following "trick": y = !x + x;. It will be 1 if x==0 and x otherwise. – Consortium 27/10, 2022 at 6:1

A smart compiler is very likely to compile your original code with no branches, so no fancy tweaking is needed. For instance, gcc 12.2: look ma, no jumps. – Mardellmarden 27/10, 2022 at 6:3

@Consortium Very interesting. Do you know about any assumptions (e.g. on compiler behaviour) in that? That would make a good answer. – Intertype 27/10, 2022 at 6:16

@Intertype added as an answer. – Consortium 27/10, 2022 at 6:23

Performance optimization while looking at your code through a drinking straw like this is not likely to be successful. – Maure 30/10, 2022 at 4:18

Some general notes:

As mentioned in other comments, some compilers can optimize and eliminate the branch. You can check the assembly output (e.g. in Godbolt) to make sure.
Beware of premature optimizations.
Always measure and make sure your speculation about what's taking up time is correct.

Having said that you can try the following "trick":

y = !x + x;

Assuming x,y are integer types:

If x==0, !x will be 1 and y will be assigned the value 1.
If x!=0, !x will be 0 and y will be assigned the value x.

Note: see @CostantinoGrana's comment below about the guarantee in the standard. You can also verify it in your specific environment (compiler etc.).

Consortium answered 27/10, 2022 at 6:23 Comment(8)

Nice. I wonder whether any compiler might represent !0 differently than 1. I could imagine 0xffff (16bit example). If you could dig up a standard quote which does or even does not guarantee... – Intertype 27/10, 2022 at 6:31

@Intertype 6.5.3.3 Unary arithmetic operators: 5 The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E). – Rennold 27/10, 2022 at 6:42

@CostantinoGrana That seems to be the perfect input. – Intertype 27/10, 2022 at 6:45

Thanks! It worked, but finally I found the performence get worse after this operation. – Maggs 27/10, 2022 at 9:17

I decide still use the branch way – Maggs 27/10, 2022 at 9:18

@Maggs If one branch is more likely (e.g. x==0 is rare) then it's common to find no gain or even losses in performance. – Bergamot 27/10, 2022 at 9:45

How abouty = (!x)|x;? – Dorella 28/10, 2022 at 20:33

@user16217248 it can work too. For unsigned for sure. Not sure about signed bitwise operations on all compilers. – Consortium 29/10, 2022 at 3:54

You should check the assembler output of your compiler. For example,the X86 architecture has an instruction called cmov (https://www.felixcloutier.com/x86/cmovcc) that is designed to handle this kind of thing without branches. The Arm architecture allows pretty much all instructions to be executed depending on CPU flags: https://developer.arm.com/documentation/dui0473/m/condition-codes/conditional-execution-in-arm-state.

Thus, when optimizations are enabled your compiler will likely produce no branches.

Normanormal answered 27/10, 2022 at 6:5 Comment(0)

If you need this non-zero guarantee for an input to __builtin_clz (which gives an undefined result for an input of zero, otherwise counts leading-zero bits), a useful trick is y = x|1. That doesn't affect the number of leading zeros for any non-zero input, because you'd setting the last position CLZ will look at, only if all the higher bits are zero. If x=1, then x|1 is still 1.

Similarly as setup for __builtin_ctz (count trailing zeros), y = x | (1UL<<31) will set the MSB. Or more portably, x | ( 1ULL << (sizeof(x)*CHAR_BIT-1) ).

Unconditionally setting one of the bits will change some non-zero values; this is only useful in certain cases.

To get exactly the semantics you asked for, y = x ? x : 1 can be expressed that way, or as x + !x as suggested in another answer, which asks the compiler to materialize a boolean integer and add it.

Of course, a smart compiler will hopefully pick a way that's efficient on the target machine. For x86, one might hope that a smart compiler would do something like this if it's ok to overwrite x:

    cmp  eax, 1          ; CF=1 only if EAX==0 because only 0 is < 1U
    adc  eax, 0          ; x += (eax < 1U)

While for AArch64, tst or cmp and then cinc can conditionally copy-and-increment a value depending on any flags conditions.

On the Godbolt compiler explorer -

int make_nonzero_booleanize(int x){
    return x + !x;
}

int make_nonzero_ternary(int x){
    return x ? x : 1;
}

int make_nonzero_if(int x){
  // the original.  Compilers may compile it to branchless code
  // especially gcc -O3 instead of -O2 is more aggressive about if-conversion
    int y;
    if (x)
        y = x;
    else
        y = 1;
    return y;
}

# GCC12 -O3 for x86-64
make_nonzero_booleanize(int):           # return x + !x
        cmp     edi, 1
        mov     eax, edi
        adc     eax, 0       # Apparently a GCC dev already thought of this peephole and taught GCC10 about it
           # I was expecting compilers to naively do xor-zeroing + test/setcc + add
        ret

make_nonzero_ternary(int):
        mov     eax, edi
        test    edi, edi
        mov     edx, 1
        cmove   eax, edx     # standard ternary using conditional-move
        ret

make_nonzero_if(int):
         # same asm as ternary; GCC did an if-conversion optimization

// ARM64 gcc12 -O2
make_nonzero_booleanize(int):
        cmp     w0, 0
        cinc    w0, w0, eq       // return x==0 ? x : x+1
        ret
make_nonzero_ternary(int):
        cmp     w0, 0
        csinc   w0, w0, wzr, ne  // return x==0 ? x : 0+1
        ret
make_nonzero_if(int):
     // identical to ternary, if-conversion even with -O2

Both ways are equally good; the csinc result still has a data dependency on the input w0 even if the condition is true and the result comes from incrementing the zero register (wzr).

# RISC-V 32-bit clang 15 -O3
make_nonzero_booleanize(int):
        seqz    a1, a0                # tmp = (x==0)
        add     a0, a0, a1            # return x + tmp
        ret

make_nonzero_ternary(int):
        mv      a1, a0                  # tmp = x
        li      a0, 1                   # retval = 1
        beqz    a1, .LBB1_2             # if (x != 0) {
        mv      a0, a1                    # retval = tmp
.LBB1_2:                                # }
        ret                             # return retval

make_nonzero_if(int):
      # identical to ternary, same branching.

Clang didn't even do a good job with the branching strategy it picked here; it could have done a bnez over a retval=1, avoiding both mv instructions. Unless there's some tuning thing where a branch over an mv is handled by some RISC-V microarchitectures as a conditional-move? Hopefully after inlining it wouldn't be this bad, and might put a branch target for the ==0 special case somewhere else, where it didn't have to jump over it.

So x + !x is better with some real-world compilers on x86-64 and RISC-V, and equal on AArch64.

If you care about this in some specific context with a specific compiler for some ISA, look at how it compiles in your code. (If you know anything about what's efficient in asm or not.)

Older GCC versions:

GCC7 and earlier does better with the ternary, avoiding the mov eax, edi, instead doing mov eax, 1 and then using cmovnz to copy EDI or not. Hopefully the inefficiency in GCC8 and later only happens when not inlining this tiny function; GCC is known to waste instructions when it has to work around hard register constraints from calling conventions or inline asm.

GCC9 and earlier doesn't know the cmp/adc trick for x86-64, instead using a literal implementation of x + !x, which is unfortunately not great because x86 sucks at materializing a boolean condition as a 0/1 integer due to the poor design of setcc.

# GCC9 and earlier  -O3
make_nonzero_booleanize(int):
        xor     eax, eax       # zero the full reg
        test    edi, edi       # set FLAGS according to x
        sete    al             # EAX = !x
        add     eax, edi       # EAX += x
        ret

To be fair, adc eax, 0 costs 2 uops on Intel before Sandybridge. (And GCC tuning choices maybe didn't realize that adc with immediate 0 was special cased; in general adc costs 2 uops on Intel before Broadwell.)

Branchless might be slower if `x == 0` is very rare

Even in the best case, the branchless code for x86-64, AArch64, and RISC-V is lengthening the critical path dependency chain by 2 instructions, both with 1 cycle latency on most CPUs. (For example, csinc can't run until the flags result of cmp is ready.)

But a correctly predicted branch is a control dependency, so it's handled separately by the CPU, optimistically assuming that y=x will be the case, only bailing out if it later detects a mispredict.

But writing an if() in the C source doesn't guarantee you'll get branchy asm. (That or an || makes it more likely to compile branchy than other ways of writing it, though.)

gcc optimization flag -O3 makes code slower than -O2 - a case where GCC's branchless code-gen for x86-64 was slower than branchy, when the branch is highly predictable. (Especially because GCC made the loop-carried dependency chain longer than necessary.) -O3 is more aggressive about if-conversion to branchless, but it didn't auto-vectorize.
Why is a conditional move not vulnerable to Branch Prediction Failure?
http://yarchive.net/comp/linux/cmov.html - Linus Torvalds on cmov, back in the Pentium 4 days when it was slower.

Lancer answered 27/10, 2022 at 16:34 Comment(0)

Despite the fact that the question has been answered, you can do something like this: x==0?!x: x, However, the condition is still present. Although it's not the best, it will at least guarantee that x won't be zero in any case.

Rebirth answered 29/10, 2022 at 9:25 Comment(0)

-2

Maybe you can do this just like you are doing it in perl: (void)((y = x) || (y = x + 1));

Vacate answered 27/10, 2022 at 6:38 Comment(3)

Maybe you can turn this into a more assertive answer. Try for How to Answer. You could try and base your answer on a test result. Note that "Try whether <code> helps." is more a comment than an answer. Compare meta.stackexchange.com/questions/214173/… – Intertype 27/10, 2022 at 6:41

|| is specified as short-circuiting, thus branching on y=x being 0. In terms of hinting the compiler towards branchless code, it's not better. In practice it compiles the same as the if in the question. godbolt.org/z/jvjv4Er63 It's also not very readable or idiomatic for C. – Lancer 27/10, 2022 at 20:17

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Cohobate 31/10, 2022 at 10:5

-2

well you could always do (x * x) + 1 a number squared can never be negative (unless imaginary) and if both equal zero the result would be 1

Frowsy answered 27/10, 2022 at 6:45 Comment(1)

x*x can overflow, which is undefined behaviour for signed integers in C. Are you assuming x is a floating-point number, so overflow would wrap to +Infinity? Or that it's unsigned, so x*x could be zero, but will always have its low bit clear? Anyway, if you don't care about keeping x the same for non-zero x, it's much more efficient to do x|1 or something. – Lancer 27/10, 2022 at 16:20

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Older GCC versions:

Branchless might be slower if x == 0 is very rare

Recommended topics

Hot tags

Branchless might be slower if `x == 0` is very rare