Branchless code that maps zero, negative, and positive to 0, 1, 2
Asked Answered
S

12

14

Write a branchless function that returns 0, 1, or 2 if the difference between two signed integers is zero, negative, or positive.

Here's a version with branching:

int Compare(int x, int y)
{
    int diff = x - y;
    if (diff == 0)
        return 0;
    else if (diff < 0)
        return 1;
    else
        return 2;
}

Here's a version that may be faster depending on compiler and processor:

int Compare(int x, int y)
{
    int diff = x - y;
    return diff == 0 ? 0 : (diff < 0 ? 1 : 2);
}

Can you come up with a faster one without branches?

SUMMARY

The 10 solutions I benchmarked had similar performance. The actual numbers and winner varied depending on compiler (icc/gcc), compiler options (e.g., -O3, -march=nocona, -fast, -xHost), and machine. Canon's solution performed well in many benchmark runs, but again the performance advantage was slight. I was surprised that in some cases some solutions were slower than the naive solution with branches.

Sanatory answered 23/10, 2009 at 0:38 Comment(6)
While it's always hard to tell the difference, this sounds like Computer Hardware/Machine Architecture homework to me.Crassus
Using the ternary "?:" operator contains an implicit branch.Hymnology
I don't think he implied otherwise.Diomedes
Have you actually tried the most straightforward way to compare your Point structs: bool operator<(const Point& a, const Point& b){ return a.x < b.x || (a.x == b.x && a.y < b.y); }? I can't see how the look-up table is going to beat that because calculating the array indices appears to be at least as expensive as just calculating the answer. - With a quick test this sorts 1 million random pairs some 30% faster than the look-up table and Tom's Compare function.Cabrera
I agree. I wouldn't use a table for Point. However, it might make sense when comparing two Lines, which requires comparing 4 integers.Sanatory
Related: Is there a standard sign function in C?Sumer
D
14
int Compare(int x, int y) {
     return (x < y) + (y < x) << 1;
}

Edit: Bitwise only? Guess < and > don't count, then?

int Compare(int x, int y) {
    int diff = x - y;
    return (!!diff) | (!!(diff & 0x80000000) << 1);
}

But there's that pesky -.

Edit: Shift the other way around.

Meh, just to try again:

int Compare(int x, int y) {
    int diff = y - x;
    return (!!diff) << ((diff >> 31) & 1);
}

But I'm guessing there's no standard ASM instruction for !!. Also, the << can be replaced with +, depending on which is faster...

Bit twiddling is fun!

Hmm, I just learned about setnz.

I haven't checked the assembler output (but I did test it a bit this time), and with a bit of luck it could save a whole instruction!:

IN THEORY. MY ASSEMBLER IS RUSTY

subl  %edi, %esi
setnz %eax
sarl  $31, %esi
andl  $1, %esi
sarl  %eax, %esi
mov   %esi, %eax
ret

Rambling is fun.

I need sleep.

Diomedes answered 23/10, 2009 at 0:44 Comment(6)
Doesn't work: Compare(5, 9) -> 3 should be 1 Compare(3, 1) -> 1 should be 2 Compare(4, 4) -> 0 (correct) Here's my version: int Compare(int x, int y) { unsigned int diff = x - y; return (!(diff >> 31) & (diff & 0x7FFFFFFF) != 0) << 1 | (diff >> 31); } Can you do better? :)Sanatory
Oops, sorry, I hadn't tested and didn't cross my mind !!diff would be 1 for a negative diff, for some reason.Diomedes
There's an ASM instruction for !! if there's an ASM instruction for != 0 (which I bet there is).Tarantass
Why bitwise only? Generating a bool value from a comparison is a branchless operation these days.Synopsis
+1 for the setnz trick (but I can't add twice...). Wonder whether it's an optimized instruction on recent x86 processors though.Diplostemonous
@AndreyT: Beats me, I'm only sticking to OP's rules.Diomedes
S
24

Branchless (at the language level) code that maps negative to -1, zero to 0 and positive to +1 looks as follows

int c = (n > 0) - (n < 0);

if you need a different mapping you can simply use an explicit map to remap it

const int MAP[] = { 1, 0, 2 };
int c = MAP[(n > 0) - (n < 0) + 1];

or, for the requested mapping, use some numerical trick like

int c = 2 * (n > 0) + (n < 0);

(It is obviously very easy to generate any mapping from this as long as 0 is mapped to 0. And the code is quite readable. If 0 is mapped to something else, it becomes more tricky and less readable.)

As an additinal note: comparing two integers by subtracting one from another at C language level is a flawed technique, since it is generally prone to overflow. The beauty of the above methods is that they can immedately be used for "subtractionless" comparisons, like

int c = 2 * (x > y) + (x < y);
Synopsis answered 23/10, 2009 at 6:16 Comment(17)
I like the sheer conciseness and readability of what you've done here. The only flaw I see is that not all languages or environments will return consistent values (or numeric values) from a comparison (i.e. you can't use this in Java).Casaba
Funny. It was my first answer, and I discarded it on the bitwise argument.Diomedes
Yes, I saw that you submitted the same thing before me. I upvoted your answer and the one from Tom right away. Now I'm trying to downvote mine, but it doesn't let me :)Synopsis
Depending on the instruction set, this isn't necessarily branchless.Fevre
Yes, that's what I meant when I said that it is branchless at the language level.Synopsis
If you are going for performance, that multiplication by 2 could be written with bit shifts, if x and y are integersWoodruff
Any compiler worth anyone's time will do that for you.Renounce
@Tom: Multiplication by 2 and shift 1 bit left are operations that have exactly the same semantical meaning in C/C++. For this reason, they produce identical machine code. There's absolutely no point to replace one with another. It will absolutely have no effect on performance. The choice must be made from pure readability considerations. In this case I wanted to multiply something by 2, not shift anything anywhere, so that's exactly how I spelled it out in the code: with multiplication operator and with a 2.Synopsis
@AudreyT: Even though it's "branchless" at the language level, anytime you use comparison operators, you are potentially introducing branches at the machine compilation level.Anurous
@Adisak: Yes, I understand that perfectly well, which is, obviously, the reason I included the "language level" remark in my original reply. So?Synopsis
AudreyT: "So?" Well, the reason it's important is the Branchless code is almost always harder to read than Branching code... even in the simple case you present with int c = 2 * (x > y) + (x < y); It's mainly used as part of an advanced programmers toolkit as a micro-optimization in heavily used critical routines or inner loops. Since your "Branchless at the language level" example emits two branches at the machine code level, it's no longer a valid optimization. Your example will run slower than a naive easy-to-read branching version.Anurous
Firstly, it doesn't necessarily emit branches. It only might emit branches, and with a more-or-less decent compiler there will only be one branch, not two. Or it might not emit any branches at all. I consider the latter as much more likely, BTW. Secondly, I don't see why it should run "slower" instead of running with the same speed as the explictly branched version.Synopsis
Thirdly, code like cmp = (x > y) - (x < y) is a well-known and extremely widespread comparison idiom. For even a moderately experienced programmer this version is significantly more readable than the explicitly branched version, since it is much more compact.Synopsis
@BЈовић: It is the number that we need to map.Synopsis
But it is not mentioned anywhere. Not in the question, and not in this answer. How is it calculated, and where are x and y from the question?Attain
@BЈовић: The quesion is, as stated in the title, about "mappping zero, negative, and positive [value] to 0, 1, 2". I thing it is sufficently clear that in my answer n is intended to be that value. Also, once you understand the question, you realize that x and y are completely irrlelvant. If you want to use a name from the question, that name would be diff, not x and y. I just replaced diff with n for brevity, since the name of the variable does not really matter. n is basically a name for the "formal paramter" of the idiom I pressented.Synopsis
@BЈовић: It is not clear to me why you mention these irrlevant x and y, yet completely igore the directly relevant diff. It is even less clear to me how you could miss a "comparison" example at the end of my answer, which actually uses x and `ySynopsis
D
14
int Compare(int x, int y) {
     return (x < y) + (y < x) << 1;
}

Edit: Bitwise only? Guess < and > don't count, then?

int Compare(int x, int y) {
    int diff = x - y;
    return (!!diff) | (!!(diff & 0x80000000) << 1);
}

But there's that pesky -.

Edit: Shift the other way around.

Meh, just to try again:

int Compare(int x, int y) {
    int diff = y - x;
    return (!!diff) << ((diff >> 31) & 1);
}

But I'm guessing there's no standard ASM instruction for !!. Also, the << can be replaced with +, depending on which is faster...

Bit twiddling is fun!

Hmm, I just learned about setnz.

I haven't checked the assembler output (but I did test it a bit this time), and with a bit of luck it could save a whole instruction!:

IN THEORY. MY ASSEMBLER IS RUSTY

subl  %edi, %esi
setnz %eax
sarl  $31, %esi
andl  $1, %esi
sarl  %eax, %esi
mov   %esi, %eax
ret

Rambling is fun.

I need sleep.

Diomedes answered 23/10, 2009 at 0:44 Comment(6)
Doesn't work: Compare(5, 9) -> 3 should be 1 Compare(3, 1) -> 1 should be 2 Compare(4, 4) -> 0 (correct) Here's my version: int Compare(int x, int y) { unsigned int diff = x - y; return (!(diff >> 31) & (diff & 0x7FFFFFFF) != 0) << 1 | (diff >> 31); } Can you do better? :)Sanatory
Oops, sorry, I hadn't tested and didn't cross my mind !!diff would be 1 for a negative diff, for some reason.Diomedes
There's an ASM instruction for !! if there's an ASM instruction for != 0 (which I bet there is).Tarantass
Why bitwise only? Generating a bool value from a comparison is a branchless operation these days.Synopsis
+1 for the setnz trick (but I can't add twice...). Wonder whether it's an optimized instruction on recent x86 processors though.Diplostemonous
@AndreyT: Beats me, I'm only sticking to OP's rules.Diomedes
B
10

Assuming 2s complement, arithmetic right shift, and no overflow in the subtraction:

#define SHIFT (CHARBIT*sizeof(int) - 1)

int Compare(int x, int y)
{
    int diff = x - y;
    return -(diff >> SHIFT) - (((-diff) >> SHIFT) << 1);
}
Biosphere answered 23/10, 2009 at 0:49 Comment(5)
(2 & (-diff >> 30)) looks promising, but doesn't work. I.e., if diff=-4, the outcome would be 0, though 1 for negative is expected.Diplostemonous
Tordek means that (2 & (-diff >> 30)) is equivalent to -(((-diff) >> 31) << 1), not that it replaces the entire expression.Biosphere
You're also assuming sizeof(int) == 4.Gordon
#include <limits.h> and replace 31 with CHAR_BIT * sizeof(int) - 1Occupant
If the explicit extraction of sign bit is the most efifcient way to check the sign of a number (as opposed to test-like operations and CPU state flags), then that's the code any self-respecting compiler will generate for language-level comparisons with 0.Synopsis
S
9

Two's complement:

#include <limits.h>
#define INT_BITS (CHAR_BITS * sizeof (int))

int Compare(int x, int y) {
    int d = y - x;
    int p = (d + INT_MAX) >> (INT_BITS - 1);
    d = d >> (INT_BITS - 2);
    return (d & 2) + (p & 1);
}

Assuming a sane compiler, this will not invoke the comparison hardware of your system, nor is it using a comparison in the language. To verify: if x == y then d and p will clearly be 0 so the final result will be zero. If (x - y) > 0 then ((x - y) + INT_MAX) will set the high bit of the integer otherwise it will be unset. So p will have its lowest bit set if and only if (x - y) > 0. If (x - y) < 0 then its high bit will be set and d will set its second to lowest bit.

Spannew answered 23/10, 2009 at 1:10 Comment(4)
The Bit Twiddling Hacks Page guy (a page which I keep referring people to)? Wow!Teak
Who me? Glad to be of service. Actually the only reason I am here is because I need the sense of affirmation for getting all the badges. :)Spannew
Bug? Cmp(5,9) should be 1 and Cmp(9,5) should be 2Sanatory
Right so change the difference line to: int d = y - x;Spannew
A
6

Unsigned Comparison that returns -1,0,1 (cmpu) is one of the cases that is tested for by the GNU SuperOptimizer.

cmpu: compare (unsigned)
int cmpu(unsigned_word v0, unsigned_word v1)
{
    return ( (v0 > v1) ? 1 : ( (v0 < v1) ? -1 : 0) );
}

A SuperOptimizer exhaustively searches the instruction space for the best possible combination of instructions that will implement a given function. It is suggested that compilers automagically replace the functions above by their superoptimized versions (although not all compilers do this). For example, in the PowerPC Compiler Writer's Guide (powerpc-cwg.pdf), the cmpu function is shown as this in Appendix D pg 204:

cmpu: compare (unsigned)
PowerPC SuperOptimized Version
subf  R5,R4,R3
subfc R6,R3,R4
subfe R7,R4,R3
subfe R8,R7,R5

That's pretty good isn't it... just four subtracts (and with carry and/or extended versions). Not to mention it is genuinely branchfree at the machine opcode level. There is probably a PC / Intel X86 equivalent sequence that is similarly short since the GNU Superoptimizer runs for X86 as well as PowerPC.

Note that Unsigned Comparison (cmpu) can be turned into Signed Comparison (cmps) on a 32-bit compare by adding 0x80000000 to both Signed inputs before passing it to cmpu.

cmps: compare (signed)
int cmps(signed_word v0, signed_word v1)
{
    signed_word offset=0x80000000;
    return ( (unsigned_word) (v0 + signed_word),
        (unsigned_word) (v1 + signed_word) );
}

This is just one option though... the SuperOptimizer may find a cmps that is shorter and does not have to add offsets and call cmpu.

To get the version that you requested that returns your values of {1,0,2} rather than {-1,0,1} use the following code which takes advantage of the SuperOptimized cmps function.

int Compare(int x, int y)
{
    static const int retvals[]={1,0,2};
    return (retvals[cmps(x,y)+1]);
}
Anurous answered 26/10, 2009 at 6:52 Comment(0)
W
4

I'm siding with Tordek's original answer:

int compare(int x, int y) {
    return (x < y) + 2*(y < x);
}

Compiling with gcc -O3 -march=pentium4 results in branch-free code that uses conditional instructions setg and setl (see this explanation of x86 instructions).

push   %ebp
mov    %esp,%ebp
mov    %eax,%ecx
xor    %eax,%eax
cmp    %edx,%ecx
setg   %al
add    %eax,%eax
cmp    %edx,%ecx
setl   %dl
movzbl %dl,%edx
add    %edx,%eax
pop    %ebp
ret 
Woodruff answered 23/10, 2009 at 1:44 Comment(1)
As I said in other comment, it is a pity that it still dosn't understand that a single cmp %edx,%ecx is sufficient.Synopsis
D
4

Good god, this has haunted me.

Whatever, I think I squeezed out a last drop of performance:

int compare(int a, int b) {
    return (a != b) << (a > b);
}

Although, compiling with -O3 in GCC will give (bear with me I'm doing it from memory)

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
cmpl  %esi, %edi
setgt %dl
sall  %dl, %eax
ret

But the second comparison seems (according to a tiny bit of testing; I suck at ASM) to be redundant, leaving the small and beautiful

xorl  %eax, %eax
cmpl  %esi, %edi
setne %al
setgt %dl
sall  %dl, %eax
ret

(Sall may totally not be an ASM instruction, but I don't remember exactly)

So... if you don't mind running your benchmark once more, I'd love to hear the results (mine gave a 3% improvement, but it may be wrong).

Diomedes answered 26/10, 2009 at 3:23 Comment(3)
Why wouldn't you just add al and dl to get the value of 1 or 2 instead of using SALL (shift arithmetic left long)? Add is simpler and faster than variable shift (which may be microcoded on some CPUs).Anurous
It's a fair option; I hadn't though of it. Only benchmarks will tell.Diomedes
ICC 11 produces (icc -static -g -O3 -fast -xHost): `[mov $0x1,%edx; xor %ecx,%ecx; xor %eax,%eax; cmp %esi,%edi; cmovne %edx,%eax; cmovg %edx,%ecx; shl %cl,%eax; retq] My benchmark indicates it is very fast.Sanatory
S
0

Combining Stephen Canon and Tordek's answers:

int Compare(int x, int y)
{
    int diff = x - y; 
    return -(diff >> 31) + (2 & (-diff >> 30));
} 

Yields: (g++ -O3)

subl     %esi,%edi 
movl     %edi,%eax
sarl     $31,%edi
negl     %eax
sarl     $30,%eax
andl     $2,%eax
subl     %edi,%eax 
ret 

Tight! However, Paul Hsieh's version has even fewer instructions:

subl     %esi,%edi
leal     0x7fffffff(%rdi),%eax
sarl     $30,%edi
andl     $2,%edi
shrl     $31,%eax
leal     (%rdi,%rax,1),%eax
ret
Sanatory answered 23/10, 2009 at 6:5 Comment(2)
Note that fewer instructions != faster. Only profiling the examples being run a few million times will tell.Tarantass
I'd say that the 2 * (x >y) + (x < y) version looks cleaner in C code, produces few instructions and embeds no immediate constants in machine code. Unfortunately, neither GCC nor MSVC yet understand that it is enough to cmp x and y only once, even at the highest optimization levels.Synopsis
L
0
int Compare(int x, int y)
{
    int diff = x - y;

    int absdiff = 0x7fffffff & diff; // diff with sign bit 0
    int absdiff_not_zero = (int) (0 != udiff);

    return
        (absdiff_not_zero << 1)      // 2 iff abs(diff) > 0
        -
        ((0x80000000 & diff) >> 31); // 1 iff diff < 0
}
Landel answered 23/10, 2009 at 14:30 Comment(1)
This will not work correctly in cases where the computation of diff = x-y overflows. Also, some compilers will convert the != you use to compute absdiff_not_zero to a branch.Anurous
T
0

For 32 signed integers (like in Java), try:

return 2 - ((((x >> 30) & 2) + (((x-1) >> 30) & 2))) >> 1;

where (x >> 30) & 2 returns 2 for negative numbers and 0 otherwise.

x would be the difference of the two input integers

Tetherball answered 1/4, 2012 at 20:49 Comment(0)
H
0

The basic C answer is :

int v;           // find the absolute value of v
unsigned int r;  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Also :

r = (v ^ mask) - mask;

Value of sizeof(int) is often 4 and CHAR_BIT is often 8.

Hooknosed answered 8/4, 2022 at 23:15 Comment(0)
F
0

rcl r8, 1 has entered the chat... /;)

diffClassifyU64:
        cmp     rdi, rsi
        seta    al
        rcl     al
        ret

diffClassifyU64 is just three x86 assembly instructions to compare two unsigned 64-bit operands and returns 0 if they are equal, 1 if the first is less than the second, and 2 if greater.

I wrote a neat portable wrapper that will enable this bit trick where appropriate, using fallbacks for non-x86 and non-gcc compilers and permitting constant arguments to be evaluated at compile time and work in c++ constexpr if you are using c++ (this is disabled if you are not in c++17.)

  • Use numClassifyI64 below to classify a single signed 64-bit integer.
// returns {0, 1, 2} if SIGNED 64-bit x is {zero, negative, positive}
uint_least8_t numClassifyI64(int64_t x) {
    uint_least8_t gt = 0 < x, nz = x != 0, out = gt + nz;
    #if defined(__GNUC__) && defined(__x86_64__)
        #ifdef __cpp_lib_is_constant_evaluated
        if (!__builtin_is_constant_evaluated())
        #endif
        if (! __builtin_constant_p(x))
            __asm__("sal{q   %1|     %1}\n\t"
                    "setnbe{ %0|  %0}\n\t"
                    "rcl{b    $1, %0|     %0, 1}"
                    : "=r"(out), "+r"(x) :: "cc");
    #endif
    return out;
}
  • Use diffClassifyU64 below to compare two UNSIGNED 64-bit integers
// returns {0, 1, 2} if UNSIGNED 64-bit {a == b, a < b, a > b}
uint_least8_t diffClassifyU64(uint64_t a, uint64_t b) {
    uint_least8_t gt = a > b, nz = a != b, out = gt + nz;
    #if defined(__GNUC__) && defined(__x86_64__)
        #ifdef __cpp_lib_is_constant_evaluated
        if (!__builtin_is_constant_evaluated())
        #endif
        if (! __builtin_constant_p(a-b))
            __asm__("cmp{q   %2, %1|     %1, %2}\n\t"
                    "setnbe{ %0|  %0}\n\t"
                    "rcl{b    $1, %0|     %0, 1}"
                    : "=r"(out) : "r"(a), "r"(b) : "cc");
    #endif
    return out;
}
  • Use diffClassifyI64 below to compare two SIGNED 64-bit integers
// returns {0, 1, 2} if SIGNED 64-bit {a == b, a < b, a > b}
uint_least8_t diffClassifyI64(int64_t a, int64_t b) {
    uint_least8_t gt = a > b, nz = a != b, out = gt + nz;
    #if defined(__GNUC__) && defined(__x86_64__)
        #ifdef __cpp_lib_is_constant_evaluated
        if (!__builtin_is_constant_evaluated())
        #endif
        if (! __builtin_constant_p(a-b))
            __asm__("cmp{q   %3, %2|     %2, %3}\n\t"
                    "setg{   %0|    %0}\n\t"
                    "setnz{  %1|   %1}\n\t"
                    "add{b   %1, %0|     %0, %1}"
                    : "=r"(out), "=&r"(nz) : "r"(a), "r"(b) : "cc");
    #endif
    return out;
}

Below are tests I ran to ensure the correctness of both the assembly and the C fallback code:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <stdint.h>

typedef uint_least8_t(*test_f)(uint64_t,uint64_t);
__attribute__((noinline)) uint_least8_t callAsmFn(test_f f, int64_t a,
int64_t b, ...) { return f(a, b); }
#define TESTF(f,...) printf("exp "#f"("#__VA_ARGS__") => %d\n",f(__VA_ARGS__));\
printf("asm "#f"("#__VA_ARGS__") => %d\n",callAsmFn((test_f)f,__VA_ARGS__,0))

#include <stdio.h>
int main(void) {
TESTF(numClassifyI64, INT64_MIN);
TESTF(numClassifyI64, -1);
TESTF(numClassifyI64, 0);
TESTF(numClassifyI64, 1);
TESTF(numClassifyI64, INT64_MAX);
printf("\n");
TESTF(diffClassifyU64, 0, 0);
TESTF(diffClassifyU64, 0, 1);
TESTF(diffClassifyU64, 1, 0);
TESTF(diffClassifyU64, 0, UINT64_MAX);
TESTF(diffClassifyU64, UINT64_MAX, 0);
TESTF(diffClassifyU64, 1, UINT64_MAX);
TESTF(diffClassifyU64, UINT64_MAX, 1);
printf("\n");
TESTF(diffClassifyI64, 0, 0);
TESTF(diffClassifyI64, 0, 1);
TESTF(diffClassifyI64, 1, 0);
TESTF(diffClassifyI64, 0, -1);
TESTF(diffClassifyI64, -1, 0);
TESTF(diffClassifyI64, 0, INT64_MAX);
TESTF(diffClassifyI64, 0, INT64_MIN);
TESTF(diffClassifyI64, 1, INT64_MAX);
TESTF(diffClassifyI64, 1, INT64_MIN);
TESTF(diffClassifyI64, -1, INT64_MAX);
TESTF(diffClassifyI64, -1, INT64_MIN);
return 0;
}
Franz answered 7/3, 2024 at 23:43 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.