If you will call those methods a lot, the fastest way would be not bit manipulation but probably a look-up table. Define an array of length 511 for each operation.
Example for minus (subtraction)
static unsigned char maxTable[511];
memset(maxTable, 0, 255); // If smaller, emulates cutoff at zero
maxTable[255]=0; // If equal - return zero
for (int i=0; i<256; i++)
maxTable[255+i] = i; // If greater - return the difference
The array is static and initialized only once. Now your subtraction can be defined as inline method or using pre-compiler:
#define MINUS(A,B) maxTable[A-B+255];
How it works? Well you want to pre-calculate all possible subtractions for unsigned chars. The results vary from -255 to +255, total of 511 different result. We define an array of all possible results but because in C we cannot access it from negative indices we use +255 (in [A-B+255]). You can remove this action by defining a pointer to the center of the array.
const unsigned char *result = maxTable+255;
#define MINUS(A,B) result[A-B];
use it like:
bsub = MINUS(13,15); // i.e 13-15 with zero cutoff as requested
Note that the execution is extremely fast. Only one subtraction and one pointer deference to get the result. No branching. The static arrays are very short so they will be fully loaded into CPU's cache to further speed up the calculation
Same would work for addition but with a bit different table (first 256 elements will be the indices and last 255 elements will be equal to 255 to emulate the cutoff beyond 255.
If you insist on bits operation, the answers that use (a>b) are wrong. This still might be implemented as branching. Use the sign-bit technique
// (num1>num2) ? 1 : 0
#define is_int_biggerNotEqual( num1,num2) ((((__int32)((num2)-(num1)))&0x80000000)>>31)
Now you can use it for calculation of subtraction and addition.
If you want to emulate the functions max(), min() without branching use:
inline __int32 MIN_INT(__int32 x, __int32 y){ __int32 d=x-y; return y+(d&(d>>31)); }
inline __int32 MAX_INT(__int32 x, __int32 y){ __int32 d=x-y; return x-(d&(d>>31)); }
My examples above use 32 bits integers. You can change it to 64, though I believe that 32 bits calculations run a bit faster. Up to you
y ^ ((x ^ y) & -(x < y))
forint
types evaluatesmin(x, y)
without branching. This could form part of an eventual solution, based on what you have so far. – Internisty ^ ((x ^ y) & -(x < y));
ismin(x,y)
(usually) without a branch (wherex < y
might, depending on the machine still be a branch) but modern architecture has conditional move which is probably not much slower. – Bichar
type, then, from C++14 onwards that must be 2's complement. – InternistI just wonder if there are any better ways to do this, i.e. by some hacky bit manipulations?
Unless the code is in a very performance critical part, the hacky optimizations only end up harder to read. I suggest a more specific question like...are there any more efficient ways...
. – Badge_mm_adds_epi8
intrinsic will do a saturating add of 16 bytes in a single instruction. – Burge_mm_subs_epi8
for subtraction. – Kibosh