If you are truly concerned with performance, the best way to clear the msb has recently changed for x86 with the addition of BMI instructions.
In x86 assembly:
clear_msb:
bsrq %rdi, %rax
bzhiq %rax, %rdi, %rax
retq
Now to rewrite in C and let the compiler emit these instructions while gracefully degrading for non-x86 architectures or older x86 processors that don't support BMI instructions.
Compared to the assembly code, the C version is really ugly and verbose. But at least it meets the objective of portability. And if you have the necessary hardware and compiler directives (-mbmi, -mbmi2) to match, you're back to the beautiful assembly code after compilation.
As written, bsr() relies on a GCC/Clang builtin. If targeting other compilers you can replace with equivalent portable C code and/or different compiler-specific builtins.
#include <inttypes.h>
#include <stdio.h>
uint64_t bsr(const uint64_t n)
{
return 63 - (uint64_t)__builtin_clzll(n);
}
uint64_t bzhi(const uint64_t n,
const uint64_t index)
{
const uint64_t leading = (uint64_t)1 << index;
const uint64_t keep_bits = leading - 1;
return n & keep_bits;
}
uint64_t clear_msb(const uint64_t n)
{
return bzhi(n, bsr(n));
}
int main(void)
{
uint64_t i;
for (i = 0; i < (uint64_t)1 << 16; ++i) {
printf("%" PRIu64 "\n", clear_msb(i));
}
return 0;
}
Both assembly and C versions lend themselves naturally to being replaced with 32-bit instructions, as the original question was posed.
(sizeof(int)*CHAR_BIT)
(CHAR_BIT
is inlimits.h
) instead of(sizeof(int)*8)
. – Dillard