You know your number is positive (which excludes zero) so you can indeed just use n & (n-1) == 0
without checking n != 0
. That's your most efficient option, potentially more efficient than C++20 std::has_single_bit
std::has_single_bit
needs to rule out the no-bits-set case. For that, it can be slightly more efficient on modern x86 to do popcount(n) == 1
if the compiler can assume support for a hardware popcnt
instruction, which is why std::has_single_bit
is often defined that way in C++ standard libraries.
But since you know your number is non-zero, the bithack is most efficient. Especially if compiling for a target where the compiler can't assume a hardware popcount (like x86 without -march=x86-64-v2
or newer), or AArch64 before ARMv9.x where scalar popcount requires copying to vector regs and back. RISC-V only has hardware popcount in an uncommon extension, not baseline.
On x86, it can be as cheap as
lea eax, [rdi-1]
test eax, edi
# boolean condition in ZF
jz or setz or cmovz or whatever
And similar on AArch64 with sub
and tst
. And pretty much any other modern RISC can subtract 1 while putting the result into a separate register, then AND those together cheaply.
And if you're compiling for -march=x86-64-v3
or later (BMI2 + AVX2 + FMA = Haswell feature set), the compiler can use BMI1 blsr eax, edi
to clear ("reset") the lowest set bit and set FLAGS accordingly, with ZF set according to the output being zero. CF is set if the input was zero so some branch conditions can check that n!=0
. But unfortunately conditions like jbe
are CF=1 or ZF=1
, ja
is CF=0 and ZF=0
. There isn't a single FLAGS condition that checks for ZF=1 and CF=0 which would let std::has_single_bit
compile to just blsr
plus a single branch, cmov, or setcc. ja
is taken if the input had multiple bits set, not-taken for power-of-2 or zero.
Unlike test
, blsr
can't macro-fuse with a later jcc
so it doesn't save any uops vs. lea
/test
if you're branching on it. It is better on Intel if the compiler is using it branchlessly, like for setnz
or cmovnz
. blsr
is a single uop on Intel, and on AMD Zen 4 and later. 2 uops on Zen 3 and earlier (https://uops.info/)
Non-popcount way to exclude zero
For use-cases where you can't assume a non-zero input and don't have cheap hardware popcount, there's an alternate bithack that's 3 operations instead of two: (n - 1) < (n ^ (n - 1))
The right hand side is what x86 BMI1 blsmsk
computes, but if we have blsmsk
we have popcnt
.
n-1
is a common subexpression so we only need to compute it once. For example RISC-V; AArch64 could be similar with cmp/bltu
addi x1, x0, -1 # n-1
xor x2, x1, x0 # n ^ (n-1)
bltu x1, x2, power_of_two # branch on a comparison
For zero, n-1
is UINT_MAX, and n ^ anything
is a no-op, so both sides are equal.
For a power of two, n-1
sets all the bits below where the set bit was, and XOR sets that bit again. So it's larger than n
, and also larger than n-1
.
For a non-power-of-two, n ^ (n-1)
is still just a mask up to and including the lowest set bit, with the high bits cancelled (the ones that n-1
didn't flip). So it's smaller than n-1
.
https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2 also suggests v && !(v & (v - 1));
but I don't think that's better since logical &&
has to check both sides for non-zero.
std::popcount(n) == 1
/std::has_single_bit(n)
is equivalent to((n & (n-1)) == 0) && (n != 0)
– Brailleblsr
which doesn & (n-1)
(clearing the lowest set bit) in a single instruction, and setting FLAGS in a useful way for a branch. The FLAGS conditions can even distinguish no bits set (0
) from one bit set: CF=1 if the input was zero, ZF=1 if the output is zero (i.e. ZF holds yourbool test
). You know your integer is positive so zero isn't a possible input, but many use-cases forstd::has_single_bit
can't assume that. – Catechu