How do you efficiently count the trailing zero bits in a number?
Asked Answered
C

4

6

I've written a function trailing_zeroes(int n) that returns the number of the trailing zeroes in the binary representation of a number.

Example: 4 in binary is 100, so the function in this case returns 2.

unsigned trailing_zeroes(int n) {
    unsigned bits;

    bits = 0;
    while (n >= 0 && !(n & 01)) {
        ++bits;
        if (n != 0)
            n >>= 1;
        else
            break;
    }
    return bits;
}

The reason of the if statement is because in case n equals to 0, there will be a loop.

I think it's pretty ugly this code written like this; is there a better way?

I want to avoid the break statement inside the while, because a lot of people told me that using that statement inside while/for sometimes could be "informal". I thought to rewrite the function like this, but I don't think it's the best way to do it:

unsigned bits;
if (n == 0)
    return bits = 1;

bits = 0;
while (!(n & 01)) {
    ++bits;
    n >>= 1;
}
Castellany answered 20/7, 2017 at 17:52 Comment(8)
Try __builtin_ctz.Eckard
Simplest way might be converting to a std::string and inspect that one.Fretwell
What is your specific question? We are not a code review service (as a sidenote: the code has implementation defined behaviour for certain values of n).Blen
@Olaf Excuse me I don't understand. What could happen if n has those certain values ? (i'm really sorry for my bad english)Castellany
Tip: Avoid using C and C++. Although the question may have relevance to both languages, the best answer may be language specific.Whippoorwill
Why does 0 have 1 trailing zero? Shouldn't it have as many trailing zeros as there are bits in the integer type? Or should it have zero trailing bits because there is no 1-bit to be followed by zeros? (And the assignment in return bits = 1; is a bit bizarre; simply return 1; would be more sensible, though the compiler will almost certainly convert the code as if you wrote that anyway.)Cheeky
@JonathanLeffler I was wrong. Now i think that it should return as many trailing zeros as there are bits in the integer type (as you said). But there are some built-in function that in case x is 0 the result is undefined (why ?)Castellany
I don't know why the __builtin_ctz in GCC gives an undefined result for zero, but (guesswork) it is likely because the native implementations on different platforms yields different results — and it is likely that the two results are zero and the number of bits in the integer type. But rather than make the result platform-specific, GCC makes it undefined, so portable code doesn't call the function with an argument that's zero.Cheeky
C
6

Your function is incorrect: it still has an infinite loop for 0. The test should be:

while (n > 0 && !(n & 1))

Note that you cannot handle negative numbers with this approach, hence your function should probably take an unsigned number argument, or you could convert the argument to unsigned.

Your function should special case 0 and use a simpler loop:

unsigned trailing_zeroes(int n) {
    unsigned bits = 0, x = n;

    if (x) {
        while ((x & 1) == 0) {
            ++bits;
            x >>= 1;
        }
    }
    return bits;
}

The above function is very simple and easy to understand. It is quite fast if the result is small. The value returned for 0 is 0 as in your function, which is questionable as 0 really has as many trailing zeroes as value bits in the unsigned type.

There is a more efficient approach with a constant number of steps:

unsigned trailing_zeroes(int n) {
    unsigned bits = 0, x = n;

    if (x) {
        /* assuming `x` has 32 bits: lets count the low order 0 bits in batches */
        /* mask the 16 low order bits, add 16 and shift them out if they are all 0 */
        if (!(x & 0x0000FFFF)) { bits += 16; x >>= 16; }
        /* mask the 8 low order bits, add 8 and shift them out if they are all 0 */
        if (!(x & 0x000000FF)) { bits +=  8; x >>=  8; }
        /* mask the 4 low order bits, add 4 and shift them out if they are all 0 */
        if (!(x & 0x0000000F)) { bits +=  4; x >>=  4; }
        /* mask the 2 low order bits, add 2 and shift them out if they are all 0 */
        if (!(x & 0x00000003)) { bits +=  2; x >>=  2; }
        /* mask the low order bit and add 1 if it is 0 */
        bits += (x & 1) ^ 1;
    }
    return bits;
}

Note that we could handle any larger int size by changing the first step to

while (!(x & 0x0000FFFF)) { bits += 16; x >>= 16; }

Some compilers have a built-in function __builtin_ctz() to count the number of trailing zeroes using very efficient assembly code. It is not a C Standard function but at the cost of reduced portability, you might want to use it if it is available. Check your compiler's documentation.

Here is the abstract from GCC documentation:

Built-in Function: int __builtin_ctz (unsigned int x)

Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

Colton answered 20/7, 2017 at 21:0 Comment(8)
How about unsigned bits = sizeof n * CHAR_BIT; ... if (x) { bits = 0; ... } to handle the 0 case? (I guess it all depends on how you define trailing) I could see it being 0 or sizeof n * CHAR_BIT depending on the definition.Commonable
@DavidC.Rankin: yes, I chose to stick with the OP's semantics, which are simpler to implement.Colton
Yes, I agree with that. This is one that just struck me -- that I don't know the answer to. Zero would have all trailing zeros, but trailing behind nothing -- so it is one of those things that will just have to be as desired.Commonable
@DavidC.Rankin: as you can see from the gcc doc, 0 must be special cased for the built-in function as well.Colton
About the second approach: how do you know what costants number you have to use ? I mean, it's not very intuitive, that's pretty complex for me :P. Anyway, why do you convert the number to unsigned ? I guess because, for example, if in case n equals to ** -4 ** converting it to unsigned it becames 4 ? So why don't just change the sign in unsigned trailing_zeroes(unsigned n) (anyway thank you, you've been very very helpful :3)Castellany
For negative operands, << has undefined behavior and the result of >> is implementation-defined. They are well defined for unsigned.Commonable
@ClaudioPisa: the number is converted to unsigned to ensure defined behavior for right shifting x. Right shift a signed int with a negative value has implementation defined behavior, whereas the conversion is always defined and the function counts the trailing zeroes of negative correctly (-2 -> 111..110 -> 1).Colton
@ClaudioPisa: for the constant values, it is easy: I shall add comments in the source code.Colton
H
7

As already mentioned, there is a builtin that can do this, and as it may use hardware, it may be very fast. However, the doc for GCC does say that the result is undefined if the input is 0. Since this is an extension it may not be available for your compiler.

Otherwise, whenever anyone says 'but manipulation' or 'bit counting', you need to reach for your copy of "Hacker's Delight". A book so good that I bought both editions. There are about 4 pages (1st ed.) dedicated to this, 'ntz' (number of trailing zeroes). If you already have a 'nlz' (number of leading zeroes) or a 'popcnt' function, then you can get the ntz directly. Otherwise the book gives several implementations, some using popcnt, one with a loop, the others using binary search.

For instance,

int ntz3(unsigned x) {
   int n;

   if (x == 0) return(32);
   n = 1;
   if ((x & 0x0000FFFF) == 0) {n = n +16; x = x >>16;}
   if ((x & 0x000000FF) == 0) {n = n + 8; x = x >> 8;}
   if ((x & 0x0000000F) == 0) {n = n + 4; x = x >> 4;}
   if ((x & 0x00000003) == 0) {n = n + 2; x = x >> 2;}
   return n - (x & 1);
}
Heavyladen answered 20/7, 2017 at 18:22 Comment(1)
Please provide a reference to the standard defining this built-in (or any built-in function).Blen
E
7

C++20 introduced the <bit> header, which provides countr_zero. The linked cppreference page provides the example:

#include <bit>
#include <bitset>
#include <cstdint>
#include <iostream>
 
int main()
{
    for (const std::uint8_t i : { 0, 0b11111111, 0b00011100, 0b00011101 }) {
        std::cout << "countr_zero( " << std::bitset<8>(i) << " ) = "
                  << std::countr_zero(i) << '\n';
    }
}

Output:

countr_zero( 00000000 ) = 8
countr_zero( 11111111 ) = 0
countr_zero( 00011100 ) = 2
countr_zero( 00011101 ) = 0

If you need a value of 0 for a 0 input, you can do something like:

(n ? countr_zero(n) : 0)
Exert answered 3/1, 2023 at 1:54 Comment(0)
C
6

Your function is incorrect: it still has an infinite loop for 0. The test should be:

while (n > 0 && !(n & 1))

Note that you cannot handle negative numbers with this approach, hence your function should probably take an unsigned number argument, or you could convert the argument to unsigned.

Your function should special case 0 and use a simpler loop:

unsigned trailing_zeroes(int n) {
    unsigned bits = 0, x = n;

    if (x) {
        while ((x & 1) == 0) {
            ++bits;
            x >>= 1;
        }
    }
    return bits;
}

The above function is very simple and easy to understand. It is quite fast if the result is small. The value returned for 0 is 0 as in your function, which is questionable as 0 really has as many trailing zeroes as value bits in the unsigned type.

There is a more efficient approach with a constant number of steps:

unsigned trailing_zeroes(int n) {
    unsigned bits = 0, x = n;

    if (x) {
        /* assuming `x` has 32 bits: lets count the low order 0 bits in batches */
        /* mask the 16 low order bits, add 16 and shift them out if they are all 0 */
        if (!(x & 0x0000FFFF)) { bits += 16; x >>= 16; }
        /* mask the 8 low order bits, add 8 and shift them out if they are all 0 */
        if (!(x & 0x000000FF)) { bits +=  8; x >>=  8; }
        /* mask the 4 low order bits, add 4 and shift them out if they are all 0 */
        if (!(x & 0x0000000F)) { bits +=  4; x >>=  4; }
        /* mask the 2 low order bits, add 2 and shift them out if they are all 0 */
        if (!(x & 0x00000003)) { bits +=  2; x >>=  2; }
        /* mask the low order bit and add 1 if it is 0 */
        bits += (x & 1) ^ 1;
    }
    return bits;
}

Note that we could handle any larger int size by changing the first step to

while (!(x & 0x0000FFFF)) { bits += 16; x >>= 16; }

Some compilers have a built-in function __builtin_ctz() to count the number of trailing zeroes using very efficient assembly code. It is not a C Standard function but at the cost of reduced portability, you might want to use it if it is available. Check your compiler's documentation.

Here is the abstract from GCC documentation:

Built-in Function: int __builtin_ctz (unsigned int x)

Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

Colton answered 20/7, 2017 at 21:0 Comment(8)
How about unsigned bits = sizeof n * CHAR_BIT; ... if (x) { bits = 0; ... } to handle the 0 case? (I guess it all depends on how you define trailing) I could see it being 0 or sizeof n * CHAR_BIT depending on the definition.Commonable
@DavidC.Rankin: yes, I chose to stick with the OP's semantics, which are simpler to implement.Colton
Yes, I agree with that. This is one that just struck me -- that I don't know the answer to. Zero would have all trailing zeros, but trailing behind nothing -- so it is one of those things that will just have to be as desired.Commonable
@DavidC.Rankin: as you can see from the gcc doc, 0 must be special cased for the built-in function as well.Colton
About the second approach: how do you know what costants number you have to use ? I mean, it's not very intuitive, that's pretty complex for me :P. Anyway, why do you convert the number to unsigned ? I guess because, for example, if in case n equals to ** -4 ** converting it to unsigned it becames 4 ? So why don't just change the sign in unsigned trailing_zeroes(unsigned n) (anyway thank you, you've been very very helpful :3)Castellany
For negative operands, << has undefined behavior and the result of >> is implementation-defined. They are well defined for unsigned.Commonable
@ClaudioPisa: the number is converted to unsigned to ensure defined behavior for right shifting x. Right shift a signed int with a negative value has implementation defined behavior, whereas the conversion is always defined and the function counts the trailing zeroes of negative correctly (-2 -> 111..110 -> 1).Colton
@ClaudioPisa: for the constant values, it is easy: I shall add comments in the source code.Colton
T
5

There are a whole variety of methods for ntz reported in "Hacker's Delight" by Henry Warren.

I thought the De Bruijn sequence solution was pretty wild. See https://en.wikipedia.org/wiki/De_Bruijn_sequence#Finding_least-_or_most-significant_set_bit_in_a_word.

Here is a 64-bit implementation, like one would use in a chess engine to handle a "bitboard".

int ntz(uint64_t x) {
    // We return the number of trailing zeros in
    // the binary representation of x.
    //
    // We have that 0 <= x < 2^64.
    //
    // We begin by applying a function sensitive only
    // to the least significant bit (lsb) of x:
    //
    //   x -> x^(x-1)  e.g. 0b11001000 -> 0b00001111
    //
    // Observe that x^(x-1) == 2^(ntz(x)+1) - 1.

    uint64_t y = x^(x-1);

    // Next, we multiply by 0x03f79d71b4cb0a89,
    // and then roll off the first 58 bits.

    constexpr uint64_t debruijn = 0x03f79d71b4cb0a89;

    uint8_t z = (debruijn*y) >> 58;

    // What? Don't look at me like that.
    //
    // With 58 bits rolled off, only 6 bits remain,
    // so we must have one of 0, 1, 2, ..., 63.
    //
    // It turns out this number was judiciously
    // chosen to make it so each of the possible
    // values for y were mapped into distinct slots.
    //
    // So we just use a look-up table of all 64
    // possible answers, which have been precomputed in 
    // advance by the the sort of people who write
    // chess engines in their spare time:

    constexpr std::array<int,64> lookup = {
         0, 47,  1, 56, 48, 27,  2, 60,
        57, 49, 41, 37, 28, 16,  3, 61,
        54, 58, 35, 52, 50, 42, 21, 44,
        38, 32, 29, 23, 17, 11,  4, 62,
        46, 55, 26, 59, 40, 36, 15, 53,
        34, 51, 20, 43, 31, 22, 10, 45,
        25, 39, 14, 33, 19, 30,  9, 24,
        13, 18,  8, 12,  7,  6,  5, 63
    };

    return lookup[z];
}
Thrilling answered 2/5, 2022 at 14:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.