Trailing/leading zero count for a byte

Asked 18/12, 2008 at 3:33 Answered 1/3, 2010 at 12:22

I'm using Java and I'm coding a chess engine.

I'm trying to find the index of the first 1 bit and the index of the last 1 bit in a byte.

I'm currently using Long.numberOfTrailingZeros() (or something like that) in Java, and would like to emulate that functionality, except with bytes.

Would it be something like:

byte b = 0b011000101;
int firstOneBit = bitCount ((b & -b) - 1);

If so, how would I implement bitCount relatively efficiently. I don't mind good explainations, please don't just give me code.

Axon answered 18/12, 2008 at 3:33 Comment(1)

Anybody who likes these sorts of bit-manipulation questions will enjoy Hank Warren's superb book Hacker's Delight where he shows there are more tricks to bit functions than just lookup tables. Lookup tables have their place---but sometimes you can do way cool stuff with just a few instructions. – Shoon 18/12, 2008 at 4:9

use a lookup tabel with 256 entries. to create it:

unsigned int bitcount ( unsigned int i ) {
unsigned int r = 0;
while ( i ) { r+=i&1; i>>=1; } /* bit shift is >>> in java afair */
return r; 
}

this of course does not need to be fast as you do it at most 256 times to init your tabel.

Sculpturesque answered 18/12, 2008 at 3:41 Comment(3)

Interesting, using a lookup table to precalculate a lookup table! However, it still doesn't answer my question, because I'm not hardcoding 256 entries of a lookup table by hand. – Axon 18/12, 2008 at 3:46

It's because I'm stupid and didn't understand his answer at first. – Axon 2/4, 2009 at 23:21

This is the right answer here. The table will fit in cache. Fancy algorithms are not necessary for a byte-size value, and even on machines that have find first set / count leading zeros instructions, they will generally require slight modification because they only work on 16 bit or larger values, making it unclear if they're even worth the nonportability. – Amoeboid 3/1, 2012 at 4:57

/* Count Leading Zeroes */

static uint8_t clzlut[256] = {
  8,7,6,6,5,5,5,5,
  4,4,4,4,4,4,4,4,
  3,3,3,3,3,3,3,3,
  3,3,3,3,3,3,3,3,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  2,2,2,2,2,2,2,2,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  1,1,1,1,1,1,1,1,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0,
  0,0,0,0,0,0,0,0
};

uint32_t clz(uint32_t val)
{
  uint32_t accum = 0;

  accum += clzlut[val >> 24];
  accum += (accum == 8 ) ? clzlut[(val >> 16) & 0xFF] : 0;
  accum += (accum == 16) ? clzlut[(val >>  8) & 0xFF] : 0;
  accum += (accum == 24) ? clzlut[ val        & 0xFF] : 0;

  return accum;     
}

Explanation:

This works by storing the number of leading zeroes for each permutation of a byte as a lookup table. You use the byte value to look up the count of leading zeroes for that value. Since the example does this for an unsigned int, you shift and mask the four individual bytes, and accumulate the lookups accordingly. The ternary statement is used to stop the accumulation as soon as we find a bit which is set. That the accumulated value is 8, 16 or 24 implies that no set bit is found so far.

Also, some architectures have hardware support for this (as an instruction). The assembly mnemonic is often called 'CLZ' or 'BSR'. They are abbreviations for "Count leading Zeroes" and "Bit Scan Reverse" respectively.

Apothecary answered 1/3, 2010 at 12:22 Comment(1)

Where did this table come from, how do you dynamically generate it? – Rosser 20/4, 2022 at 7:49

The correct answer is that most all processors have some special instructions to do this sort of thing (leading zeros, trailing zeros, number of ones, etc). x86 has bsf/bsr, powerpc has clz, and so on. Hopefully Integer.numberOfTrailingZeros is smart enough to use these, but that's probably the only way that has a chance of using this sort of platform-specific function in Java (if it even uses them).

The Aggregate Magic Algorithms is another place with some approaches to this sort of problem, ranging from the obvious (lookup tables), to some rather clever SWAR approaches. But I suspect they all lose to Integer(x).numberOfTrailingZeros() if the java runtime is smart about the latter; it ought to be possible to optimize out the boxing and use a platform-specific technique for numberOfTrailingZeros, and if it does both that'll win.

Just for completeness, the other classic archive of brilliant bit-whacking is the old MIT HAKMEM collection (there's also a semi-modernized C version if your PDP-6/10 assembler skills have gotten rusty).

Encyclical answered 18/12, 2008 at 4:30 Comment(0)

If you assume that Long.numberOfTrailingZeros is fast (i.e. JIT compiled/optimized to use a single ASM instructions when available), then why can't you simply do something like this:

max(8,Long.numberOfTrailingZeros(val))

where val is your byte value converted to a Long. This is also assuming that max() is available and again optimizes to use asm select or max instructions.

Theoretically, on a machine that supports it, these operations could be JIT compiled to two assembler instructions.

Barbarian answered 17/2, 2009 at 21:18 Comment(0)

Explanation:

Recommended topics

Hot tags