How to do an integer log2() in C++?
Asked Answered
R

17

56

In the C++ standard libraries I found only a floating point log method. Now I use log to find the level of an index in a binary tree ( floor(2log(index)) ).

Code (C++):

int targetlevel = int(log(index)/log(2));

I am afraid that for some of the edge elements (the elements with value 2^n) log will return n-1.999999999999 instead of n.0. Is this fear correct? How can I modify my statement so that it always will return a correct answer?

Rriocard answered 15/6, 2009 at 5:30 Comment(5)
I don't get the question. Why would it return n - 1,9(9)?Trisoctahedron
Because not all integers can be stored exactly in as a floating point number. If 7 would be not fitting, it would be stored as 7.000001 or 6.999999 for example.Rriocard
Yeap, I know that. But where has this 1,9(9) come from? Perhaps you could reformat the question using <sup></sup> for upper indices and <sub></sub> for lower indices?Trisoctahedron
Any integer can be stored exactly in a floating-point number. However, the log() function isn't necessarily precise, and even if it is log(2) is irrational for either natural logs or base 10, so there's no reason to expect an exact result. Given that exact results can't be guaranteed, it makes sense to worry about the exact border conditions.Microampere
You have to have pretty large integers, probably 2^exponentsize before they can't be exactly represented. If you have loss of precision in this case, it's because log(2) can't be exactly represented. Will you only ever call this method for 2^n? If so, you can round to nearest integer (or just use the accepted answer)Angelangela
P
58

You can use this method instead:

int targetlevel = 0;
while (index >>= 1) ++targetlevel;

Note: this will modify index. If you need it unchanged, create another temporary int.

The corner case is when index is 0. You probably should check it separately and throw an exception or return an error if index == 0.

Plenty answered 15/6, 2009 at 5:47 Comment(8)
Does the while loop evaluate 0-integers to false?Rriocard
If index = 0, targetlevel is going to be 0. In your code it will probably cause exception. What value would you like to get for index = 0?Plenty
I mean to say, the loop has to stop when index >>= 1 evaluates to 0. I couldn't find somewhere quickly that the while loop will really stop when the expression evaluates to an integer zero. It is of course logic it will, because the bits are the same then as boolean false.Rriocard
...actually, in your code it's not not exception - it will evaluate to minus infinity and then converted to int as maximum negative int value.Plenty
Ah, yes, while loop will stop at 0. There were no booleans in C, and C++ follows the same rules.Plenty
Make sure to specify index as unsigned int, otherwise you have a very dangerous potentially infinite loop bug on your hands.Abdication
@Abdication makes a really good point. A negative number, right shifted, never reaches zero. Not only that, but taking the logarithm of a negative integer is undefined in the first place (as is taking the logarithm of zero). Personally, I like to return –1 when asked for the integer logarithm of zero.Opprobrium
Starting from C++20 you can use std:.bit_width(index) - 1 to apply the same technique but in short and compact.Machree
H
87

If you are on a recent-ish x86 or x86-64 platform (and you probably are), use the bsr instruction which will return the position of the highest set bit in an unsigned integer. It turns out that this is exactly the same as log2(). Here is a short C or C++ function that invokes bsr using inline ASM:

#include <stdint.h>
static inline uint32_t log2(const uint32_t x) {
  uint32_t y;
  asm ( "\tbsr %1, %0\n"
      : "=r"(y)
      : "r" (x)
  );
  return y;
}
Hyperthermia answered 15/6, 2009 at 6:24 Comment(10)
And on ARM you'd want clz, which returns 31 minus the value you want. GCC has __builtin_clz, which presumably uses bsr on x86.Cushion
That is a really great answerAbdication
To avoid the subtraction, use __builtin_ctz instead. int log2 (int x){return __builtin_ctz (x);} It also works on x86.Lopeared
@user2573802 This is wrong. __builtin_ctz(9) = 0 which is not log2(9).Takeshi
Oh, I remember now. It's finding the lowest-order bit that's set, so it only works for numbers which are exact powers of two. The first bit (bit 0) is set for 9 (1001), so it returns 0.Lopeared
Just to mention it, if you use MSVS the equivalent intrinsic is _BitScanReverseTopo
static inline uint32_t log2(const uint32_t x){return (31 - __builtin_clz (x));} works both on intel and ARM (but have wrong result for 0 on ARM: log2(0) = 4294967295). So the full analogue of intel's bsr is: static inline uint32_t log_2(const uint32_t x){if(x == 0) return 0;return (31 - __builtin_clz (x));}Gaven
@Gaven not sure what your point about log2(0) was since mathematically speaking log(0) is undefined for all bases. It returning INT_MAX isn't less "correct" than returning 0.Cattima
"recent-ish" means "386 or newer" :)Cralg
I'm so confused, why has nobody mentioned __builtin_ffs? That seems to be the best intrinsic here?Eloquence
P
58

You can use this method instead:

int targetlevel = 0;
while (index >>= 1) ++targetlevel;

Note: this will modify index. If you need it unchanged, create another temporary int.

The corner case is when index is 0. You probably should check it separately and throw an exception or return an error if index == 0.

Plenty answered 15/6, 2009 at 5:47 Comment(8)
Does the while loop evaluate 0-integers to false?Rriocard
If index = 0, targetlevel is going to be 0. In your code it will probably cause exception. What value would you like to get for index = 0?Plenty
I mean to say, the loop has to stop when index >>= 1 evaluates to 0. I couldn't find somewhere quickly that the while loop will really stop when the expression evaluates to an integer zero. It is of course logic it will, because the bits are the same then as boolean false.Rriocard
...actually, in your code it's not not exception - it will evaluate to minus infinity and then converted to int as maximum negative int value.Plenty
Ah, yes, while loop will stop at 0. There were no booleans in C, and C++ follows the same rules.Plenty
Make sure to specify index as unsigned int, otherwise you have a very dangerous potentially infinite loop bug on your hands.Abdication
@Abdication makes a really good point. A negative number, right shifted, never reaches zero. Not only that, but taking the logarithm of a negative integer is undefined in the first place (as is taking the logarithm of zero). Personally, I like to return –1 when asked for the integer logarithm of zero.Opprobrium
Starting from C++20 you can use std:.bit_width(index) - 1 to apply the same technique but in short and compact.Machree
M
34

Starting from C++20 you can use

std::bit_width(index) - 1

Very short, compact, fast and readable.

It follows the same idea as the answer provided by Igor Krivokon.

Machree answered 21/9, 2020 at 7:22 Comment(0)
H
24

If you just want a fast integer log2 operation, the following function mylog2() will do it without having to worry about floating-point accuracy:

#include <limits.h>

static unsigned int mylog2 (unsigned int val) {
    if (val == 0) return UINT_MAX;
    if (val == 1) return 0;
    unsigned int ret = 0;
    while (val > 1) {
        val >>= 1;
        ret++;
    }
    return ret;
}

#include <stdio.h>

int main (void) {
    for (unsigned int i = 0; i < 20; i++)
        printf ("%u -> %u\n", i, mylog2(i));
    putchar ('\n');
    for (unsigned int i = 0; i < 10; i++)
        printf ("%u -> %u\n", i+UINT_MAX-9, mylog2(i+UINT_MAX-9));
    return 0;
}

The code above also has a small test harness so you can check the behaviour:

0 -> 4294967295
1 -> 0
2 -> 1
3 -> 1
4 -> 2
5 -> 2
6 -> 2
7 -> 2
8 -> 3
9 -> 3
10 -> 3
11 -> 3
12 -> 3
13 -> 3
14 -> 3
15 -> 3
16 -> 4
17 -> 4
18 -> 4
19 -> 4

4294967286 -> 31
4294967287 -> 31
4294967288 -> 31
4294967289 -> 31
4294967290 -> 31
4294967291 -> 31
4294967292 -> 31
4294967293 -> 31
4294967294 -> 31
4294967295 -> 31

It will return UINT_MAX for an input value of 0 as an indication of an undefined result, so that's something you should check for (no valid unsigned integer will have a logarithm that high).

By the way, there are some insanely fast hacks to do exactly this (find the highest bit set in a 2's complement number) available from here. I wouldn't suggest using them unless speed is of the essence (I prefer readability myself) but you should be made aware that they exist.

Holeproof answered 15/6, 2009 at 5:59 Comment(4)
paxdiablo — I like that you're returning –1 for an input value of 0. Note, however, that you aren't actually returning -1, but actually instead ~0 (e.g., 0xFFFFFFFF if you have 32-bit integers), since you've declared the function to return an unsigned int rather than int. In this sense, ~0 is the closest to infinity that you can get in an integer.Opprobrium
@ToddLehman: You are actually returning -1. It then has an integral promotion applied, which for negative numbers sets the value to 2 ** 32 - n, and since n == -1 here, the value is equal to the max unsigned. On some systems, ~0 will not give you want you want. unsigned is defined in terms of values, not in terms of bit representation.Parham
@Holeproof — By the way, you mention that the “correct” value for log₂(0) is infinity, but wouldn't it actually be negative infinity? That is, $\lim{x \to 0} log x = -\infty$.Opprobrium
@Todd, absolutely correct, the limit approaches negative infinity. However, since logarithms aren't actually defined for zero (despite the limit), I've rewritten that bit to remove it.Holeproof
O
20

Base-2 Integer Logarithm

Here is what I do for 64-bit unsigned integers. This calculates the floor of the base-2 logarithm, which is equivalent to the index of the most significant bit. This method is smokingly fast for large numbers because it uses an unrolled loop that executes always in log₂64 = 6 steps.

Essentially, what it does is subtracts away progressively smaller squares in the sequence { 0 ≤ k ≤ 5: 2^(2^k) } = { 2³², 2¹⁶, 2⁸, 2⁴, 2², 2¹ } = { 4294967296, 65536, 256, 16, 4, 2, 1 } and sums the exponents k of the subtracted values.

int uint64_log2(uint64_t n)
{
  #define S(k) if (n >= (UINT64_C(1) << k)) { i += k; n >>= k; }

  int i = -(n == 0); S(32); S(16); S(8); S(4); S(2); S(1); return i;

  #undef S
}

Note that this returns –1 if given the invalid input of 0 (which is what the initial -(n == 0) is checking for). If you never expect to invoke it with n == 0, you could substitute int i = 0; for the initializer and add assert(n != 0); at entry to the function.

Base-10 Integer Logarithm

Base-10 integer logarithms can be calculated using similarly — with the largest square to test being 10¹⁶ because log₁₀2⁶⁴ ≅ 19.2659...

int uint64_log10(uint64_t n)
{
  #define S(k, m) if (n >= UINT64_C(m)) { i += k; n /= UINT64_C(m); }

  int i = -(n == 0);
  S(16,10000000000000000); S(8,100000000); S(4,10000); S(2,100); S(1,10);
  return i;

  #undef S
}

Note that a good compiler will optimize the integer division operations here into multiplication instructions, since the divisions are always by a constant. (This matters because integer division instructions are still very slow even on the fastest modern CPUs, as compared with multiplication instructions.)

Opprobrium answered 15/7, 2014 at 1:37 Comment(3)
Very pretty. With a decent compiler and the right instruction set, the conditional actions might all be implemented as predicated instructions, so there are no branch mispredicts; it is all pure computation in the registers at the (superscalar) rate the typical modern processor can achieve.Bacchius
@IraBaxter — Thanks... And surprisingly, in the log2 case, this method of comparing against a list of constants is about 60% faster (on my system) than shifting and checking for zero. (I suppose because of modern instruction pipeline caches.) That is, doing if (n >> k) {...} to shift and compare with zero is actually 60% slower than doing if (n >= (UINT64_C(1) << k)) {...} to compare against a 64-bit constant.Opprobrium
Fun Fact : std::ilogb(double) is faster in microbenchmark for the log2 case (GCC 8 / Debian / -O3 / 1000 random numbers 1000 repetitions). This function: 7.6ns, std::ilogb: 5ns, bitshift: 20ns, identity function: 0.7nsMeshed
E
14

This has been proposed in the comments above. Using gcc builtins:

static inline int log2i(int x) {
    assert(x > 0);

    return sizeof(int) * 8 - __builtin_clz(x) - 1;
}

static void test_log2i(void) {
    assert_se(log2i(1) == 0);
    assert_se(log2i(2) == 1);
    assert_se(log2i(3) == 1);
    assert_se(log2i(4) == 2);
    assert_se(log2i(32) == 5);
    assert_se(log2i(33) == 5);
    assert_se(log2i(63) == 5);
    assert_se(log2i(INT_MAX) == sizeof(int)*8-2);
}
Estes answered 15/3, 2014 at 1:26 Comment(2)
Can't find the docs for assert_se -- I assume it can just be assert.Mastin
Use unsigned x and this matches floor(log2(x)) for all 32-bit values (except zero). I ran an exhaustive test with gcc 4.8.2 on x86 with sizeof(int)==4.Mastin
O
7

If you're using C++11 you can make this a constexpr function:

constexpr std::uint32_t log2(std::uint32_t n) noexcept
{
    return (n > 1) ? 1 + log2(n >> 1) : 0;
}
Ohaus answered 2/3, 2016 at 21:23 Comment(0)
C
5

With GCC:

int log2(int x) {
    return sizeof(int)*8 - 1 - __builtin_clz(x);
}

assuming your x is > 0

Cannibalize answered 1/12, 2019 at 11:32 Comment(2)
__builtin_clz is not a standard function in C++.Andi
@Andi log2 is the type of function a low level library would implement, and low level libraries make use of compiler intrinsics all the time.Appointed
G
3

I've never had any problem with floating-point accuracy on the formula you're using (and a quick check of numbers from 1 to 231 - 1 found no errors), but if you're worried, you can use this function instead, which returns the same results and is about 66% faster in my tests:

int HighestBit(int i){
    if(i == 0)
        return -1;

    int bit = 31;
    if((i & 0xFFFFFF00) == 0){
        i <<= 24;
        bit = 7;
    }else if((i & 0xFFFF0000) == 0){
        i <<= 16;
        bit = 15;
    }else if((i & 0xFF000000) == 0){
        i <<= 8;
        bit = 23;
    }

    if((i & 0xF0000000) == 0){
        i <<= 4;
        bit -= 4;
    }

    while((i & 0x80000000) == 0){
        i <<= 1;
        bit--;
    }

    return bit; 
}
Gilbertogilbertson answered 15/6, 2009 at 6:0 Comment(2)
Indeed, the danger in using the log(number)/log(base) method isn't so much with a base of 2 as it is with other numbers. For example log(1000) / log(10) gives 2.9999999999999996 (the floor of which is 2 instead of 3) with IEEE double-precision semantics.Opprobrium
But also note that since IEEE double-precision values only have 53 bits of mantissa (52 plus an understood leading 1-bit), the log(number)/log(base) method falls apart completely for numbers above 2⁵³, which is a very large subset of the 64-bit integers. So while you're safe using log(number)/log(base) with 32-bit integers, you're asking for trouble with 64-bit integers.Opprobrium
N
3

There are similar answers above. This answer

  1. Works with 64 bit numbers
  2. Lets you choose the type of rounding and
  3. Includes test/sample code

Functions:

    static int floorLog2(int64_t x)
    { 
      assert(x > 0);
      return 63 - __builtin_clzl(x);
    }

    static int ceilLog2(int64_t x)
    {
      if (x == 1)
        // On my system __builtin_clzl(0) returns 63.  64 would make more sense   
        // and would be more consistent.  According to stackoverflow this result  
        // can get even stranger and you should just avoid __builtin_clzl(0).     
        return 0;
      else
        return floorLog2(x-1) + 1;
    }

Test Code:

for (int i = 1; i < 35; i++)
  std::cout<<"floorLog2("<<i<<") = "<<floorLog2(i)
           <<", ceilLog2("<<i<<") = "<<ceilLog2(i)<<std::endl;
Nasho answered 22/4, 2017 at 18:58 Comment(0)
R
3

Rewriting Todd Lehman's answer to be more generic:

#include <climits>

template<typename N>
constexpr N ilog2(N n) {
    N i = 0;
    for (N k = sizeof(N) * CHAR_BIT; 0 < (k /= 2);) {
        if (n >= static_cast<N>(1) << k) { i += k; n >>= k; }
    }
    return i;
}

Clang with -O3 unrolls the loop:

0000000100000f50    pushq   %rbp
0000000100000f51    movq    %rsp, %rbp
0000000100000f54    xorl    %eax, %eax
0000000100000f56    cmpl    $0xffff, %edi
0000000100000f5c    setg    %al
0000000100000f5f    shll    $0x4, %eax
0000000100000f62    movl    %eax, %ecx
0000000100000f64    sarl    %cl, %edi
0000000100000f66    xorl    %edx, %edx
0000000100000f68    cmpl    $0xff, %edi
0000000100000f6e    setg    %dl
0000000100000f71    leal    (,%rdx,8), %ecx
0000000100000f78    sarl    %cl, %edi
0000000100000f7a    leal    (%rax,%rdx,8), %eax
0000000100000f7d    xorl    %edx, %edx
0000000100000f7f    cmpl    $0xf, %edi
0000000100000f82    setg    %dl
0000000100000f85    leal    (,%rdx,4), %ecx
0000000100000f8c    sarl    %cl, %edi
0000000100000f8e    leal    (%rax,%rdx,4), %eax
0000000100000f91    xorl    %edx, %edx
0000000100000f93    cmpl    $0x3, %edi
0000000100000f96    setg    %dl
0000000100000f99    leal    (%rdx,%rdx), %ecx
0000000100000f9c    sarl    %cl, %edi
0000000100000f9e    leal    (%rax,%rdx,2), %ecx
0000000100000fa1    xorl    %eax, %eax
0000000100000fa3    cmpl    $0x1, %edi
0000000100000fa6    setg    %al
0000000100000fa9    orl %ecx, %eax
0000000100000fab    popq    %rbp

When n is constant, result is computed in compilation time.

Rendition answered 16/6, 2018 at 19:7 Comment(1)
Nice work! And cool to see the assembly output.Opprobrium
G
2
int targetIndex = floor(log(i + 0.5)/log(2.0));
Gum answered 15/6, 2009 at 5:51 Comment(1)
This is well-defined for the hardest case (2^N-1), up to at least N=32, but runs into problems around N=(52-log(52)) or so, when the double-precision result of log starts returning identical results for adjacent values.Pebbly
M
2

This isn't standard or necessarily portable, but it will in general work. I don't know how efficient it is.

Convert the integer index into a floating-point number of sufficient precision. The representation will be exact, assuming the precision is sufficient.

Look up the representation of IEEE floating-point numbers, extract the exponent, and make the necessary adjustment to find the base 2 log.

Microampere answered 15/6, 2009 at 14:6 Comment(1)
"Sufficient precision" here equals IEEE double-precision (64-bit a.k.a. double in C) for handling 32-bit integers and IEEE extended-double-precision (80-bit a.k.a. long double in C) for handling 64-bit integers.Opprobrium
M
2

This function determines how many bits are required to represent the numeric interval: [0..maxvalue].

unsigned binary_depth( unsigned maxvalue )
   {
   int depth=0;
   while ( maxvalue ) maxvalue>>=1, depth++;
   return depth;
   }

By subtracting 1 from the result, you get floor(log2(x)), which is an exact representation of log2(x) when x is a power of 2.

xyy-1
00-1
110
221
321
432
532
632
732
843

Mastin answered 20/2, 2013 at 21:26 Comment(1)
This can easily be generalized to support any 'radix' (numeric base) -- just use /=radix (divide by radix) in place of the >>=1.Mastin
A
0

How deep do you project your tree to be? You could set a range of say... +/- 0.00000001 to the number to force it to an integer value.

I'm actually not certain you'll hit a number like 1.99999999 because your log2 should not lose any accuracy when calculating 2^n values (Since floating point rounds to the nearest power of 2).

Abbatial answered 15/6, 2009 at 5:49 Comment(0)
A
0

This function I wrote here

// The 'i' is for int, there is a log2 for double in stdclib
inline unsigned int log2i( unsigned int x )
{
  unsigned int log2Val = 0 ;
  // Count push off bits to right until 0
  // 101 => 10 => 1 => 0
  // which means hibit was 3rd bit, its value is 2^3
  while( x>>=1 ) log2Val++;  // div by 2 until find log2.  log_2(63)=5.97, so
  // take that as 5, (this is a traditional integer function!)
  // eg x=63 (111111), log2Val=5 (last one isn't counted by the while loop)
  return log2Val ;
}
Abdication answered 14/2, 2013 at 17:43 Comment(0)
M
0

Given the way floating point numbers work (crudely, mantissa * 2^exponent), then any number up to 2^127 that is a power of 2 will be exactly represented without error.

This does give a trivial but rather hacky solution - interpret the bit pattern of the floating point number as an integer, and just look at the exponent. This is David Thornley's solution above.

float f = 1;
for (int i = 0; i < 128; i++)
{
    int x = (*(int*)(&f)>>23) - 127;
    int l = int(log(f) / log(2));

    printf("i = %d, log = %d, f = %f quick = %d\n",
        i, l, f, x);
    f *= 2;
}

It is not true that any integer can be represented as a float - only those with fewer bits than the mantissa can represent. In 32bit floats, that is 23 bits worth.

Mishear answered 11/6, 2019 at 9:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.