Fastest implementation of log2(int) and log2(float)

Asked 12/4, 2013 at 9:6 Answered 29/11, 2020 at 20:33

The question is

Are there any other (and/or faster) implementations of a basic 2log?

Applications

The log2(int) and log2(float) operations are very useful in a lot of different contexts. To name a few: compression algorithms, 3d engines and machine learning. In almost all of these contexts they are used in the low-level code that is called billions of times... Especially the log2(int) operation is very useful.

Because I find myself using log2 all the time, I don't want to give a specific application I'm working on here. What is the same is the fact that this is a real performance drainer (as shown by performance tests of various applications). For me it's key to get this as fast as possible.

The complete source code to test all implementations is added at the bottom, so you can see for yourself.

And of course... run your tests at least 3 times and make sure the counters are big enough to hit multiple seconds. Also I do the 'add' operation to ensure the whole loop isn't magically removed by the JIT'ter. So let's get started with the real work.

Trivial implementation

The trivial implementation of a 2log in C# is:

(int)(Math.Log(x) / Math.Log(2))

This implementation is trivial, but also very slow. It requires 2 Log operations, that are in itself quite slow already. Of course, we can optimize this by making 1.0/Math.Log(2) a constant.

Note that we need to modify this constant a bit to get the right results (as a result of floating point errors) or add a small number to get the correct results. I chose the latter, but it doesn't really matter - the end result is slow in all cases.

Table lookup

A faster solution for this is to use a lookup table. While you can use a lookup table of any power of 2, I usually use a table size of 256 or 64K entries.

First we create the lookup table:

lookup = new int[256];
for (int i = 1; i < 256; ++i)
{
    lookup[i] = (int)(Math.Log(i) / Math.Log(2));
}

Next, we implement the 2log as follows:

private static int LogLookup(int i)
{
    if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
    else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
    else if (i >= 0x100) { return lookup[i >> 8] + 8; }
    else { return lookup[i]; }
}

As you can see, table lookups are a much, much faster implementation - but as a con it cannot be used to calculate log2(float).

Branch removal

As we all know, processors aren't very good at branching, so I figured that table lookups can be improved by removing the branches. Instead of the bunches of if's I introduced a second table with the values and shift bits around to find the entry in the table:

nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };

private static int LogDoubleLookup(int i)
{
    int n = (i | (i >> 4));
    n = (n | (n >> 2));
    n = (n | (n >> 1));
    n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
    int br = nobranch[n];
    return lookup[i >> br] + br;
}

If you run this test, you will find that it is actually slower than the if-then-else solution.

And then there was the Intel 80386

Intel understood years ago that this is an important operation, so they implemented Bit-Scan-Forward (BSF) into their processors. Other processors have similar instructions. This is by far the fastest way to do a 2log that I know of - but unfortunately I know of now way to use these nice functions from C#... I don't like the idea of having an implementation that doesn't run anymore when a new tablet or phone hits the market - and I don't know of any cross-platform solution that enables me to use this function directly.

Other implementations

As l4V pointed out (thanks!) there are a couple of other implementations, specifically:

Trivial loop. I omitted this because it's trivial this isn't really fast. Implemented in TestTrivial.
64-bit IEEE / int union's that can be used. Implemented in TestFloat
DeBruijn lookup tables. Implemented in TestDeBruijn
Binary search. Implemented in TestBinary

Apart that I like the name, the DeBruijn lookup tables are just as fast as the normal lookup tables, making it one of the fastest algorithms here... all the other algorithms I've tried are much slower.

Complete test code

public class Log2Test
{
    public static void TestNaive()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += (int)(Math.Log(i) / Math.Log(2.0));
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - naive implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    public static int LogTrivialLoop(int v)
    {
        int r = 0;
        while ((v >>= 1) > 0) // unroll for more speed...
        {
            r++;
        }
        return r;
    }

    public static void TestTrivialLoop()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogTrivialLoop(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - loop implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    public static int LogFloat(int v)
    {
        Helper h = new Helper() { U1 = v, U2 = 0x43300000 };
        h.D -= 4503599627370496.0;
        return (h.U2 >> 20) - 0x3FF;
    }

    public static void TestFloat()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogFloat(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - IEEE float implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct Helper
    {
        [FieldOffset(0)]
        public int U1;
        [FieldOffset(4)]
        public int U2;
        [FieldOffset(0)]
        public double D;
    }

    public static void TestConstant()
    {
        double c = 1.0 / Math.Log(2.0);
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += (int)(0.00000000001 + Math.Log(i) * c);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - naive 2 implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogLookup(int i)
    {
        if (i >= 0x1000000) { return lookup[i >> 24] + 24; }
        else if (i >= 0x10000) { return lookup[i >> 16] + 16; }
        else if (i >= 0x100) { return lookup[i >> 8] + 8; }
        else { return lookup[i]; }
    }

    public static void TestLookup()
    {
        lookup = new int[256];
        for (int i = 1; i < 256; ++i)
        {
            lookup[i] = (int)(Math.Log(i) / Math.Log(2));
        }
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogLookup(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogDoubleLookup(int i)
    {
        int n = (i | (i >> 4));
        n = (n | (n >> 2));
        n = (n | (n >> 1));
        n = ((n & 0x1000000) >> 21) | ((n & 0x10000) >> 14) | ((n & 0x100) >> 7) | (n & 1);
        int br = nobranch[n];
        return lookup[i >> br] + br;
    }

    public static void TestDoubleLookup()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogDoubleLookup(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - double table lookup implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int LogBinary(int v)
    {
        /* This is the worst implementation ever... - apparently C# is a slow-branching language

        int[] b = { 0x2, 0xC, 0xF0, 0xFF00, 0x7FFF0000 };
        int[] S = { 1, 2, 4, 8, 16 };

        int r = 0; // result of log2(v) will go here
        for (int i = 4; i >= 0; i--) // unroll for speed...
        {
            if ((v & b[i]) != 0)
            {
                v >>= S[i];
                r |= S[i];
            }
        }
        return r;

         */

        int r = (((v > 0xFFFF)) ? 0x10 : 0); 
        v >>= r;
        int shift = ((v > 0xFF) ? 0x8 : 0); 
        v >>= shift; 
        r |= shift;
        shift = ((v > 0xF) ? 0x4 : 0); 
        v >>= shift;
        r |= shift;
        shift = ((v > 0x3) ? 0x2 : 0); 
        v >>= shift;
        r |= shift;
        r |= (v >> 1);
        return r;
    }

    public static void TestBinary()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogBinary(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - binary search implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static readonly int[] MultiplyDeBruijnBitPosition = new int[32]
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    private static int LogDeBruijn(int v)
    {
        v |= v >> 1; // first round down to one less than a power of 2 
        v |= v >> 2;
        v |= v >> 4;
        v |= v >> 8;
        v |= v >> 16;

        return MultiplyDeBruijnBitPosition[(uint)(v * 0x07C4ACDDU) >> 27];
    }

    public static void TestDeBruijn()
    {
        // Lookup table was already constructed earlier
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int n = 0;
        for (int i = 1; i < 100000000; ++i)
        {
            n += LogDeBruijn(i);
        }
        sw.Stop();
        Console.WriteLine("Result: {0} - de Bruijn implementation took {1:0.000}s", n, sw.Elapsed.TotalSeconds);
    }

    private static int[] lookup;
    private static readonly int[] nobranch = new int[16] { 0, 0, 8, 8, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24 };

    static void Main(string[] args)
    {
        TestConstant();
        TestNaive();
        TestDeBruijn();
        TestBinary();
        TestFloat();
        TestTrivialLoop();
        TestLookup();
        TestDoubleLookup();
        Console.ReadLine();
    }
}

Landry answered 12/4, 2013 at 9:6 Comment(8)

graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious – Tapetum 12/4, 2013 at 9:18

@I4V, that's a great answer. – Boodle 12/4, 2013 at 9:22

@l4V ah yes, these are the ones from the book Hackers Delight. I'll test the remaining ones and add them to the code. Let's see how fast they are in C#... I've been surprised more than one time... – Landry 12/4, 2013 at 9:25

Stefan de Bruijin using de Bruijin sequence, seems correct :) – Sibelius 12/4, 2013 at 10:14

@ydroneaud yeah it's fast, memory-efficient, looks cool and it has a nice ring to the name- definitely my personal favorite so far :-) Still, I feel this should be possible faster... – Landry 12/4, 2013 at 10:24

maybe you could write your own log algorithm Log(e,X) to get the fastet solution – Lithopone 12/4, 2013 at 15:3

I know this is a pretty old post, but I was wondering how one could use these with float input? I am trying to find the log2 of a number between 0-1; could I leverage these methods for a float? – Armenta 14/12, 2015 at 18:52

@user984444 In my experience it's hard to beat the standard (C++) implementation for that. Only thing that comes to mind that might work is to use a polynominal estimation algorithm and use AVX instructions to calculate the log for vectors at a time... and even for that I'm not sure. – Landry 15/12, 2015 at 7:54

Took the binary solution already mentioned and removed the branching. Did some testing and it turned out to be 1.3 times faster than DeBruijn.

public static int Log2(int v)
{
    int r = 0xFFFF - v >> 31 & 0x10;
    v >>= r;
    int shift = 0xFF - v >> 31 & 0x8;
    v >>= shift; 
    r |= shift;
    shift = 0xF - v >> 31 & 0x4;
    v >>= shift;
    r |= shift;
    shift = 0x3 - v >> 31 & 0x2;
    v >>= shift;
    r |= shift;
    r |= (v >> 1);
    return r;
}

Wright answered 4/6, 2015 at 12:31 Comment(3)

Very Interesting; I created the exact same thing in C++ some time ago (probably with AVX2 intrinsics), and it turned out to be slower. I'm going to look into this a bit more when I have some time to spare, thanks. – Landry 4/6, 2015 at 13:5

In my C# 4.0 benchmarks, this method performed 20% slower than the int LogDeBruijn(int v) in atlaste's answer (which I also found to be the best). – Subtlety 23/1, 2017 at 3:37

@SpecialSauce, did you place the algorithm inside the for-loop itself or a method that is called in the for-loop? I always put the algorithm I'm benchmarking inside the for-loop because method calls are slow in C# and I'm not testing the method call. Also did you make sure you tested it in Release mode? – Wright 13/3, 2017 at 18:57

There are some integer algorithms here.

In C#:

public static uint FloorLog2(uint x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    return (uint)(NumBitsSet(x) - 1);
}

public static uint CeilingLog2(uint x)
{
    int y = (int)(x & (x - 1));

    y |= -y;
    y >>= (WORDBITS - 1);
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);

    return (uint)(NumBitsSet(x) - 1 - y);
}

public static int NumBitsSet(uint x)
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);

    return (int)(x & 0x0000003f);
}

private const int WORDBITS = 32;

You should look at the original code on the site I linked for the context, particularly what happens with Log2(0).

Carbylamine answered 12/4, 2013 at 9:26 Comment(5)

Thanks. I've tried specifically the FloorLog2 because it resembles what I'm doing. However, I'm afraid I found the algorithm to be twice as slow as the DeBruijn and the single lookup solutions... – Landry 12/4, 2013 at 9:54

Interesting - well, it was worth a shot! :) – Carbylamine 12/4, 2013 at 10:20

Sure, thanks. I was also surprised to find that in C# table lookups are often faster than the alternatives... Still, I'm a bit sad that there is an instruction in almost all CPU's since 1985 that does all the work without calculating a thing... it just seems be just out of your reach the moment you go to a 'sophisticated language'... – Landry 12/4, 2013 at 10:30

@StefandeBruijn how about using P/Invoke? – Jacquelinjacqueline 29/9, 2013 at 10:2

@SargeBorsch in my tests that was slower than these alternatives. – Landry 1/10, 2013 at 18:34

clean reliable and fast!
(requires .net core 3 or later)

int val = BitOperations.Log2(x);

Amygdala answered 29/11, 2020 at 20:33 Comment(0)

For more algorithms look here http://www.asmcommunity.net/forums/topic/?id=15010

Also did some testing in C++ and my implementation of BSR is slower than lookup table

i am using BDS2006 there is probably slow down by state push/popping by asm directive
your lookup is fine but i am using 11 bits table instead of 8
it divides 32 bit into 3 branches instead of 4
and it is still small enough to handle without init function

code:

//---------------------------------------------------------------------------
DWORD log2_slow(const DWORD &x)
    {
    DWORD m,i;
    if (!x) return 0;
    if (x>=0x80000000) return 31;
    for (m=1,i=0;m<x;m<<=1,i++);
     if (m!=x) i--;
    return i;
    }
//---------------------------------------------------------------------------
DWORD log2_asm(const DWORD &x)
    {
    DWORD xx=x;
    asm {
        mov eax,xx
        bsr eax,eax;
        mov xx,eax;
        }
    return xx;
    }
//---------------------------------------------------------------------------
BYTE _log2[2048]=
    {
     0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
     7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
     9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,
    };
DWORD log2(const DWORD &x)
    {
         if (x>=0x00400000) return _log2[x>>22]+22;
    else if (x>=0x00000800) return _log2[x>>11]+11;
    else                    return _log2[x];
    }
//---------------------------------------------------------------------------

test code:

DWORD x,j,i,n=256;
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2     (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_asm (j<<i); tend(); mm_log->Lines->Add(tstr(1));
tbeg(); for (i=0;i<32;i++) for (j=0;j<n;j++) x=log2_slow(j<<i); tend(); mm_log->Lines->Add(tstr(1));

my results on AMD A8-5500 3.2 GHz:

[   0.040 ms] log2     (x) - 11bit lookup table
[   0.060 ms] log2_asm (x) - BSR
[   0.415 ms] log2_slow(x) - shift loop

Note:

log2(0) -> 0 because of use of DWORDS, in real it should be -inf
all other values are correct for all functions

Whitlow answered 29/9, 2013 at 9:54 Comment(4)

BSR is notoriously slow on AMD (fast on Intel) – Palmar 29/9, 2013 at 10:2

may be but that asm directive slow down is there ... have stepped on it many times (asm implementation is slower than C++ unless the asm speed gain overcomes that state push/popping thing) for now the only thing i have speeded up by asm is mul and div/mod from 32bit uint operations – Whitlow 29/9, 2013 at 10:11

That too, could you try it with declspec(naked) or with plain asm that you link in? – Palmar 29/9, 2013 at 10:40

I'd also make sure to test the DeBruijn method; it needs so few data it probably fits in a cache line -- that might give you better results. – Landry 1/10, 2013 at 18:36

inline int fast_log2(register double x)
{ 
    return (reinterpret_cast<uint64_t&>(x) >> 52) - 1023;
};

Keratoid answered 9/2, 2018 at 7:27 Comment(0)

Another log2(int) function: (no longer the fastest)

    [StructLayout(LayoutKind.Explicit)]
    private struct ConverterStruct
    {
        [FieldOffset(0)] public int asInt;
        [FieldOffset(0)] public float asFloat;
    }

    public static int Log2(uint val)
    {
        ConverterStruct a;  a.asInt = 0; a.asFloat = val;
        return ((a.asInt >> 23 )+ 1) & 0x1F;
    }

Notes: The inspiration for using the exponent in a float is from SPWorley 3/22/2009. Use with caution on production code since this would fail on architectures that are not little-endianness.

If you want something "endianness" safe then check out spender 5/3/2012. It also has Zero support.

The fastest is the new built-in BitOperations.Log2(x)

Here are some benchmarks: (code here: https://github.com/SunsetQuest/Fast-Integer-Log2)

Function               Time1  Full-32-Bit  Zero?   FUNCTION                  
BitOperationsLog2        2        Yes      Yes     BitOperations.Log2(x);
LeadingZeroCount         2        Yes      Yes     31 - BitOperations.LeadingZeroCount(x);
Log2_SunsetQuest5        16       Yes      No      ((BitConverter.DoubleToInt64Bits(val)>>52)+1) & 0xFF;
Log2_WiegleyJ            17       Yes      Yes     ...
MostSigBit_spender       17       Yes      Yes     ...
Log2_SPWorley            17       Yes      Yes     ...
Log2_SunsetQuest4        18       Yes      No      ...
FloorLg2_Matthew_Watson  18       Yes      Yes     ...
Log2_SunsetQuest3        19       Yes      No      ...
Log2_SunsetQuest1        20       Yes      Yes     ...
Log2_HarrySvensson       20       Yes      Yes     ...
Log2_DanielSig           21       No       Yes     ...
HighestBitUnrolled_Kaz   25       Yes      Yes     ...
FloorLog2_SN17           36       Yes      Yes     ...
Log2_Papayaved           44       Yes      Yes     ...
GetMsb_user3177100       45       Yes      Yes     ...
Log2_Flynn1179           57       Yes      Yes     ...
Msb_Protagonist          63       Yes      Yes     ...
SomeOtherMethod          76       Yes      Yes     ...
Log2_SunsetQuest0        98       Yes      Yes     ...
Log2_SunsetQuest2        131      Yes      Yes     ...
SomeOtherMethod          202      Yes      Yes     ...
SomeOtherMethod          545      Yes      Yes     ...

Zero_Support    = Supports Neg Return on Zero
Full-32-Bit     = Supports full 32-bit (some just support 31 bits)
SomeOtherMethod = name of function/person left out on purpose
Benchmark notes: AMD Ryzen, Release, no-debugger attached, .net 6.0

Amygdala answered 22/10, 2019 at 5:8 Comment(0)

There have been quite a few answers providing fast approximate approaches to log2(int) but few for log2(float), so here's two (Java implementation given) that use both a lookup table and mantissa/bit hacking:

Fast accurate log2(float):

/**
 * Calculate the logarithm to base 2, handling special cases.
 */
public static float log2(float x) {

    final int bits = Float.floatToRawIntBits(x);
    final int e = (bits >> 23) & 0xff;
    final int m = (bits & 0x7fffff);

    if (e == 255) {
        if (m != 0) {
            return Float.NaN;
        }
        return ((bits >> 31) != 0) ? Float.NaN : Float.POSITIVE_INFINITY;
    }

    if ((bits >> 31) != 0) {
        return (e == 0 && m == 0) ? Float.NEGATIVE_INFINITY : Float.NaN;
    }

    return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
}

Note:

If the argument is NaN or less than zero, then the result is NaN.
If the argument is positive infinity, then the result is positive infinity.
If the argument is positive zero or negative zero, then the result is negative infinity.

Fast accurate log2(float) (slightly faster, no checking):

/**
 * Calculate the logarithm using base 2. Requires the argument be finite and
 * positive.
 */
public static float fastLog2(float x) {
    final int bits = Float.floatToRawIntBits(x);
    final int e = (bits >> 23) & 0xff;
    final int m = (bits & 0x7fffff);
    return (e == 0 ? data[m >>> qm1] : e + data[((m | 0x00800000) >>> q)]);
}

This second method forgoes the checking present in the other method and therefore has the following special cases:

If the argument is NaN, then the result is incorrect.
If the argument is negative, then the result is incorrect.
If the argument is positive infinity, then the result is incorrect.
If the argument is positive zero or negative zero, then the result is negative infinity.

Both methods upon rely on a lookup table data (and variables q and qm1). These are populated with the following method. n defines the accuracy-space tradeoff.

static int q, qm1;
static float[] data;

/**
 * Compute lookup table for a given base table size.
 * 
 * @param n The number of bits to keep from the mantissa. Table storage =
 *          2^(n+1) * 4 bytes, e.g. 64Kb for n=13. Must be in the range
 *          0<=n<=23
 */
public static void populateLUT(int n) {

    final int size = 1 << (n + 1);

    q = 23 - n;
    qm1 = q - 1;
    data = new float[size];

    for (int i = 0; i < size; i++) {
        data[i] = (float) (Math.log(i << q) / Math.log(2)) - 150;
    }
}

populateLUT(12);
log2(6666); // = 12.702606

Consensual answered 12/7, 2020 at 20:12 Comment(0)

    static byte FloorLog2(UInt16 value)
    {
        for (byte i = 0; i < 15; ++i)
        {
            if ((value >>= 1) < 1)
            {
                return i;
            }
        }
        return 15;
    }

Bertero answered 12/6, 2019 at 7:13 Comment(1)

It is good practice to add at least a small comment about how code samples address the issue in the original question. – Jovial 12/6, 2019 at 7:46

(I haven't done any measurements so this may not match up, but I thought user user9337139's idea was neat and wanted to try the same in C# - his is C++).

Here's a C# int Magnitude(byte) function based on converting the byte value to float and extracting the exponent from the IEEE float representation.

    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Explicit)]
    struct UnionWorker
    {
        [FieldOffset(0)]
        public int i;
        [FieldOffset(0)]
        public float f;
    }

    static int Magnitude(byte b)
    {
        UnionWorker u;
        u.i = 0; // just to please the compiler
        u.f = b;
        return Math.Max((u.i >> 23) & 0xFF, 126) - 126;
    }

Returns zero for zero, 8 for 0xFF, other values as you would expect.

Zero is a special case, so I needed the Math.Max clamp for that. I suspect user9337139's solution might have a similar problem.

Note, this has not been tested for endianness issues - caveat emptor.

Solipsism answered 16/10, 2019 at 12:13 Comment(1)

I just noticed your answer and thought it was a copy of mine but then I looked at the date. I posted the same answer just a few days after you (see GitHub . What are the odds of such a close answer and on the same date? Anyway, just thought I would share this worldly alignment. – Amygdala 28/12, 2021 at 0:27

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Fast accurate log2(float):

Fast accurate log2(float) (slightly faster, no checking):

Recommended topics

Hot tags