C++ log2 implementation for double using bit-hacking?

I am aware I can use log2 from cmath in C++, but I wanted to implement an efficient way for learning purposes.

I have seen a bunch of questions that are relevant, but none of them gives a fast implementation that takes a double precision floating point as input and outputs the approximation of log2 for the input as a double.

I want to do this using bit operations, and I ported code from this answer:

#include <cmath>
#include <cstdint>
#include <vector>
using std::vector;

const double ln2 = std::log(2.0);

vector<float> log2_float() {
    int lim = 1 << 24;    // one entry per 24-bit mantissa (implicit bit included)
    vector<float> table(lim);
    for (int i = 0; i < lim; i++) {
        // 150 = float exponent bias (127) + mantissa width (23)
        table[i] = float(std::log(i) / ln2) - 150;
    }
    return table;
}
const vector<float> LOG2_FLOAT = log2_float();

float fast_log2(float f) {
    uint32_t bits = *reinterpret_cast<uint32_t*>(&f);  // type punning; see comments below
    int e = (bits >> 23) & 0xff;   // 8-bit biased exponent
    int m = bits & 0x7fffff;       // 23-bit stored mantissa
    // e == 0: subnormal, no implicit leading bit; otherwise OR the implicit bit in
    return (e == 0 ? LOG2_FLOAT[m << 1] : e + LOG2_FLOAT[m | 0x00800000]);
}

I won't show the full code; the rest is irrelevant. I benchmarked my implementation against log2 and found it an order of magnitude faster, but my code only works for floats:

fast_log2(3.1415927f) = 1.651497 : 2.222061 nanoseconds
log2(3.1415927f) = 1.651496 : 33.1502 nanoseconds

I didn't actually benchmark using pi; I used random values stored in a vector, but the number of iterations is the same and both versions use values from the same vector.

I want to know: how can I do this for double and get 10 correct decimal digits? I know I can't use a lookup table; that would require much more memory than I have. And ditching the lookup table while keeping the same method would require a call to log, which would again slow the code down.

I calculated that float has 24 fractional bits, so the maximum correct number of decimal digits is 7.224719895935548. For each correct decimal digit, 1 / log10(2) = 3.321928094887362 bits are required, so to get 8 correct decimal digits, 26.575424759098897 bits are required. I would therefore need a table of 2^27 values, which would take 2^30 bytes, or 1 GiB, and I only have 16 GiB of RAM...

What algorithm can I use to achieve this?


Turns out my original calculation was wrong; I only needed 1 GiB of RAM. Of course I didn't actually create that big a table; I just recalculated it. I am not a mathematician, and I didn't spend much effort on the calculation.

Anyway, I was doing other things, and a few minutes ago I gave it a try and created a lookup table for doubles. To get 7 correct decimal places I used the same number of elements (1 << 24): the float vector takes 64 MiB and the double vector takes 128 MiB.

I just changed the numbers, and for whatever reason it compiled successfully but I failed to get a result; the program crashed when running the function:

vector<double> log2_double() {
    int lim = 1 << 24;
    vector<double> table(lim);
    for (uint64_t i = 0; i < lim; i++) {
        table[i] = log(i << 29) / ln2 - 1075;
    }
    return table;
}
const vector<double> LOG2_DOUBLE = log2_double();

double fast_log2_double(double d) {
    uint64_t bits = *reinterpret_cast<uint64_t*>(&d);
    uint64_t e = (bits >> 52) & 0xfff;
    uint64_t m = bits & 0xfffffffffffff;
    return (e == 0 ? LOG2_DOUBLE[m >> 28] : e + LOG2_DOUBLE[m | 0x20000000000000]);
}

What is wrong with the above code?

And my goal is to get 10 correct decimal digits; using the same method I would need a lookup table of 2^34 doubles, which would require 128 GiB of RAM, which is outrageous.

I want a fast approximation method which guarantees at least 9 accurate decimal digits, sometimes 10 or more.


I compiled the program in debug mode and got something like an index-out-of-range error when the program ran, which surprised me, since I had read that C++ is more likely to silently give some random number.

So I looked at the linked answer again and at the Wikipedia article on double-precision floating point, recalculated the values, and fixed the code:

vector<double> log2_double() {
    int lim = 1 << 24;
    vector<double> table(lim);
    for (uint64_t i = 0; i < lim; i++) {
        // 1075 = double exponent bias (1023) + mantissa width (52)
        table[i] = std::log(i << 29) / ln2 - 1075;
    }
    return table;
}
const vector<double> LOG2_DOUBLE = log2_double();

double fast_log2_double(double d) {
    uint64_t bits = *reinterpret_cast<uint64_t*>(&d);
    uint64_t e = (bits >> 52) & 0x7ff;    // 11-bit biased exponent (0x7ff, not 0xfff)
    uint64_t m = bits & 0xfffffffffffff;  // 52-bit mantissa field
    // OR in the implicit bit, then keep the top 24 mantissa bits as the table index
    return (e == 0 ? LOG2_DOUBLE[m >> 28] : e + LOG2_DOUBLE[(m | 0x10000000000000) >> 29]);
}

This time it works, but I got only 6 correct decimal places, as expected. I want at least 9 correct decimal places, and ideally 10. I actually want this to do fractional exponentiation, using a^x = e^(x * ln a). Because of the way floats are encoded, I want to extract the log2(a) part directly from the binary representation, and use ln a = log2(a) * ln 2.

Now I am thinking about using a Remez approximation of e^x over the interval [1, 2]: since the mantissa is 1.something in the binary-encoded value, I can use it in the Remez approximation, bit-shift and do some basic math on the exponent, and then just join the two bit strings and cast to float/double.

I am working on it.

Calzada answered 19/9, 2023 at 9:15 Comment(7)
Be careful: AFAIK there is no standard memory layout for floating-point values. They follow the arithmetic rules (sign, mantissa, exponent) described in en.wikipedia.org/wiki/IEEE_754, but even though every compiler I tested used the memory representation found on Wikipedia, it's not mandatory. The safest way to manipulate the different floating-point parts is to retrieve them as integers through standard functions, but this will probably kill your optimisation goal. – Indian
Note that with a very large table, you might impair an operation that performs many calls to log2, as you can easily push important stuff out of the L1 and L2 caches. – Aldarcie
Too many magic numbers. Given 24 as the fixed-point indicator, a better approach is to define the other constants initialized from it. You can mark them constexpr to get a macro-free, readable outcome. As of C++20, we prefer std::bit_cast over reinterpret_cast and other tricks. Your LUT can be a constexpr instance of std::array, making all sorts of optimizations possible. Finally, <cmath> has a rich set of functions to extract the mantissa and exponent and to build a number out of m+e. – Succentor
This could help: math.stackexchange.com/questions/1382070/… Basically, if you can calculate any logarithm, then log2 is a multiplication away (but you can probably change the formula to calculate log2 directly). This link gives hints to make an approximate solution for log very accurate by using only basic functions. An approximate solution can be found with small look-up tables. – Eichelberger
While I realise you're going for speed, you might want to specify how much you're willing to sacrifice to get that speed (e.g. accuracy/precision), since there is virtually always some form of trade-off. Also bear in mind that the memory layout (which bits mean what) in floating point can, and more importantly does, vary between implementations (not to mention that not all implementations use IEEE). – Hards
If you know the value for x=3, log2(3)=1.58496250072 (look-up table) and 2/ln(2)=2.88539008178 (constant), you can calculate log2(pi)=(pi-3)/(pi+3)*2.88539008178+1.58496250072=1.65148433982, which increases the number of correct digits by 3-4, with the division being the most expensive step. – Eichelberger
Side note: you can use the ln2 constant in std::numbers. – Vomiturition

You can apply a method similar to the one you showed for float to construct an efficient log2 approximation for double-precision floating-point numbers. Obtaining 10 accurate decimal digits without a lookup table, however, is harder and may require a more elaborate approximation.

A common strategy is polynomial approximation: over a restricted range, the log2 function can be approximated by a polynomial. For example, you can start from the Taylor series for the natural logarithm:

ln(x) = (x - 1) - (1/2)(x - 1)^2 + (1/3)(x - 1)^3 - (1/4)(x - 1)^4 + ...

and use log2(x) = ln(x) / ln(2). Truncating the series after a given number of terms yields a polynomial approximation; the more terms you keep, the better it is. Here's an example in C++ for double precision (it uses the equivalent but faster-converging artanh form, ln(x) = 2 * artanh((x - 1)/(x + 1))):

#include <iostream>
#include <cmath>
#include <limits>

double fast_log2(double x) {
    if (x < 0.0) {
        return std::numeric_limits<double>::quiet_NaN();   // log2 undefined for negatives
    }
    if (x == 0.0) {
        return -std::numeric_limits<double>::infinity();
    }

    // Range reduction: x = m * 2^e with m in [0.5, 1), so log2(x) = e + log2(m)
    int e;
    double m = std::frexp(x, &e);

    // artanh series: ln(m) = 2 * (y + y^3/3 + y^5/5 + ...), y = (m - 1)/(m + 1)
    double y = (m - 1.0) / (m + 1.0);
    double result = 0.0;
    double term = y;
    for (int i = 1; i <= 99; i += 2) {
        result += term / i;
        term *= y * y;
    }

    // Convert ln(m) to log2(m) and add the exponent back
    return e + 2.0 * result / std::log(2.0);
}

int main() {
    double x = 3.1415927;
    std::cout << "log2(" << x << ") = " << fast_log2(x) << std::endl;
    return 0;
}

For double-precision inputs, this method offers an accurate log2 approximation without requiring a lookup table. The precision is determined by how many series terms you keep (in this example, odd powers of y up to y^99).

Depending on your needs, you can vary the number of terms to strike a compromise between accuracy and performance: more terms increase precision but slow the function down.

Daye answered 19/9, 2023 at 9:27 Comment(1)
Series like this are notoriously unstable (they only work well for a small range of input values) or converge too slowly to be really fast (or accurate enough). – Cherry

© 2022 - 2024 — McMap. All rights reserved.