C++ log2 implementation for double using bit-hacking?

I am aware I can use log2 from cmath in C++, but I wanted to implement an efficient way for learning purposes.

I have seen a bunch of questions that are relevant, but none of them gives a fast implementation that takes a double precision floating point as input and outputs the approximation of log2 for the input as a double.

I want to do this using bit operations, and I ported code from this answer:

#include <cmath>
#include <cstdint>
#include <vector>
using std::vector;

const double ln2 = std::log(2.0);

vector<float> log2_float() {
    int lim = 1 << 24;    // one entry per 24-bit mantissa (implicit bit included)
    vector<float> table(lim);
    for (int i = 0; i < lim; i++) {
        // 150 = float exponent bias (127) + mantissa width (23)
        table[i] = float(std::log(i) / ln2) - 150;
    }
    return table;
}
const vector<float> LOG2_FLOAT = log2_float();

float fast_log2(float f) {
    uint32_t bits = *reinterpret_cast<uint32_t*>(&f);  // type punning; see comments below
    int e = (bits >> 23) & 0xff;   // 8-bit biased exponent
    int m = bits & 0x7fffff;       // 23-bit stored mantissa
    // e == 0: subnormal, no implicit leading bit; otherwise OR the implicit bit in
    return (e == 0 ? LOG2_FLOAT[m << 1] : e + LOG2_FLOAT[m | 0x00800000]);
}

I won't show the full code; the rest is irrelevant. I benchmarked my implementation against log2 and found it an order of magnitude faster, but my code only works for floats:

fast_log2(3.1415927f) = 1.651497 : 2.222061 nanoseconds
log2(3.1415927f) = 1.651496 : 33.1502 nanoseconds

I didn't actually benchmark using pi; I used random values stored in a vector, but the number of iterations is the same and both versions use values from the same vector.

I want to know: how can I do this for double and get 10 correct decimal digits? I know I can't use a lookup table; that would require much more memory than I have. And ditching the lookup table while keeping the same method would require a call to log, which would again slow the code down.

I calculated that float has 24 fractional bits, so the maximum correct number of decimal digits is 7.224719895935548. For each correct decimal digit, 1 / log10(2) = 3.321928094887362 bits are required, so to get 8 correct decimal digits, 26.575424759098897 bits are required. I would therefore need a table of 2^27 values, which would take 2^30 bytes, or 1 GiB, and I only have 16 GiB of RAM...

What algorithm can I use to achieve this?


Turns out my original calculation was wrong; I only needed 1 GiB of RAM. Of course I didn't actually create that big a table; I just recalculated it. I am not a mathematician, and I didn't spend much effort on the calculation.

Anyway, I was doing other things, and a few minutes ago I gave it a try and created a lookup table for doubles. To get 7 correct decimal places I used the same number of elements (1 << 24): the float vector takes 64 MiB and the double vector takes 128 MiB.

I just changed the numbers, and for whatever reason it compiled successfully but I failed to get a result; the program crashed when running the function:

vector<double> log2_double() {
    int lim = 1 << 24;
    vector<double> table(lim);
    for (uint64_t i = 0; i < lim; i++) {
        table[i] = log(i << 29) / ln2 - 1075;
    }
    return table;
}
const vector<double> LOG2_DOUBLE = log2_double();

double fast_log2_double(double d) {
    uint64_t bits = *reinterpret_cast<uint64_t*>(&d);
    uint64_t e = (bits >> 52) & 0xfff;
    uint64_t m = bits & 0xfffffffffffff;
    return (e == 0 ? LOG2_DOUBLE[m >> 28] : e + LOG2_DOUBLE[m | 0x20000000000000]);
}

What is wrong with the above code?

And my goal is to get 10 correct decimal digits; using the same method I would need a lookup table of 2^34 doubles, which would require 128 GiB of RAM, which is outrageous.

I want a fast approximation method which guarantees at least 9 accurate decimal digits, sometimes 10 or more.


I compiled the program in debug mode and got something like an index-out-of-range error when the program ran, which surprised me, since I had read that C++ is more likely to silently give some random number.

So I looked at the linked answer again and at the Wikipedia article on double-precision floating point, recalculated the values, and fixed the code:

vector<double> log2_double() {
    int lim = 1 << 24;
    vector<double> table(lim);
    for (uint64_t i = 0; i < lim; i++) {
        // 1075 = double exponent bias (1023) + mantissa width (52)
        table[i] = std::log(i << 29) / ln2 - 1075;
    }
    return table;
}
const vector<double> LOG2_DOUBLE = log2_double();

double fast_log2_double(double d) {
    uint64_t bits = *reinterpret_cast<uint64_t*>(&d);
    uint64_t e = (bits >> 52) & 0x7ff;    // 11-bit biased exponent (0x7ff, not 0xfff)
    uint64_t m = bits & 0xfffffffffffff;  // 52-bit mantissa field
    // OR in the implicit bit, then keep the top 24 mantissa bits as the table index
    return (e == 0 ? LOG2_DOUBLE[m >> 28] : e + LOG2_DOUBLE[(m | 0x10000000000000) >> 29]);
}

This time it works, but I got only 6 correct decimal places, as expected. I want at least 9 correct decimal places, and ideally 10. I actually want this to do fractional exponentiation, using a^x = e^(x * ln a). Because of the way floats are encoded, I want to extract the log2(a) part directly from the binary representation, and use ln a = log2(a) * ln 2.

Now I am thinking about using a Remez approximation of e^x over the interval [1, 2]: since the mantissa is 1.something in the binary-encoded value, I can use it in the Remez approximation, bit-shift and do some basic math on the exponent, and then just join the two bit strings and cast to float/double.

I am working on it.

Calzada answered 19/9, 2023 at 9:15 Comment(7)
Be careful: AFAIK there is no standard memory layout for floating-point values. They follow the arithmetic rules (sign, mantissa, exponent) described in en.wikipedia.org/wiki/IEEE_754, but even though every compiler I tested used the memory representation found on Wikipedia, it's not mandatory. The safest way to manipulate the different floating-point parts is to retrieve them as integers through standard functions, but this will probably kill your optimisation goal. – Indian
Note that with a very large table, you might impair an operation that performs many calls to log2, as you can easily push important stuff out of the L1 and L2 caches. – Aldarcie
Too many magic numbers. Given 24 as the fixed-point indicator, a better approach is to define the other constants initialized from it. You can mark them constexpr to get a macro-free, readable outcome. As of C++20, we prefer std::bit_cast over reinterpret_cast and other tricks. Your LUT can be a constexpr instance of std::array, making all sorts of optimizations possible. Finally, <cmath> has a rich set of functions to extract the mantissa and exponent and to build a number out of m+e. – Succentor
This could help: math.stackexchange.com/questions/1382070/… Basically, if you can calculate any logarithm, then log2 is a multiplication away (but you can probably change the formula to calculate log2 directly). This link gives hints to make an approximate solution for log very accurate by using only basic functions. An approximate solution can be found with small look-up tables. – Eichelberger
While I realise you're going for speed, you might want to specify how much you're willing to sacrifice to get that speed (e.g. accuracy/precision), since there is virtually always some form of trade-off. Also bear in mind that the memory layout (which bits mean what) in floating point can, and more importantly does, vary between implementations (not to mention that not all implementations use IEEE). – Hards
If you know the value for x=3, log2(3)=1.58496250072 (look-up table) and 2/ln(2)=2.88539008178 (constant), you can calculate log2(pi)=(pi-3)/(pi+3)*2.88539008178+1.58496250072=1.65148433982, which increases the number of correct digits by 3-4, with the division being the most expensive step. – Eichelberger
Side note: you can use the ln2 constant in std::numbers. – Vomiturition

You can apply a method similar to the one you showed for float to construct an efficient log2 approximation for double-precision floating-point numbers. Obtaining 10 accurate decimal digits without a lookup table, however, is harder and may require a more elaborate approximation.

A common strategy is polynomial approximation: over a restricted range, the log2 function can be approximated by a polynomial. For example, you can start from the Taylor series for the natural logarithm:

ln(x) = (x - 1) - (1/2)(x - 1)^2 + (1/3)(x - 1)^3 - (1/4)(x - 1)^4 + ...

and use log2(x) = ln(x) / ln(2). Truncating the series after a given number of terms yields a polynomial approximation; the more terms you keep, the better it is. Here's an example in C++ for double precision (it uses the equivalent but faster-converging artanh form, ln(x) = 2 * artanh((x - 1)/(x + 1))):

#include <iostream>
#include <cmath>
#include <limits>

double fast_log2(double x) {
    if (x < 0.0) {
        return std::numeric_limits<double>::quiet_NaN();   // log2 undefined for negatives
    }
    if (x == 0.0) {
        return -std::numeric_limits<double>::infinity();
    }

    // Range reduction: x = m * 2^e with m in [0.5, 1), so log2(x) = e + log2(m)
    int e;
    double m = std::frexp(x, &e);

    // artanh series: ln(m) = 2 * (y + y^3/3 + y^5/5 + ...), y = (m - 1)/(m + 1)
    double y = (m - 1.0) / (m + 1.0);
    double result = 0.0;
    double term = y;
    for (int i = 1; i <= 99; i += 2) {
        result += term / i;
        term *= y * y;
    }

    // Convert ln(m) to log2(m) and add the exponent back
    return e + 2.0 * result / std::log(2.0);
}

int main() {
    double x = 3.1415927;
    std::cout << "log2(" << x << ") = " << fast_log2(x) << std::endl;
    return 0;
}

For double-precision inputs, this method offers an accurate log2 approximation without requiring a lookup table. The precision is determined by how many series terms you keep (in this example, odd powers of y up to y^99).

Depending on your needs, you can vary the number of terms to strike a compromise between accuracy and performance: more terms increase precision but slow the function down.

Daye answered 19/9, 2023 at 9:27 Comment(1)
Series like this are notoriously unstable (they only work well for a small range of input values) or converge too slowly to be really fast (or accurate enough). – Cherry

© 2022 - 2024 — McMap. All rights reserved.