I am aware I can use log2 from <cmath> in C++, but I wanted to implement an efficient version myself for learning purposes.
I have seen a bunch of relevant questions, but none of them gives a fast implementation that takes a double-precision floating-point number as input and outputs an approximation of its log2 as a double.
I want to do this using bit operations, and I ported code from this answer:
#include <cmath>
#include <cstdint>
#include <vector>

using std::log;
using std::vector;

const double ln2 = log(2);

// Entry i holds log2(i) - 150; the 150 folds in the float exponent
// bias (127) plus the 23-bit mantissa shift.
vector<float> log2_float() {
    int lim = 1 << 24;
    vector<float> table(lim);
    for (int i = 0; i < lim; i++) {
        table[i] = float(log(i) / ln2) - 150;
    }
    return table;
}

const vector<float> LOG2_FLOAT = log2_float();

float fast_log2(float f) {
    uint32_t bits = *reinterpret_cast<uint32_t*>(&f);
    int e = (bits >> 23) & 0xff;
    int m = bits & 0x7fffff;
    // e == 0 is a denormal with no implicit leading 1; otherwise restore it.
    return (e == 0 ? LOG2_FLOAT[m << 1] : e + LOG2_FLOAT[m | 0x00800000]);
}
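As an aside, dereferencing the reinterpret_cast pointer formally violates strict aliasing; if C++20 is available, std::bit_cast expresses the same reinterpretation without undefined behaviour. A sketch of the same function using it, assuming the LOG2_FLOAT table above (fast_log2_bitcast is just a name for this variant):

#include <bit>
#include <cstdint>

float fast_log2_bitcast(float f) {
    // Same logic as fast_log2, but std::bit_cast avoids the
    // strict-aliasing UB of the pointer cast.
    uint32_t bits = std::bit_cast<uint32_t>(f);
    int e = (bits >> 23) & 0xff;
    int m = bits & 0x7fffff;
    return (e == 0 ? LOG2_FLOAT[m << 1] : e + LOG2_FLOAT[m | 0x00800000]);
}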
I won't show the full code; the rest is irrelevant. I benchmarked my implementation against log2 and found mine an order of magnitude faster, but my code only works for floats:
fast_log2(3.1415927f) = 1.651497 : 2.222061 nanoseconds
log2(3.1415927f) = 1.651496 : 33.1502 nanoseconds
I didn't actually benchmark using pi; I used random values stored in a vector, and both implementations run the same number of iterations over the same vector.
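The harness looked roughly like this (a sketch from memory; bench is a made-up name):

#include <chrono>
#include <vector>

// Average nanoseconds per call over a vector of pre-generated inputs.
double bench(const std::vector<float>& xs) {
    float sink = 0.0f;   // keep the results live so the loop isn't optimized out
    auto t0 = std::chrono::steady_clock::now();
    for (float x : xs) sink += fast_log2(x);
    auto t1 = std::chrono::steady_clock::now();
    volatile float keep = sink;
    (void)keep;
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / xs.size();
}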
I want to know: how can I do this for double and get 10 correct decimal digits? I know I can't use a lookup table of the same kind, since that would require much more memory than I have, and ditching the lookup table while keeping the same method would require a call to log, which would slow the code down again.
I calculated that a float has 24 significand bits, so the maximum number of correct decimal digits is 24 * log10(2) = 7.224719895935548. Each correct decimal digit requires 1 / log10(2) = 3.321928094887362 bits, so 8 correct decimal digits require 26.575424759098897 bits; I would therefore need a table of 2^27 doubles, which would take 2^30 bytes, or 1 GiB, and I only have 16 GiB of RAM...
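The same arithmetic, spelled out as a quick sanity check (nothing here beyond <cmath>):

#include <cmath>
#include <cstdio>

int main() {
    double bits_per_digit = 1.0 / std::log10(2.0);  // 3.321928... bits per decimal digit
    double float_digits   = 24 * std::log10(2.0);   // 7.224719... digits in a 24-bit significand
    double bits_for_8     = 8 * bits_per_digit;     // 26.575424... -> 27 index bits needed
    std::printf("%.15f %.15f %.15f\n", bits_per_digit, float_digits, bits_for_8);
}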
What algorithm can I use to achieve this?
Turns out my original calculation was wrong, and I only needed 1 GiB of RAM. Of course I didn't actually create that big a table; I just redid the calculation. I am not a mathematician, and I didn't spend much effort on the original calculation.
Anyway, I was doing other things, and a few minutes ago I gave it a try and created a lookup table for doubles. To get 7 correct decimal places I used the same number of elements; the float vector takes 64 MiB and the double vector takes 128 MiB.
I just changed the numbers, and for whatever reason it compiled successfully, but I failed to get a result: the program crashed when running the function:
vector<double> log2_double() {
    int lim = 1 << 24;
    vector<double> table(lim);
    for (uint64_t i = 0; i < lim; i++) {
        table[i] = log(i << 29) / ln2 - 1075;
    }
    return table;
}

const vector<double> LOG2_DOUBLE = log2_double();

double fast_log2_double(double d) {
    uint64_t bits = *reinterpret_cast<uint64_t*>(&d);
    uint64_t e = (bits >> 52) & 0xfff;
    uint64_t m = bits & 0xfffffffffffff;
    return (e == 0 ? LOG2_DOUBLE[m >> 28] : e + LOG2_DOUBLE[m | 0x20000000000000]);
}
What is wrong with the above code?
And my goal is to get 10 correct decimal digits; using the same method I would need a lookup table of 2^34 doubles, which would require 128 GiB of RAM, which is outrageous.
I want a fast approximation method which guarantees at least 9 accurate decimal digits, sometimes 10 or more.
I compiled the program in debug mode, and when it ran I got something like an IndexingError, which surprised me, since I had read that C++ is more likely to just give you some random number.
So I looked at the linked answer again, and at the Wikipedia article on double-precision floating point, recalculated the values, and fixed the code:
vector<double> log2_double() {
    int lim = 1 << 24;
    vector<double> table(lim);
    for (uint64_t i = 0; i < lim; i++) {
        // Entry i holds log2(i << 29) minus the bias-plus-shift
        // constant 1075 (= 1023 + 52).
        table[i] = log(i << 29) / ln2 - 1075;
    }
    return table;
}

const vector<double> LOG2_DOUBLE = log2_double();

double fast_log2_double(double d) {
    uint64_t bits = *reinterpret_cast<uint64_t*>(&d);
    uint64_t e = (bits >> 52) & 0x7ff;  // the exponent is 11 bits, so the mask is 0x7ff, not 0xfff
    uint64_t m = bits & 0xfffffffffffff;
    // Shift the 53-bit significand down to 24 bits so the index fits the table.
    return (e == 0 ? LOG2_DOUBLE[m >> 28] : e + LOG2_DOUBLE[(m | 0x10000000000000) >> 29]);
}
This time it works, but I got only 6 correct decimal places, as expected. I want at least 9 correct decimal places, and ideally 10. What I actually want this for is fractional exponentiation, using a^x = e^(x * ln a). Because of the way floats are encoded, I want to extract the log2(a) part directly from the binary representation, and then use ln a = log2(a) * ln 2.
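For reference, this is roughly how I count matching decimal places against std::log2 (a minimal check, assuming the functions above; correct_digits is a made-up helper name):

#include <cmath>

// How many decimal places of fast_log2_double(x) agree with std::log2(x)?
int correct_digits(double x) {
    double err = std::fabs(fast_log2_double(x) - std::log2(x));
    if (err == 0) return 17;       // every digit of a double agrees
    return int(-std::log10(err));  // matching digits after the decimal point
}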
Now I am thinking about using a Remez approximation of e^x over the interval [1, 2]: since the mantissa of the binary encoding is 1.something, I can feed it into the Remez approximation, do some basic math and bit shifts on the exponent, and then just join the two bit strings and cast the result to float/double.
I am working on it.
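In the meantime, one refinement looks promising: it is the same trick Eichelberger describes in the comments below, applied at the table's resolution. Treat the table entry as an anchor point s0 and add a first-order correction log2(s/s0) ≈ 2/ln2 * (s - s0)/(s + s0). Since the table already pins the significand down to about 2^-24, the neglected cubic term is around 2^-72, so this should comfortably reach 10 digits (the remaining error is dominated by the rounding of the table entries themselves). A sketch, assuming the fixed LOG2_DOUBLE table above, handling normals only and requiring C++20 for std::bit_cast and std::numbers::ln2:

#include <bit>
#include <cstdint>
#include <numbers>

double fast_log2_refined(double d) {
    uint64_t bits = std::bit_cast<uint64_t>(d);
    uint64_t e = (bits >> 52) & 0x7ff;
    // Full 53-bit significand with the implicit leading 1 restored
    // (normals only; denormals and d <= 0 are not handled in this sketch).
    uint64_t s  = (bits & 0xfffffffffffff) | 0x10000000000000;
    uint64_t i  = s >> 29;   // table index: top 24 bits of the significand
    uint64_t s0 = i << 29;   // the significand the table entry was built from
    // First-order correction for the 29 bits the table throws away:
    // log2(s / s0) ~= 2/ln2 * (s - s0) / (s + s0), error around 2^-72.
    double z = double(s - s0) / double(s + s0);
    return double(e) + LOG2_DOUBLE[i] + (2.0 / std::numbers::ln2) * z;
}

And the exponentiation goal then becomes (still a sketch; fast_pow is a made-up name):

#include <cmath>

double fast_pow(double a, double x) {
    // a^x = e^(x * ln a), with ln a = log2(a) * ln 2
    return std::exp(x * fast_log2_refined(a) * std::numbers::ln2);
}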
Comments:
"[...] log2, as you can easily push important stuff out of L1 and L2 cache." – Aldarcie
"[...] constexpr to get a macro-free readable outcome. As of C++20, we prefer to use std::bit_cast over reinterpret_cast or other tricks. Your LUT can be a constexpr instance of std::array, making all sorts of optimizations possible. Finally, <cmath> has a rich set of functions to extract mantissa and exponent and build a number out of m+e." – Succentor
"[...] x=3, log2(3)=1.58496250072 (look-up table) and 2/ln(2)=2.88539008178 (constant), you can calculate log2(pi) = (pi-3)/(pi+3)*2.88539008178 + 1.58496250072 = 1.65148433982, which increases the number of correct digits by 3-4, with the most expensive step being the division." – Eichelberger
"[...] ln2 constant in std::numbers." – Vomiturition