The code below works with integer data types of at most 64 bits and
floating point data types of at most ieee-754 double precision accuracy.
For wider data types the same idea can be used, but you'll have to adapt he code.
Since I'm not very familiar with C++, the code is written in C. It shouldn't be too
difficult to convert it to a C++ style code. The code is branchless, which might
be a performance benefit.
#include <stdio.h>
// gcc -O3 -march=haswell cmp.c
// Assume long long int is 64 bits.
// Assume ieee-754 double precision.
int long_long_less_than_double(long long int i, double y) {
long long i_lo = i & 0x00000000FFFFFFFF; // Extract lower 32 bits.
long long i_hi = i & 0xFFFFFFFF00000000; // Extract upper 32 bits.
double x_lo = (double)i_lo; // Exact conversion to double, no rounding errors!
double x_hi = (double)i_hi; //
return ( x_lo < (y - x_hi) ); // If i is close to y then y - x_hi is exact,
// due to Sterbenz' lemma.
// i < y
// i_lo +i_hi < y
// i_lo < (y - i_hi)
// x_lo < (y - x_hi)
}
int long_long_equals_double(long long int i, double y) {
long long i_lo = i & 0x00000000FFFFFFFF;
long long i_hi = i & 0xFFFFFFFF00000000;
double x_lo = (double)i_lo;
double x_hi = (double)i_hi;
return ( x_lo == (y - x_hi) );
}
int main()
{
long long a0 = 999999984306749439;
long long a1 = 999999984306749440; // Hex number: 0x0DE0B6B000000000
long long a2 = 999999984306749441;
float b = 999999984306749440.f; // This number can be represented exactly by a `float`.
printf("%lli less_than %20.1f = %i\n", a0, b, long_long_less_than_double(a0, b)); // Implicit conversion from float to double
printf("%lli less_than %20.1f = %i\n", a1, b, long_long_less_than_double(a1, b));
printf("%lli equals %20.1f = %i\n", a0, b, long_long_equals_double(a0, b));
printf("%lli equals %20.1f = %i\n", a1, b, long_long_equals_double(a1, b));
printf("%lli equals %20.1f = %i\n\n", a2, b, long_long_equals_double(a2, b));
long long c0 = 1311693406324658687;
long long c1 = 1311693406324658688; // Hex number: 0x1234123412341200
long long c2 = 1311693406324658689;
double d = 1311693406324658688.0; // This number can be represented exactly by a `double`.
printf("%lli less_than %20.1f = %i\n", c0, d, long_long_less_than_double(c0, d));
printf("%lli less_than %20.1f = %i\n", c1, d, long_long_less_than_double(c1, d));
printf("%lli equals %20.1f = %i\n", c0, d, long_long_equals_double(c0, d));
printf("%lli equals %20.1f = %i\n", c1, d, long_long_equals_double(c1, d));
printf("%lli equals %20.1f = %i\n", c2, d, long_long_equals_double(c2, d));
return 0;
}
The idea is to split the 64 bits integer i
in 32 upper bits i_hi
and 32 lower bits i_lo
,
which are converted to doubles x_hi
and x_lo
without any rounding errors.
If double y
is close to x_hi
, then the floating point subtraction y - x_hi
is exact,
due to Sterbenz' lemma.
So, instead of x_lo + x_hi < y
, we can test for x_lo < (y - x_hi)
, which is more accurate!
If double y
is not close to x_hi
then y - x_hi
is inacurate, but in that case we
don't need the accuracy because then |y - x_hi|
is much larger than |x_lo|
. In other words:
If i
and y
differ much than we don't have to worry about the value of the lower 32 bits.
Output:
999999984306749439 less_than 999999984306749440.0 = 1
999999984306749440 less_than 999999984306749440.0 = 0
999999984306749439 equals 999999984306749440.0 = 0
999999984306749440 equals 999999984306749440.0 = 1
999999984306749441 equals 999999984306749440.0 = 0
1311693406324658687 less_than 1311693406324658688.0 = 1
1311693406324658688 less_than 1311693406324658688.0 = 0
1311693406324658687 equals 1311693406324658688.0 = 0
1311693406324658688 equals 1311693406324658688.0 = 1
1311693406324658689 equals 1311693406324658688.0 = 0
float
o.O – Centnerfloat
, although the space is sparse in that region. The next one differs from it by the GDP of a small country. – Differentfloat c = a; std::cout << a << " < " << b << " = " << ((a-(long long)c) < (b-c)) << '\n';
It should be a generic fix to "implicit conversion lost precision". "One of the values is out of range for the other datatype" is a different problem that needs a different solution. – Drawn