Unexpected result after converting uint64_t to double

In the following code:

#include <cstdint>
#include <iostream>

...

uint64_t t1 = 1510763846;
uint64_t t2 = 1510763847;
double d1 = (double)t1;
double d2 = (double)t2;
// d1 == t2 => evaluates to true somehow?
// t1 == d2 => evaluates to true somehow?
// d1 == d2 => evaluates to true somehow?
// t1 == t2 => evaluates to false, of course.
std::cout << std::fixed << 
        "uint64_t: " << t1 << ", " << t2 << ", " <<
        "double: " << d1 << ", " << d2 << ", " << (d2+1) << std::endl;

I get this output:

uint64_t: 1510763846, 1510763847, double: 1510763904.000000, 1510763904.000000, 1510763905.000000

And I don't understand why. The answer to "biggest integer that can be stored in a double" says that any integer up to 2^53 (9007199254740992) can be stored in a double without losing precision.
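The 2^53 claim can be checked directly. The following is a minimal sketch, assuming double is an IEEE 754 binary64 type and the floating-point environment is at its defaults; the helper name roundTripsThroughDouble is made up for illustration.

```cpp
#include <cstdint>
#include <limits>

// Assumption: double is IEEE 754 binary64, i.e. it has a 53-bit significand.
static_assert(std::numeric_limits<double>::digits == 53,
              "double is expected to carry 53 significand bits");

// Hypothetical helper: true if v survives uint64_t -> double -> uint64_t unchanged.
bool roundTripsThroughDouble(std::uint64_t v)
{
    const double d = static_cast<double>(v);
    // 2^64 itself is out of range for uint64_t, so reject it before casting back.
    if (d >= 18446744073709551616.0) return false;
    return static_cast<std::uint64_t>(d) == v;
}
```

With a correctly behaving double, both 1510763846 and 1510763847 should round-trip, and 2^53 + 1 (9007199254740993) is the first value that does not.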

I actually get errors when I start doing calculations with the doubles, so it's not only a printing issue. (For example, 1510763846 and 1510763847 both give 1510763904.)

It's also very weird that adding to the double afterwards gives a correct result (d2+1 == 1510763905.000000).

Rationale: I'm converting these numbers to doubles because I need to work with them in Lua, which only supports floating point numbers. I'm sure I'm compiling the Lua lib with double as the lua_Number type, not float.

std::cout << sizeof(t1) << ", " << sizeof(d2) << std::endl;

Outputs

8, 8

I'm using VS 2012 with target MachineX86, toolkit v110_xp. Floating point model "Precise (/fp:precise)"

Addendum

With the help of people who replied and the article "Why are doubles added incorrectly in a specific Visual Studio 2008 project?", I've been able to pinpoint the problem. A library is calling a function such as _set_controlfp, _control87, _controlfp or __control87_2 to change the floating-point precision of my executable to "single". That is why converting a uint64_t to a double behaves as if the target were a float.
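To confirm that the precision-control bits really have been changed, the control word can be read back. The following is a sketch only: _controlfp_s, _MCW_PC, _PC_24 and _PC_53 come from MSVC's <float.h> and apply to 32-bit x86 builds, while the helper names precisionIsAtLeastDouble and restoreDoublePrecision are hypothetical. The fallback branch for other compilers is a placeholder that checks nothing.

```cpp
#include <float.h>

#if defined(_MSC_VER) && defined(_M_IX86)

// Hypothetical helper: false if the calling thread's x87 precision control
// has been lowered to 24 bits ("single").
bool precisionIsAtLeastDouble()
{
    unsigned int control = 0;
    _controlfp_s(&control, 0, 0);   // a mask of 0 only reads the control word
    return (control & _MCW_PC) != _PC_24;
}

// Hypothetical helper: sets 53-bit precision for the calling thread.
bool restoreDoublePrecision()
{
    unsigned int control = 0;
    return _controlfp_s(&control, _PC_53, _MCW_PC) == 0;
}

#else

// Placeholder for other toolchains: this sketch has nothing to check there.
bool precisionIsAtLeastDouble() { return true; }
bool restoreDoublePrecision()   { return true; }

#endif
```

The control word is per-thread state, so a check like this only covers the thread that calls it.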

When doing a file search for the above function names and "MCW_PC", which is used for Precision Control, I found the following libraries that might have set it:

  • Android NDK
  • boost::math
  • boost::numeric
  • DirectX (We're using June 2010)
  • FMod (non-EX)
  • Pyro particle engine

Now I'd like to rephrase my question:

How do I make sure converting from a uint64_t to a double goes correctly every time, without:

  1. having to call _fpreset() each and every time a possible conversion occurs (think about the function parameters)
  2. having to worry about a library's thread changing the floating point precision in between my _fpreset() and the conversion?

Naive code would be something like this:

double toDouble(uint64_t i)
{
    double d;
    do {
        _fpreset();
        d = i;
        _fpreset();
    } while (d != i);
    return d;
}

double toDouble(int64_t i)
{
    double d;
    do {
        _fpreset();
        d = i;
        _fpreset();
    } while (d != i);
    return d;
}

This solution assumes that the odds of another thread changing the floating-point precision twice in a row are astronomically small. The problem is that the values I'm working with are timers that represent real-world values, so I shouldn't be taking any chances. Is there a silver bullet for this problem?
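One direction that avoids floating-point arithmetic altogether, and so does not depend on the precision setting during the conversion, is to assemble the bit pattern of the double with integer operations. This is not something proposed in the thread; it is a sketch that assumes double is IEEE 754 binary64 with the same byte order as uint64_t, and the name toDoubleBitwise is hypothetical. It should be verified on the actual target build before being relied on.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: uint64_t -> double using integer operations only.
// Assumption: double is IEEE 754 binary64, same endianness as uint64_t.
double toDoubleBitwise(std::uint64_t v)
{
    if (v == 0) return 0.0;

    // Position of the highest set bit (0..63).
    int top = 63;
    while (((v >> top) & 1u) == 0) --top;

    std::uint64_t mantissa;  // 53 bits, including the implicit leading 1
    if (top <= 52) {
        mantissa = v << (52 - top);  // fits exactly, no rounding needed
    } else {
        const int shift = top - 52;
        mantissa = v >> shift;
        const std::uint64_t rest = v & ((std::uint64_t(1) << shift) - 1);
        const std::uint64_t half = std::uint64_t(1) << (shift - 1);
        // Round to nearest, ties to even.
        if (rest > half || (rest == half && (mantissa & 1u))) {
            ++mantissa;
            if (mantissa >> 53) { mantissa >>= 1; ++top; }  // carried into a new bit
        }
    }

    // Biased exponent in bits 52..62, fraction (without the leading 1) in bits 0..51.
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(1023 + top) << 52) |
        (mantissa & ((std::uint64_t(1) << 52) - 1));

    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

For the timer values in question (well below 2^53) only the exact branch is taken. The signed overload could negate the input, convert the magnitude, and set the sign bit in the same way.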

Trash answered 15/11, 2017 at 16:43 Comment(17)
You have d2 twice in your cout. Why does t1 == t2 when you say 'HIT!'? That's for sure wrong. - Rifle
This works correctly on ideone after fixing a typo in your code (demo). - Sneakers
There is something more suspicious going on here. This is not the full code. - Rifle
Works properly with VS 2017 and gcc 7.2 after correcting typo. - Mines
What platform are you compiling for? What compiler are you using? What compiler flags? - Supranational
What platform and compiler are you using? Because I get both being as you would expect on VS2013 (Release build) on an x86 running in 64-bit mode. - Sacrilege
I can only reproduce this if I use float instead of double. - Supranational
aki means sizeof(double) > sizeof(float). - Rifle
I meant to check sizeof float == sizeof double. Double has to be at least the size of float, but not necessarily larger. - Khichabia
What does "HIT! (naturally)" mean? Does it mean that the numbers are equal? Does it mean that the numbers are different? Anything else? - Hitt
@Rifle Saying “HIT” there means that the assert is actually triggered; i.e., he didn’t accidentally change NDEBUG or something. - Supranational
Thank you for confirming that computers are awesome and that the error must be with my compiler settings. (Which means it's fixable, yay!) Thanks for the cool online tool link, @dasblinkenlight! - Trash
I'm using VS 2012 with target MachineX86, toolkit v110_xp. Floating point model "Precise (/fp:precise)". Any more info? - Trash
You've added #include directives for <iostream> (which you need) and <time.h> (which you don't use). You also need #include directives for <cassert> and <cstdint>. I suggest you update your question to show a complete self-contained program that we can copy-and-paste and run unmodified on our own systems (a minimal reproducible example). - Calvert
@KeithThompson I’m guessing that <iostream> itself includes <cassert> and <cstdint> on the OP’s machine, so it’s hard to notice the missing headers. - Supranational
@DanielH: That is unfortunately possible. - Calvert
Keith’s right. I should make it self-contained and test it that way, too. Tomorrow I’ll try and see if _fpreset() restores my sanity. - Trash

Judging by an IEEE 754 floating-point converter, it looks like your implementation of double is actually float, which is of course allowed by the standard, which mandates only that sizeof(double) >= sizeof(float).

The most accurate single-precision (float) representation of 1510763846 is 1.510763904E9.
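The arithmetic behind that value: a float has a 24-bit significand, so between 2^30 and 2^31 the representable values are 2^(30-23) = 128 apart. 1510763846 = 128 * 11802842 + 70, and since 70 is more than half of 128, it rounds up to 128 * 11802843 = 1510763904. The same holds for 1510763847 (remainder 71). A small sketch that reproduces this with a real float, assuming float is IEEE 754 binary32; the helper name nearestFloat is made up:

```cpp
#include <cstdint>

// Hypothetical helper: rounds v to the nearest single-precision value and
// returns that value as an integer. Intended for v below 2^63.
std::uint64_t nearestFloat(std::uint64_t v)
{
    const float f = static_cast<float>(v);  // rounds to a 24-bit significand
    return static_cast<std::uint64_t>(f);
}
```

Both inputs from the question map to 1510763904 here, which matches the output the asker sees from the supposed double conversion.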

Khichabia answered 15/11, 2017 at 16:54 Comment(4)
I thought you had the answer there, but it appears that the sizeof's are both 8. Could it be that according to IEEE 754, even though a double is 64 bits, it cannot hold the integral number 1510763846? - Trash
This is probably still the answer; only the conversion to float is somewhere else (cannot know where exactly). I remember that on some ARM platform there were dedicated functions in the C runtime for casting from 64-bit integers to floating point. These functions could be broken. - Hitt
The C++ standard does allow double and float to have the same size, representation, range, and precision (though they're always distinct types), but it also imposes minimal requirements on the range and precision of type double. I'm fairly sure a 32-bit type can't meet those requirements. - Calvert
This is the best reference to the allowed sizes of float and double I found: https://mcmap.net/q/25072/-are-ieee-float-and-double-guaranteed-to-be-the-same-size-on-any-os. But I don't know what's happening, except that there's a strong smell of float. We should look at the assembler dump. - Khichabia
