Output of strtoull() loses precision when converted to double and then back to uint64_t
Consider the following:

#include <iostream>
#include <cstdint>
#include <cstdlib> // std::strtoull

int main() {
   std::cout << std::hex
      << "0x" << std::strtoull("0xFFFFFFFFFFFFFFFF",0,16) << std::endl
      << "0x" << uint64_t(double(std::strtoull("0xFFFFFFFFFFFFFFFF",0,16))) << std::endl
      << "0x" << uint64_t(double(uint64_t(0xFFFFFFFFFFFFFFFF))) << std::endl;
   return 0;
}

Which prints:

0xffffffffffffffff
0x0
0xffffffffffffffff

The first number is just the result of converting ULLONG_MAX from a string to a uint64_t, which works as expected.

However, if I cast the result to double and then back to uint64_t, it prints 0 (the second number).

Normally I would attribute this to the limited precision of floating point, but what further puzzles me is that if I cast ULLONG_MAX from uint64_t to double and then back to uint64_t, the result is correct (the third number).

Why the discrepancy between the second and the third result?

EDIT (by @Radoslaw Cybulski): for another what-is-going-on-here example, try this code:

#include <iostream>
#include <cstdint>
#include <cstdlib> // std::strtoull

int main() {
    uint64_t z1 = std::strtoull("0xFFFFFFFFFFFFFFFF",0,16);
    uint64_t z2 = 0xFFFFFFFFFFFFFFFFull;
    std::cout << z1 << " " << uint64_t(double(z1)) << "\n";
    std::cout << z2 << " " << uint64_t(double(z2)) << "\n";
    return 0;
}

which happily prints:

18446744073709551615 0
18446744073709551615 18446744073709551615
Handy answered 19/7, 2019 at 13:09 (2 comments)
A clue that this is undefined behavior: in a local test with g++ 6.3, the behavior varied based on whether or not I passed optimization flags. When I passed -O1, -O2, or -O3, I matched your behavior. When I didn't pass an optimization flag (or passed -O0 explicitly), the result of both round-trip casts was 0. (Checking the assembly, only -O0 actually performs the casts for z2; optimization skips them on the basis of the standard stating that any case where the cast would make a difference is undefined behavior, per eerorika's answer.) – Infract
Clarifying the results of checking the assembly (of Radoslaw's code): at -O1 and higher, z2 effectively never exists; the immediate value 0xffffffffffffffff is loaded directly into the argument registers immediately before it's printed, without ever being stored in a dedicated register or stack location. It's only at -O0 (which avoids optimizations that would interfere with debugging; it tries to preserve the correspondence between lines of code and the associated assembly) that the compiler bothers to make a stack location for z2, load from it on each use, perform the casts on the value, etc. – Infract

The number closest to 0xFFFFFFFFFFFFFFFF that is representable by a double (assuming 64-bit IEEE 754) is 18446744073709551616, i.e. 2^64. You'll find that this is a bigger number than 0xFFFFFFFFFFFFFFFF. As such, the number is outside the representable range of uint64_t.
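
A minimal sketch to see this, assuming IEEE 754 binary64 doubles and a C++17 compiler (for the 0x1p64 hexadecimal float literal, which is exactly 2^64):

#include <cstdint>
#include <iomanip>
#include <iostream>

int main() {
    double d = double(UINT64_MAX); // 2^64 - 1 rounds up to 2^64
    std::cout << std::setprecision(20) << d << "\n";      // 18446744073709551616
    std::cout << std::boolalpha << (d == 0x1p64) << "\n"; // true
    return 0;
}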

On the conversion back to integer, the standard says (quoting the latest draft):

[conv.fpint]

A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.


Why the discrepancy between the second and the third result?

Because the behaviour of the program is undefined.
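
One way to avoid the undefined behaviour is to range-check before converting back. A minimal sketch, with a hypothetical helper name; 0x1p64 is a C++17 hexadecimal float literal for 2^64, the smallest value outside uint64_t's range:

#include <cstdint>

// Hypothetical helper: converts double -> uint64_t only when the result
// is well defined. Conservatively rejects negatives; NaN is also rejected,
// because NaN >= 0.0 is false.
bool to_uint64_checked(double d, uint64_t& out) {
    if (d >= 0.0 && d < 0x1p64) {
        out = static_cast<uint64_t>(d); // truncated value fits: no UB
        return true;
    }
    return false; // out of range: the conversion would be undefined
}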

Although it is mostly pointless to analyse the reasons for differences in UB, since the scope of variation is limitless, my guess at the reason for the discrepancy in this case is that in one case the value is a compile-time constant, while in the other there is a call to a library function that is invoked at runtime.
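
A minimal sketch of that distinction (the exact outputs are not guaranteed, precisely because the behaviour is undefined): routing the value through a volatile double stops the compiler from folding the conversion back at compile time, so it is performed by an actual CPU instruction at runtime.

#include <cstdint>
#include <iostream>

int main() {
    uint64_t z = 0xFFFFFFFFFFFFFFFFull;

    // Likely folded at compile time: the compiler may assume no UB occurs
    // and substitute whatever value is convenient.
    std::cout << uint64_t(double(z)) << "\n";

    // 'volatile' forces the double through memory, so the conversion back
    // happens at runtime (e.g. via cvttsd2si on x86-64). Still UB; on many
    // x86-64 builds this prints 0.
    volatile double d = double(z);
    std::cout << uint64_t(d) << "\n";
    return 0;
}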

Jolley answered 19/7, 2019 at 13:21 (3 comments)
I wonder, why does the conversion aim for the larger closest integer instead of the smaller one? – Holston
@StackDanny Probably because the larger double is closer than the smaller one. The result will depend on the current rounding mode. – Jolley
Not going to post another answer just for this, but you might want to mention that standard IEEE 754 double-precision binary floating point only has 53 bits of integer-level precision. UINT64_C(1) << 53 is the last contiguous integer value that converts to double and back losslessly; any odd value above that limit will round (and eventually, so will values not divisible by 4, 8, 16, etc., as you rely more and more on the exponent to scale the integer component up). See the sketch below. – Infract
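
A minimal sketch of that 53-bit limit, assuming IEEE 754 binary64: 2^53 round-trips exactly, while 2^53 + 1 rounds back down to 2^53 (both conversions are well defined here, since the values fit in uint64_t):

#include <cstdint>
#include <iostream>

int main() {
    uint64_t exact = UINT64_C(1) << 53; // 9007199254740992
    uint64_t lossy = exact + 1;         // 9007199254740993: odd and above 2^53

    std::cout << std::boolalpha
              << (uint64_t(double(exact)) == exact) << "\n"  // true
              << (uint64_t(double(lossy)) == lossy) << "\n"; // false
    return 0;
}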
