Different floating point result with optimization enabled - compiler bug?
The code below works on Visual Studio 2008 with and without optimization, but on g++ it only works without optimization (-O0).

#include <cstdlib>
#include <iostream>
#include <cmath>

double round(double v, double digit)
{
    double pow = std::pow(10.0, digit);
    double t = v * pow;
    //std::cout << "t:" << t << std::endl;
    double r = std::floor(t + 0.5);
    //std::cout << "r:" << r << std::endl;
    return r / pow;
}

int main(int argc, char *argv[])
{
    std::cout << round(4.45, 1) << std::endl;
    std::cout << round(4.55, 1) << std::endl;
}

The output should be:

4.5
4.6

But g++ with optimization (-O1 to -O3) outputs:

4.5
4.5

If I add the volatile keyword to t, it works, so could this be some kind of optimization bug?

Tested on g++ 4.1.2 and 4.4.4.

Here is the result on ideone: http://ideone.com/Rz937

The options I used with g++ are simple:

g++ -O2 round.cpp

The more interesting result: even if I turn on the /fp:fast option in Visual Studio 2008, the result is still correct.

Further question:

I was wondering, should I always turn on the -ffloat-store option?

Because the g++ versions I tested ship with CentOS/Red Hat Linux 5 and CentOS/Red Hat 6.

I compiled many of my programs under these platforms, and I am worried this will cause unexpected bugs in them. It seems difficult to check all of my C++ code and the libraries I use for problems like this. Any suggestions?

Is anyone interested in why Visual Studio 2008 still works even with /fp:fast turned on? It seems Visual Studio 2008 is more reliable on this problem than g++.

Cogitate answered 22/9, 2011 at 15:44 Comment(15)
FWIW, I'm getting the correct output with g++ 4.5.0 using MinGW.Fair
I get 4.5 4.6 for all cases. What's your g++ version? I have g++ (Debian 4.3.2-1.1) 4.3.2Mantling
@Bear: I can't reproduce that with GCC 4.5.3 or 4.6.1, on x86 or x86_64. (But I have no idea if this is a bug or just one of the many nasties about floating point.)Killebrew
This works as expected with GCC 4.4.3 and 4.6.1 on x86 with or without -std=c++0x and with any of -O0, -O1, -O2 and -O3.Slowdown
I'm unable to reproduce this with g++ 4.4.3 on 64-bit Ubuntu 10.04. This prints out 4.5 4.6 with and without -O1..-O3.Revolt
FYI: gcc 4.6.1 (20110409) produces correct results in Linux.Yettayetti
g++ 4.6.1, results are correct with, and without optimizations.Araujo
On g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 I get the correct output (g++ -O3 x.cpp, same for -O[012])Peccavi
I wonder if this has something to do with it: gcc.gnu.org/gcc-4.5/changes.html "GCC has been integrated with the MPC library. This allows GCC to evaluate complex arithmetic at compile time more accurately."Denims
ideone uses 4.3.4 ideone.com/b8VXgDenims
You should keep in mind that your routine is unlikely to work reliably for all inputs. In contrast to rounding a double to an integer, this is vulnerable to the fact that not all real numbers can be represented, so you should expect to get more bugs like this one.Brianna
To those who can't reproduce the bug: do not uncomment the commented out debug stmts, they affect the result.Costrel
Couldn't reproduce on gcc-4.5.1.Flyover
Amusingly enough, I encountered something like this problem in 2006. Then, I was directed to the following bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323.Ratel
I could reproduce it on x86_64 Linux with the correct flags, see answer below.Eulaheulalee
F
99

Intel x86 processors use 80-bit extended precision internally, whereas double is normally 64 bits wide. Different optimization levels affect how often floating-point values from the CPU get saved to memory and thus rounded from 80-bit precision to 64-bit precision.

Use the -ffloat-store gcc option to get the same floating point results with different optimization levels.

Alternatively, use the long double type, which is normally 80 bits wide on gcc, to avoid the rounding from 80-bit to 64-bit precision.

man gcc says it all:

   -ffloat-store
       Do not store floating point variables in registers, and inhibit
       other options that might change whether a floating point value is
       taken from a register or memory.

       This option prevents undesirable excess precision on machines such
       as the 68000 where the floating registers (of the 68881) keep more
       precision than a "double" is supposed to have.  Similarly for the
       x86 architecture.  For most programs, the excess precision does
       only good, but a few programs rely on the precise definition of
       IEEE floating point.  Use -ffloat-store for such programs, after
       modifying them to store all pertinent intermediate computations
       into variables.

In x86_64 builds, compilers use SSE registers for float and double by default, so no extended precision is used and this issue doesn't occur.

The gcc compiler option -mfpmath controls this.
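
For illustration, here is a minimal sketch of the questioner's function with explicit long double intermediates. This only makes the working precision independent of the optimizer's register allocation; since 4.55 is still stored as a double slightly below 4.55, the result for that input may consistently come out as 4.5 rather than 4.6:

#include <cmath>

// All intermediates are kept in long double (80 bits wide on x86 gcc), so the
// result no longer depends on whether the optimizer spills t to memory or not.
double round_ld(double v, double digit)
{
    long double pow = std::pow(10.0L, static_cast<long double>(digit));
    long double t = v * pow;
    long double r = std::floor(t + 0.5L);
    return static_cast<double>(r / pow);
}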

Flyover answered 22/9, 2011 at 16:2 Comment(7)
I think this is the answer. The constant 4.55 gets converted to 4.54999999999999 which is the closest binary representation in 64 bits; multiply by 10 and round again to 64 bits and you get 45.5. If you skip the rounding step by keeping it in an 80-bit register you end up with 45.4999999999999.Villada
Thanks, I didn't even know about this option. But I was wondering, should I always turn on the -ffloat-store option? Because the g++ versions I tested ship with CentOS/Red Hat 5 and CentOS/Red Hat 6. I compiled many of my programs under these platforms, and I worry that this will cause unexpected bugs inside my programs.Cogitate
@Mark: But how do you explain that if you uncomment the debug statement std::cout << "t:" << t << std::endl; the output becomes correct?Cogitate
@Bear, the debug statement probably causes the value to be flushed from a register into memory.Villada
@Bear, normally your application should benefit from extended precision, unless it operates on extremely tiny or huge values when a 64-bit float is expected to under- or overflow and produce inf. There is no good rule of thumb, unit tests can give you a definite answer.Flyover
@bear As a general rule if you need results that are perfectly predictable and/or exactly what a human would get doing the sums on paper then you should avoid floating point. -ffloat-store removes one source of unpredictability but it's not a magic bullet.Canonicals
@Canonicals That rule does not help if you have to use floating point numbers.Flyover
C
10

Output should be: 4.5 4.6

That's what the output would be if you had infinite precision, or if you were working with a device that used a decimal-based rather than a binary-based floating-point representation. But you aren't. Most computers use the binary IEEE floating-point standard.

As Maxim Yegorushkin already noted in his answer, part of the problem is that internally your computer is using an 80-bit floating-point representation. This is just part of the problem, though. The root of the problem is that any number of the form n.nn5 does not have an exact binary floating-point representation; those corner cases are always inexact numbers.
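
A quick way to see this (a minimal sketch; the printed digits assume 64-bit IEEE doubles and a correctly rounding printf, as with glibc) is to print the constants with more precision than the default six digits:

#include <cstdio>

int main()
{
    // Neither constant has an exact binary representation; what gets stored
    // is the nearest representable double.
    std::printf("%.20f\n", 4.45);   // typically 4.45000000000000017764
    std::printf("%.20f\n", 4.55);   // typically 4.54999999999999982236
}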

If you really want your rounding to be able to reliably round these corner cases, you need a rounding algorithm that addresses the fact that n.n5, n.nn5, or n.nnn5, etc. (but not n.5) is always inexact. Find the corner case that determines whether some input value rounds up or down, and return the rounded-up or rounded-down value based on a comparison to that corner case. And you do need to take care that an optimizing compiler will not keep that corner case in an extended-precision register.

See How does Excel successfully Rounds Floating numbers even though they are imprecise? for such an algorithm.

Or you can just live with the fact that the corner cases will sometimes round erroneously.

Currant answered 22/9, 2011 at 19:0 Comment(0)
A
6

Different compilers have different optimization settings. Some of those faster optimization settings do not maintain strict floating-point rules according to IEEE 754. Visual Studio has specific settings, /fp:strict, /fp:precise, and /fp:fast, where /fp:fast is allowed to violate the standard in what it can do. You might find that this flag is what controls the optimization in such settings. You may also find a similar setting in GCC which changes the behaviour.

If this is the case, then the only thing that's different between the compilers is that GCC would look for the fastest floating-point behaviour by default at higher optimisation levels, whereas Visual Studio does not change the floating-point behaviour with higher optimization levels. Thus it might not necessarily be an actual bug, but intended behaviour of an option you didn't know you were turning on.

Axinomancy answered 22/9, 2011 at 15:51 Comment(4)
There's a -ffast-math switch for GCC that does this, and it's not turned on by any of the -O optimisation levels since, quote: "it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions."Killebrew
@Mat: I've tried -ffast-math and a few other things on my g++ 4.4.3 and I'm still unable to reproduce the problem.Revolt
Nice: with -ffast-math I do get 4.5 in both cases for optimization levels greater than 0.Slowdown
(Correction: I get 4.5 with -O1 and -O2, but not with -O0 and -O3 in GCC 4.4.3, but with -O1,2,3 in GCC 4.6.1.)Slowdown
O
4

To those who can't reproduce the bug: do not uncomment the commented out debug stmts, they affect the result.

This implies that the problem is related to the debug statements. And it looks like there's a rounding difference that depends on whether the values are kept in registers across the output statements, which is why others found that you can fix this with -ffloat-store.

Further question:

I was wondering, should I always turn on -ffloat-store option?

To be flippant, there must be a reason that some programmers don't turn on -ffloat-store, otherwise the option wouldn't exist (likewise, there must be a reason that some programmers do turn on -ffloat-store). I wouldn't recommend always turning it on or always turning it off. Turning it on prevents some optimizations, but turning it off allows for the kind of behavior you're getting.

But, generally, there is some mismatch between binary floating-point numbers (like the computer uses) and decimal floating-point numbers (that people are familiar with), and that mismatch can cause behavior similar to what you're getting (to be clear, the behavior you're getting is not caused by this mismatch, but similar behavior can be). The thing is, since you already have some vagueness when dealing with floating point, I can't say that -ffloat-store makes it any better or any worse.

Instead, you may want to look into other solutions to the problem you're trying to solve (unfortunately, Koenig doesn't point to the actual paper, and I can't really find an obvious "canonical" place for it, so I'll have to send you to Google).


If you're not rounding for output purposes, I would probably look at std::modf() (in cmath) and std::numeric_limits<double>::epsilon() (in limits). Thinking over the original round() function, I believe it would be cleaner to replace the call to std::floor(d + .5) with a call to this function:

// this still has the same problems as the original rounding function
int round_up(double d)
{
    // return value will be coerced to int, and truncated as expected
    // you can then assign the int to a double, if desired
    return d + 0.5;
}

I think that suggests the following improvement:

// this won't work for negative d ...
// this may still round some numbers up when they should be rounded down
int round_up(double d)
{
    double floor;
    d = std::modf(d, &floor);
    return floor + (d + .5 + std::numeric_limits<double>::epsilon());
}

A simple note: std::numeric_limits<T>::epsilon() is defined as "the smallest number added to 1 that creates a number not equal to 1." You usually need to use a relative epsilon (i.e., scale epsilon somehow to account for the fact that you're working with numbers other than "1"). The sum of d, .5 and std::numeric_limits<double>::epsilon() should be near 1, so grouping that addition means that std::numeric_limits<double>::epsilon() will be about the right size for what we're doing. If anything, std::numeric_limits<double>::epsilon() will be too large (when the sum of all three is less than one) and may cause us to round some numbers up when we shouldn't.


Nowadays, you should consider std::nearbyint().
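
For illustration, a minimal sketch of such a helper (the name round_digits is made up here, and it still suffers from the representation issue discussed above, so it cannot guarantee the pencil-and-paper decimal result):

#include <cmath>
#include <iostream>

// Rounds v to 'digits' decimal places using the current rounding mode
// (round-to-nearest-even by default). Values like 4.55 are not exactly
// representable, so the result can still differ from decimal rounding by hand.
double round_digits(double v, int digits)
{
    double scale = std::pow(10.0, digits);
    return std::nearbyint(v * scale) / scale;
}

int main()
{
    std::cout << round_digits(4.45, 1) << std::endl;
    std::cout << round_digits(4.55, 1) << std::endl;
}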

Officialdom answered 22/9, 2011 at 18:15 Comment(2)
A "relative epsilon" is called 1 ulp (1 unit in the last place). x - nextafter(x, INFINITY) is related to 1 ulp for x (but don't use that; I'm sure there are corner cases and I just made that up). The cppreference example for epsilon() has an example of scaling it to get a ULP-based relative error.Joyajoyan
BTW, the 2016 answer to -ffloat-store is: don't use x87 in the first place. Use SSE2 math (64-bit binaries, or -mfpmath=sse -msse2 for making crusty old 32-bit binaries), because SSE/SSE2 has temporaries with no extra precision. double and float vars in XMM registers are really in IEEE 64-bit or 32-bit format. (Unlike x87, where the registers are always 80-bit, and storing to memory rounds to 32 or 64 bit.)Joyajoyan
C
4

The accepted answer is correct if you are compiling to an x86 target that doesn't include SSE2. All modern x86 processors support SSE2, so if you can take advantage of it, you should:

-mfpmath=sse -msse2 -ffp-contract=off

Let's break this down.

-mfpmath=sse -msse2. This performs rounding by using SSE2 registers, which is much faster than storing every intermediate result to memory. Note that this is already the default on GCC for x86-64. From the GCC wiki:

On more modern x86 processors that support SSE2, specifying the compiler options -mfpmath=sse -msse2 ensures all float and double operations are performed in SSE registers and correctly rounded. These options do not affect the ABI and should therefore be used whenever possible for predictable numerical results.

-ffp-contract=off. Controlling rounding isn't enough for an exact match, however. FMA (fused multiply-add) instructions can change the rounding behavior compared to their non-fused counterparts, so we need to disable them. This is the default on Clang, not GCC. As explained by this answer:

An FMA has only one rounding (it effectively keeps infinite precision for the internal temporary multiply result), while an ADD + MUL has two.

By disabling FMA, we get results that exactly match on debug and release, at the cost of some performance (and accuracy). We can still take advantage of other performance benefits of SSE and AVX.
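
Assuming the round.cpp from the question, a complete invocation combining these flags might look like this (the exact code generated still varies between GCC versions):

g++ -O2 -mfpmath=sse -msse2 -ffp-contract=off round.cpp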

Criss answered 14/3, 2018 at 18:21 Comment(1)
+1 for mentioning the FMA instructions as a potential source of trouble. Also, SSE instructions can have a bug on some platforms, so prefer -msse2 over -msse whenever possible.Dinnie
E
1

I dug more into this problem, and I can add some precision. First, the exact representations of 4.45 and 4.55 according to gcc on x86_64 are the following (with libquadmath to print the full precision):

float 32:   4.44999980926513671875
double 64:  4.45000000000000017763568394002504646778106689453125
doublex 80: 4.449999999999999999826527652402319290558807551860809326171875
quad 128:   4.45000000000000000000000000000000015407439555097886824447823540679418548304813185723105561919510364532470703125

float 32:   4.55000019073486328125
double 64:  4.54999999999999982236431605997495353221893310546875
doublex 80: 4.550000000000000000173472347597680709441192448139190673828125
quad 128:   4.54999999999999999999999999999999984592560444902113175552176459320581451695186814276894438080489635467529296875

As Maxim said above, the problem is due to the 80-bit size of the FPU registers.

But why does the problem never occur on Windows? On IA-32, the x87 FPU was configured to use an internal precision of 53 bits for the mantissa (equivalent to a total size of 64 bits: double). For Linux and Mac OS, the default precision of 64 bits was used (equivalent to a total size of 80 bits: long double). So the problem should be possible, or not, on these different platforms by changing the control word of the FPU (assuming the sequence of instructions would trigger the bug). The issue was reported to gcc as bug 323 (read at least comment 92!).

To show the mantissa precision on Windows, you can compile this in 32-bit mode with VC++:

#include "stdafx.h"
#include <stdio.h>  
#include <float.h>  

int main(void)
{
    char t[] = { 64, 53, 24, -1 };
    unsigned int cw = _control87(0, 0);
    printf("mantissa is %d bits\n", t[(cw >> 16) & 3]);
}

and on Linux/Cygwin:

#include <stdio.h>

int main(int argc, char **argv)
{
    char t[] = { 24, -1, 53, 64 };
    unsigned int cw = 0;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (*&cw));
    printf("mantissa is %d bits\n", t[(cw >> 8) & 3]);
}

Note that with gcc you can set the FPU precision with -mpc32/64/80, though it is ignored in Cygwin. But keep in mind that it modifies the size of the mantissa, not that of the exponent, leaving the door open to other kinds of differing behavior.
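
For example (the flag meanings are taken from the gcc documentation; whether the difference actually shows up still depends on the instruction sequence gcc emits), forcing a Windows-like 53-bit mantissa on a 32-bit build would look like:

g++ -m32 -mpc64 -O1 round.cpp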

On the x86_64 architecture, SSE is used as said by tmandry, so the problem will not occur unless you force the old x87 FPU for FP computation with -mfpmath=387, or unless you compile in 32-bit mode with -m32 (you will need the multilib package). I could reproduce the problem on Linux with different combinations of flags and versions of gcc:

g++-5 -m32 floating.cpp -O1
g++-8 -mfpmath=387 floating.cpp -O1

I tried a few combinations on Windows and Cygwin with VC++/gcc/tcc, but the bug never showed up. I suppose the sequence of instructions generated is not the same.

Finally, note that an exotic way to prevent this problem with 4.45 or 4.55 would be to use _Decimal32/64/128, but support is really scarce... I spent a lot of time just to be able to do a printf with libdfp!

Eulaheulalee answered 15/6, 2018 at 17:21 Comment(0)
A
-1

Personally, I have hit the same problem going the other way, from gcc to VS. In most instances I think it is better to avoid optimisation. The only time it is worthwhile is when you're dealing with numerical methods involving large arrays of floating-point data. Even after disassembling, I'm often underwhelmed by the compiler's choices. Very often it's just easier to use compiler intrinsics or to write the assembly yourself.

Assess answered 1/11, 2016 at 0:18 Comment(0)
