C++: doubles, precision, virtual machines and GCC
I have the following piece of code:

#include <cstdio>
int main()
{
   if ((1.0 + 0.1) != (1.0 + 0.1))
      printf("not equal\n");
    else
      printf("equal\n");
    return 0;
}

When compiled with -O3 using gcc (4.4, 4.5 and 4.6) and run natively (Ubuntu 10.10), it prints the expected result of "equal".

However, when the same code is compiled as described above and run on a virtual machine (Ubuntu 10.10, VirtualBox image), it outputs "not equal". This happens when the -O3 and -O2 flags are set, but not with -O1 and below. When compiled with clang (-O3 and -O2) and run on the virtual machine, I get the correct result.

I understand that 1.1 can't be represented exactly as a double, and I've read "What Every Computer Scientist Should Know About Floating-Point Arithmetic", so please don't point me there. This seems to be some kind of optimisation that GCC does that somehow doesn't work in virtual machines.

Any ideas?

Note: The C++ standard says type promotion in these situations is implementation dependent. Could it be that GCC is using a more precise internal representation in which the inequality test holds true, due to the extra precision?

UPDATE1: The following modification of the above piece of code now produces the correct result. It seems that at some point, for whatever reason, GCC leaves the floating point control word in a different mode.

#include <cstdio>
void set_dpfpu() { unsigned int mode = 0x27F; asm ("fldcw %0" : : "m" (*&mode)); }
int main()
{
   set_dpfpu();
   if ((1.0 + 0.1) != (1.0 + 0.1))
      printf("not equal\n");
    else
      printf("equal\n");
    return 0;
}

UPDATE2: For those asking about the constant-expression nature of the code, I've changed it as follows, and it still fails when compiled with GCC. But I assume the optimizer may be turning the following into a constant expression too.

#include <cstdio>
void set_dpfpu() { unsigned int mode = 0x27F; asm ("fldcw %0" : : "m" (*&mode)); }
int main()
{
   //set_dpfpu();  uncomment to make it work.
   double d1 = 1.0;
   double d2 = 1.0;  
   if ((d1 + 0.1) != (d2 + 0.1))
      printf("not equal\n");
    else
      printf("equal\n");
    return 0;
}

UPDATE3 Resolution: Upgrading VirtualBox to version 4.1.8r75467 resolved the issue. However, one issue remains: why did the clang build work?

Carver answered 18/1, 2012 at 19:31 Comment(15)
Take a look at the output of the compiler. What code does it produce?Mendacious
Did you use the same version of GCC in both cases?Henig
Virtualbox emulates a virtual CPU, so you can't depend on it using the same instructions.Natalianatalie
@jalf: yes, and the binaries were built on the vm.Carver
This question is difficult to answer, as it happens in a very specific environment. The best way to understand what is going on is to disassemble your code and check what it is doing.Chromium
@tstenner: Doesn't it just forward the instructions into the processor? VT-x etc.Carver
Never test floating point values for equality directly. Use an epsilon value.Choanocyte
Good question. Is the gcc version on the two machines the same? What if you compile on one machine and then copy the resulting application to the other machine; is there a difference in output then?Hyonhyoscine
@Choanocyte that is not the question.Hyonhyoscine
@Mr Lister: Both machines' gcc versions are the same. I've tried copying across a binary built on the native machine, but I get the same erroneous result.Carver
@Macmade: please read the 3rd paragraph.Carver
@JaredKrumsie It can do so, but it isn't guaranteed to do so.Natalianatalie
"Any ideas?" Well, I'd expect an optimizing compiler to convert such a calculation into a compile-time constant (so the floating point control word wouldn't matter), and possibly drop the second branch of the "if". Your output means that doesn't happen. If you created two variables ("double a = 1.0 + 0.1, b = 0.1 + 1.0"), then the compiler might finally decide to calculate the result at compile time.Priddy
There was an almost identical question on SO not long ago, but somehow I cannot find it... anyone know where to look? Anyway, the point is that a real FPU usually uses more bits than 64 (80?), and when data gets transferred to RAM, it is rounded. If a value is kept in the FPU after the last operation it is not rounded, so the comparison fails.Blague
That's why you should never use == to compare FP values.Nates
UPDATE: See this posting: How to deal with excess precision in floating-point computations? It addresses the issue of extended floating point precision. I forgot about the extended precision in x86. I remember a simulation that should have been deterministic, but gave different results on Intel CPUs than on PowerPC CPUs. The cause was Intel's extended precision architecture.

This Web page talks about how to throw Intel CPUs into double-precision rounding mode: http://www.network-theory.co.uk/docs/gccintro/gccintro_70.html.


Does VirtualBox guarantee that its floating point operations are identical to the hardware's floating point operations? I could not find such a guarantee with a quick Google search. I also did not find a promise that VirtualBox FP ops conform to IEEE 754.

VMs are emulators that try, and mostly succeed, to emulate a particular instruction set or architecture. They are just emulators, however, and subject to their own implementation quirks and design issues.

If you haven't already, post the question on forums.virtualbox.org and see what the community says about it.

Polivy answered 18/1, 2012 at 20:13 Comment(4)
The problem is that it works fine using clang, and clang uses the same linker back-end as gcc.Carver
@JaredKrumsie: I'd still be interested in what the VB community has to say about it.Ratter
@JaredKrumsie: "Same linker" != "same compiler". Different compilers can produce different code for the calculation, even if they both use the same linker. All the arithmetic in your example will be located in the same .o/.obj file, so the "linker" doesn't really matter here. Disassemble your compiled program and see what really happens. Or read the compiler's ASM output.Priddy
@Jared. I don't know much about LLVM. I did a quick Google search and it looks like LLVM implements its own floating-point math. It would make sense then that LLVM fp ops would have consistent behavior across different CPUs or CPU emulators. So if the code isn't being run through LLVM, the behavior could still vary by CPU/VM.Polivy
Yep, that is really strange behavior, but it can actually be explained easily:

On x86, floating point registers internally use more precision (e.g. 80 bits instead of 64). This means the computation 1.0 + 0.1 will be computed with more precision in the registers (and since 1.1 can't be represented exactly in binary at all, those extra bits WILL be used). Only when the result is stored to memory will it be truncated.

What this means is simple: if you compare a value loaded from memory with a value newly computed in the registers, you'll get "not equal" back, because one value was truncated while the other wasn't. So this has nothing to do with VM/no VM; it just depends on the code the compiler generates, which can easily fluctuate, as we see here.

Add it to the growing list of floating point surprises.

Encourage answered 18/1, 2012 at 20:55 Comment(2)
I didn't propose this answer because I expected 0.1 to be truncated at compile time when the literal is emitted to code. Doing the math at higher precision shouldn't matter much unless the result requires more bits than the inputs. On reflection, the addition might have needed one or two extra bits, so I suppose it's possible.Satinwood
@Mark True enough. Usually you'd expect that problem with some operation that does generate new precision (sqrt, transcendental functions, ...), but it's the best I can think of. Maybe the extended precision bits aren't cleared when loading something from memory? Or can we load 80-bit immediates? No idea. Someone would have to look in the Intel manuals, I fear.Encourage
I can confirm the same behaviour of your non-VM code, but since I don't have a VM I haven't tested the VM part.

However, both Clang and GCC will evaluate the constant expression at compile time. See the assembly output below (using gcc -O0 test.cpp -S):

    .file   "test.cpp"
    .section        .rodata
.LC0:
    .string "equal"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1"
        .section        .note.GNU-stack,"",@progbits

It looks like you understand assembly, but it's clear that there is only the "equal" string; there is no "not equal". So the comparison is not even done at run time; it just prints "equal".

I would try to code the calculation and comparison using assembly and see if you have the same behavior. If you have different behavior on the VM, then it's the way the VM does the calculation.

UPDATE 1: (Based on the "UPDATE 2" in the original question). Below is the gcc -O0 -S test.cpp output assembly (for a 64-bit architecture). In it you can see the movabsq $4607182418800017408, %rax line twice. These are the initializers for the two variables d1 and d2: $4607182418800017408 is 0x3FF0000000000000, the bit pattern of the double 1.0 (the 0.1 addend is loaded from .LC1). It would be interesting to compile this on the VM; if you get the same result (two similar lines) then the VM will be doing something funny at run time, otherwise it's a combination of VM and compiler.

main:
.LFB1:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movabsq $4607182418800017408, %rax
        movq    %rax, -16(%rbp)
        movabsq $4607182418800017408, %rax
        movq    %rax, -8(%rbp)
        movsd   -16(%rbp), %xmm1
        movsd   .LC1(%rip), %xmm0
        addsd   %xmm1, %xmm0
        movsd   -8(%rbp), %xmm2
        movsd   .LC1(%rip), %xmm1
        addsd   %xmm2, %xmm1
        ucomisd %xmm1, %xmm0
        jp      .L6
        ucomisd %xmm1, %xmm0
        je      .L7
Beatitude answered 18/1, 2012 at 20:38 Comment(0)
I see you added another question:

Note: The C++ standard says type promotion in these situations is implementation dependent. Could it be that GCC is using a more precise internal representation in which the inequality test holds true, due to the extra precision?

The answer to that one is no. 1.1 is not exactly representable in a binary format, no matter how many bits the format has. You can get close, but the binary expansion of .1 repeats infinitely, so no finite number of bits is ever exact.

Or did you mean an entirely new internal format for decimals? No, I refuse to believe that. It wouldn't be very compatible if it did.

Hyonhyoscine answered 18/1, 2012 at 20:26 Comment(2)
If an identical calculation is performed twice, it's quite possible for one result to be stored in memory (i.e. 64 bits) and one to remain in a floating point register (i.e. 80 bits). That would count as two different formats in my book.Satinwood
Yes, but you can't express 1.1 precisely in the FP 80-bit format. That was the question.Hyonhyoscine

© 2022 - 2024 — McMap. All rights reserved.