Yes, I repro this. Well, mostly. I actually don't get an output of 0, but some other garbage output. So I can reproduce the invalid behavior, and I have pinpointed the cause.
You can see the code that GCC 5.4.0 generates with the -m64 -mno-sse flags on Godbolt's Compiler Explorer. In particular, these are the instructions we care about:
```asm
// double pi = 3.14;
fld     QWORD PTR .LC0[rip]
fstp    QWORD PTR [rbp-8]

// std::cout << "pi:";
mov     esi, OFFSET FLAT:.LC1
mov     edi, OFFSET FLAT:std::cout
call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)

// std::cout << pi;
sub     rsp, 8
push    QWORD PTR [rbp-8]
mov     rdi, rax
call    std::basic_ostream<char, std::char_traits<char> >::operator<<(double)
add     rsp, 16
```
What's happening here? Well, first, we need to understand what the -mno-sse flag means. It prevents the compiler from generating any code that uses SSE instructions (or any later instruction-set extension, such as SSE2). Therefore, all floating-point operations must be done using the legacy x87 FPU. That works fine and is well-supported in 32-bit builds, but it is nonsensical in 64-bit builds. The AMD64 specification requires SSE2 support as a minimum, so every 64-bit-capable x86 CPU can be assumed to support both SSE and SSE2. This assumption has made it into the ABI: on x86-64, float and double arithmetic is done with SSE2 instructions, and float and double values are passed to and returned from functions in XMM registers. Therefore, doing floating-point operations while forbidding the compiler from using SSE/SSE2 instructions puts the code generator in an impossible position and leads to inevitable failure.
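To make the ABI point concrete, here is a trivial hypothetical function (scale is not from the question), with comments on where each value lives under the System V AMD64 calling convention that GCC uses on x86-64 Linux, compared with the 32-bit i386 convention. This is a sketch assuming a Linux/System V target, as in the listing.

```cpp
// Hypothetical example function, used only to illustrate the calling convention.
// Under the System V AMD64 ABI (GCC on x86-64 Linux):
//   x      arrives in XMM0
//   factor arrives in XMM1
//   the result is returned in XMM0
// Under the 32-bit i386 System V ABI, both arguments are instead passed on
// the stack, and the result is returned in the x87 register ST(0).
double scale(double x, double factor) {
    return x * factor;
}
```

Compiled normally for x86-64 with -O2, the body should reduce to a single mulsd xmm0, xmm1 followed by ret (Intel syntax, as Compiler Explorer shows it). There is no way to express that without XMM registers, which is exactly the bind -mno-sse creates.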
How exactly does it fail? Let's walk through the code above. It's unoptimized (since you didn't pass an optimization flag, it defaulted to -O0), which makes it a little hard to read, but bear with me.
In the first block, it uses x87 FPU instructions to load your double-precision floating-point value (3.14) from memory (it is stored as a constant in the binary) onto the top of the x87 FPU register stack. Then, it pops that value off the stack and stores it into memory (the program stack). This is just busy-work done in unoptimized code, and you can pretty much ignore it. The upshot is that your floating-point value is stored in memory at rbp-8 (8 bytes below the base pointer).
The next block of instructions can be completely ignored. They just output the string "pi:".
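The third block of instructions is supposed to output the floating-point value. First, 8 bytes of stack space are allocated (padding that, together with the 8-byte push that follows, keeps the stack 16-byte aligned at the call; the add rsp, 16 afterwards releases both). Then, the floating-point value that we previously stored to memory is pushed onto the stack.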
So far, so good. This is how you would normally pass a floating-point parameter to a function in a 32-bit build, following the 32-bit ABI, where x87 instructions are used. In a 64-bit build, following the 64-bit ABI, floating-point parameters are supposed to be passed in XMM registers, and that is where the operator<<(double) function expects to receive its parameter. But you told the compiler it cannot generate SSE code, so it cannot make use of the XMM registers. Its hands are tied. It cannot properly call the library function, which follows the ABI, because your specific options break the ABI.
It's all downhill from here. The compiler copies the contents of the rax register (the std::ostream& returned by the previous operator<< call, i.e., std::cout) into the rdi register, where the next member-function call expects its this pointer, and then calls the operator<<(double) function. That function was compiled normally, so it reads the floating-point value from the XMM0 register and writes it to stdout. But XMM0 contains garbage (in your case, it seems to contain 0, but its contents are just whatever happened to be left there), so this garbage is written to stdout instead of the floating-point value you expected to see.
Now that we understand the problem, what are the solutions?
- If you don't want to use SSE instructions, force a 32-bit binary to be compiled using the -m32 flag. This combines safely with -mno-sse.
- If you need a 64-bit binary, don't pass the -mno-sse flag, because it violates the 64-bit ABI, which assumes SSE2 support as a minimum.
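For an ordinary user-space program that uses float or double, the second bullet can be turned into a build error. This is a minimal sketch assuming GCC or Clang, which define __x86_64__ on x86-64 and define __SSE2__ only while SSE2 is enabled (so -mno-sse removes it). It deliberately rejects the kernel-style use of -m64 -mno-sse described next, so don't put it in code meant to be built that way. The helper names are hypothetical.

```cpp
// Refuse to build an x86-64 translation unit with SSE2 disabled (GCC/Clang
// predefined macros assumed): float/double calls would break the ABI.
#if defined(__x86_64__) && !defined(__SSE2__)
#error "x86-64 build without SSE2: float/double calls will not follow the ABI"
#endif

// Hypothetical helper: was this translation unit compiled for x86-64?
constexpr bool is_x86_64() {
#if defined(__x86_64__)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: was this translation unit compiled with SSE2 enabled?
constexpr bool built_with_sse2() {
#if defined(__SSE2__)
    return true;
#else
    return false;
#endif
}
```

With the #error in place, g++ -m64 -mno-sse stops at compile time instead of producing a binary that prints garbage, while g++ -m32 -mno-sse is unaffected because __x86_64__ is not defined there.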
(Although I'm ignoring it here, it is technically reasonable to pass the -mno-sse flag along with the -m64 flag. Indeed, GCC explicitly supports this combination because it is used to compile Linux kernel code, where the XMM registers' state is not preserved when execution enters the kernel. This works only because kernel code does not perform floating-point operations. There, -mno-sse is used only to keep the compiler from using SSE instructions in optimizations that have nothing to do with floating-point arithmetic, for example copying blocks of memory through XMM registers.)
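To illustrate that last point, here is a hypothetical integer-only example (Block and copy_block are made up for illustration). It assumes GCC on x86-64; whether GCC actually chooses SSE moves for this copy in a normal build depends on version and optimization level.

```cpp
#include <cstdint>

// Hypothetical integer-only data structure: no float or double anywhere.
struct Block {
    std::uint64_t words[8];  // 64 bytes
};

// Copying a 64-byte struct. In a normal x86-64 build, GCC may do this with
// SSE moves through XMM registers. With -m64 -mno-sse it must use other
// instructions, but since no floating-point value crosses a function
// boundary, the calling convention is unaffected and the result is correct.
void copy_block(Block& dst, const Block& src) {
    dst = src;
}
```

Code like this builds and behaves the same with or without -mno-sse; only the instructions chosen for the copy may differ. That is why the flag is safe in the kernel but not in a user-space program that passes doubles to a normally compiled library.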