Inconsistent Results with exAllArithmeticExceptions in Win32 and Win64

A colleague of mine picked up a discrepancy between Win32 and Win64 code compiled by Delphi in how it handles NaNs. Take the following code as an example. When compiled for 32 bit we get no messages, but when compiled for 64 bit both comparisons return true and both messages are printed.

program TestNaNs;

{$APPTYPE CONSOLE}

{$R *.res}

uses
  System.SysUtils,
  System.Math;

var
  nanDouble: Double;
  zeroDouble: Double;
  nanSingle: Single;
  zeroSingle: Single;
begin
  SetExceptionMask(exAllArithmeticExceptions);
  nanSingle := NaN;
  zeroSingle := 0.0;
  if nanSingle <> zeroSingle then
    WriteLn('nanSingle <> zeroSingle');

  nanDouble := NaN;
  zeroDouble := 0.0;
  if nanDouble <> zeroDouble then
    WriteLn('nanDouble <> zeroDouble');

  ReadLn;
end.

My understanding of the IEEE standard is that, when either operand is a NaN, <> should return true and every other comparison should return false. So in this case, it looks like the 64 bit version is correct and the 32 bit version is incorrect. The code generated by the two compilers is very different, with the 64 bit version generating SSE code.

For 32 bit:

TestNaNs.dpr.21: if nanSingle <> zeroSingle then
0041A552 D905E01E4200     fld dword ptr [$00421ee0]
0041A558 D81DE41E4200     fcomp dword ptr [$00421ee4]
0041A55E 9B               wait 
0041A55F DFE0             fstsw ax
0041A561 9E               sahf 
0041A562 7419             jz $0041a57d

and for 64 bit:

TestNaNs.dpr.21: if nanSingle <> zeroSingle then
000000000042764E F3480F5A05C9ED0000 cvtss2sd xmm0,qword ptr [rel $0000edc9]
0000000000427657 F3480F5A0DC4ED0000 cvtss2sd xmm1,qword ptr [rel $0000edc4]
0000000000427660 660F2EC1         ucomisd xmm0,xmm1
0000000000427664 7A02             jp Project63 + $68
0000000000427666 7420             jz Project63 + $88

My question is this: is this an issue with the Delphi compiler or a caveat of Intel CPUs?

The IEEE 754 standard defines arithmetic formats, operations, rounding rules, exceptions etc. for floating point computation. The Delphi compiler implements floating point arithmetic on top of the available hardware units. For the 32 bit Windows compiler this is the x87 unit, and for the 64 bit Windows compiler this is the SSE unit. Both of these hardware units conform to the IEEE 754 standard.

The difference that you are observing arises at the language implementation level. Let us look at the two versions in more detail.

32 bit Windows compiler

The comparison statement is compiled to this:

TestNaNs.dpr.19: if nanDouble <> zeroDouble then
0041C4C8 DD05C03E4200     fld qword ptr [$00423ec0]
0041C4CE DC1DC83E4200     fcomp qword ptr [$00423ec8]
0041C4D4 9B               wait 
0041C4D5 DFE0             fstsw ax
0041C4D7 9E               sahf 
0041C4D8 7419             jz $0041c4f3

The Intel Software Developer's Manual says that an unordered comparison is indicated by the flags C3, C2 and C0 all being set to 1. The full table is here:

Condition       C3  C2  C0
ST(0) > Source  0   0   0
ST(0) < Source  0   0   1
ST(0) = Source  1   0   0
Unordered       1   1   1

When you inspect the FPU under the debugger, you can see that this is the case. The flags are then tested by the remaining instructions:

0041C4D5 DFE0             fstsw ax
0041C4D7 9E               sahf 
0041C4D8 7419             jz $0041c4f3

This transfers various bits of the FPU status register into the CPU flags; see the manual for precise details of which flags go where. The branch is then taken if ZF is set. The value of ZF comes from the C3 FPU flag, which, reading from the table above, is set for the unordered case.

In fact, the entire branching code can be expressed in pseudo code as:

jump if C3 = 1

So, looking at the table above, it is clear that if one of the operands is a NaN then any floating point equality comparison evaluates as equals.
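The flag transfer described above can be sketched in Delphi. This is only an illustrative model, not compiler output: FlagAfterSahf, Show and the AH_* constants are hypothetical names. It relies on the documented layout in which C0, C2 and C3 sit at bits 8, 10 and 14 of the x87 status word, and on sahf copying bits 0, 2 and 6 of AH into CF, PF and ZF.

```pascal
program FlagsFromStatusWord;

{$APPTYPE CONSOLE}

const
  // Bit positions inside AH after "fstsw ax"; sahf copies them to EFLAGS.
  AH_ZF = $40;  // C3, status word bit 14
  AH_PF = $04;  // C2, status word bit 10
  AH_CF = $01;  // C0, status word bit 8

// Hypothetical helper: True if the given flag bit would be set after
// "fstsw ax; sahf" for this x87 status word.
function FlagAfterSahf(StatusWord: Word; FlagBit: Byte): Boolean;
begin
  Result := (Hi(StatusWord) and FlagBit) <> 0;
end;

procedure Show(const Outcome: string; StatusWord: Word);
begin
  WriteLn(Outcome,
    ': ZF=', FlagAfterSahf(StatusWord, AH_ZF),
    ' PF=', FlagAfterSahf(StatusWord, AH_PF),
    ' CF=', FlagAfterSahf(StatusWord, AH_CF));
end;

begin
  // Status words containing only the condition code bits from the table.
  Show('ST(0) > Source', $0000);
  Show('ST(0) < Source', $0100);
  Show('ST(0) = Source', $4000);
  Show('Unordered     ', $4500);
end.
```

In this model ZF is set for both the equal row and the unordered row, which is why a lone jz cannot tell them apart.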

64 bit Windows compiler

The comparison statement is compiled to this:

TestNaNs.dpr.19: if nanDouble <> zeroDouble then
0000000000428EB8 F20F100548E50000 movsd xmm0,qword ptr [rel $0000e548]
0000000000428EC0 660F2E0548E50000 ucomisd xmm0,qword ptr [rel $0000e548]
0000000000428EC8 7A02             jp TestNaNs + $5C
0000000000428ECA 7420             jz TestNaNs + $7C

The comparison is performed by the ucomisd instruction. The manual gives this pseudo code:

RESULT ← UnorderedCompare(SRC1[63:0] <> SRC2[63:0]) {
(* Set EFLAGS *)
CASE (RESULT) OF
  GREATER_THAN:   ZF, PF, CF ← 000;
  LESS_THAN:      ZF, PF, CF ← 001;
  EQUAL:          ZF, PF, CF ← 100;
  UNORDERED:      ZF, PF, CF ← 111;
ESAC;
OF, AF, SF ← 0;

Notice that in this instruction, the ZF, PF and CF flags are exactly analogous to the C3, C2 and C0 flags on the x87 unit.

The branching is handled by this code:

0000000000428EC8 7A02             jp TestNaNs + $5C
0000000000428ECA 7420             jz TestNaNs + $7C

Notice that there is first a test of the parity flag PF (the jp instruction), and then the zero flag ZF (the jz instruction). The compiler has therefore emitted code to handle the unordered case (i.e. one of the operands is NaN). This is handled first with the jp. Once that is handled, the compiler then checks the zero flag ZF which (because NaNs have been dealt with) is set if and only if the two operands are equal.
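To make the difference concrete, here is a small Delphi sketch that models the two branch sequences as boolean functions of the flags. It is a simplification under stated assumptions: BodyRunsX87, BodyRunsSSE and Show are hypothetical names, and the flag values for each outcome are taken from the tables quoted in this answer.

```pascal
program BranchModel;

{$APPTYPE CONSOLE}

// Hypothetical model, not compiler output. Each function returns True when
// the body of "if a <> b then" would run for the given flag values.

// 32 bit sequence: "jz skip". The body runs unless ZF is set.
function BodyRunsX87(ZF: Boolean): Boolean;
begin
  Result := not ZF;
end;

// 64 bit sequence: "jp body; jz skip". An unordered result (PF set)
// enters the body before ZF is ever examined.
function BodyRunsSSE(ZF, PF: Boolean): Boolean;
begin
  Result := PF or not ZF;
end;

procedure Show(const Outcome: string; ZF, PF: Boolean);
begin
  WriteLn(Outcome, ': 32 bit body runs=', BodyRunsX87(ZF),
    ', 64 bit body runs=', BodyRunsSSE(ZF, PF));
end;

begin
  //   outcome        ZF     PF
  Show('greater  ', False, False);
  Show('less     ', False, False);
  Show('equal    ', True,  False);
  Show('unordered', True,  True);
end.
```

Only the unordered row differs between the two models: the 32 bit model skips the body, while the 64 bit model runs it.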

Conclusion

The different behaviour is down to the two compilers making different choices in how to implement the comparison operators. In both situations the hardware is IEEE 754 compliant, and perfectly capable of comparing NaNs as specified by the standard.

My best guess would be that the decisions for the 32 bit compiler were taken a very long time ago. Some of these decisions are questionable. In my view an equality comparison with a NaN operand should evaluate not equals irrespective of the other operand. The weight of history, felt through a desire to maintain backwards compatibility, means that these questionable decisions have never been addressed.

When the 64 bit compiler was created, more recently, the Embarcadero engineers decided to right some of these mistakes. They presumably felt that the break to a new architecture allowed them the freedom to do so.

In an ideal world, the 32 bit compiler could be configured to behave the same way as the 64 bit compiler, by setting a compiler switch.
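In the absence of such a switch, one possible workaround is to test for NaN explicitly before comparing. The following is a minimal sketch, not an official recommendation: IeeeNotEqual and IeeeEqual are hypothetical helper names, while IsNan and NaN come from System.Math. It assumes the default short-circuit boolean evaluation ({$B-}) and masks exceptions in the same way as the program in the question.

```pascal
program PortableNaNCompare;

{$APPTYPE CONSOLE}

uses
  System.Math;

// Hypothetical helper, not part of the RTL: IEEE 754 style "not equal".
// The NaN tests come first, so with short-circuit evaluation the
// compiler-generated comparison is only reached with ordered operands.
function IeeeNotEqual(const A, B: Double): Boolean;
begin
  Result := IsNan(A) or IsNan(B) or (A <> B);
end;

// Hypothetical helper: IEEE 754 style "equal", False whenever a NaN is involved.
function IeeeEqual(const A, B: Double): Boolean;
begin
  Result := not IeeeNotEqual(A, B);
end;

var
  NanValue, Zero: Double;
begin
  // Mask floating point exceptions, as the program in the question does.
  SetExceptionMask(exAllArithmeticExceptions);
  NanValue := NaN;
  Zero := 0.0;
  WriteLn('IeeeNotEqual(NaN, 0): ', IeeeNotEqual(NanValue, Zero));
  WriteLn('IeeeEqual(NaN, NaN):  ', IeeeEqual(NanValue, NanValue));
end.
```

Because the result for a NaN operand is decided by IsNan and never by the generated comparison, the helpers should give the same answer under both compilers.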
