Invalid floating point operation calling Trunc()

Asked 5/3, 2014 at 21:57 Answered 6/3, 2014 at 9:30

I'm getting a (repeatable) floating point exception when i try to Trunc() a Real value.

e.g.:

Trunc(1470724508.0318);

In reality the actual code is more complex:

 ns: Real;
 v: Int64;

 ns := ((HighPerformanceTickCount*1.0)/g_HighResolutionTimerFrequency) * 1000000000;
 v := Trunc(ns);

But in the end it still boils down to:

Trunc(ARealValue);

Now, i cannot repeat it anywhere else - just at this one spot. Where it fails every time.

It's not voodoo

Fortunately computers are not magic. The Intel CPU performs very specific observable actions. So i should be able to figure out why the floating point operation fails.

Going into the CPU window

v := Trunc(ns)
fld qword ptr [ebp-$10]

This loads the 8-byte floating point value at ebp-$10 into floating point register ST0.

The bytes at memory address [ebp-$10] are:

0018E9D0: 6702098C 41D5EA5E    (as DWords)
0018E9D0: 41D5EA5E6702098C     (as QWords)
0018E9D0:   1470724508.0318    (as Doubles)

The load succeeds, and the floating point register then contains the appropriate value:

[screenshot: FPU register window after the fld]

Next is the actual call to the RTL Trunc function:

call @TRUNC

Next is the guts of Delphi RTL's Trunc function:

@TRUNC:

sub esp,$0c
wait
fstcw word ptr [esp]       //Store Floating-Point Control Word on the stack
wait
fldcw word ptr [cwChop]    //Load Floating-Point Control Word
fistp qword ptr [esp+$04]  //Converts the value in ST0 to a signed integer,
                           //stores the result in the destination operand,
                           //and pops the FPU register stack
wait
fldcw word ptr [esp]       //Load Floating-Point Control Word
pop ecx
pop eax
pop edx
ret

Or i suppose i could have just pasted it from the rtl, rather than transcribing it from the CPU window:

const cwChop : Word = $1F32;

procedure       _TRUNC;
asm
        { ->    FST(0)   Extended argument       }
        { <-    EDX:EAX  Result                  }

        SUB     ESP,12
        FSTCW   [ESP]              //Store floating-point control word at [ESP]
        FWAIT
        FLDCW   cwChop             //Load new control word $1F32
        FISTP   qword ptr [ESP+4]  //Convert ST0 to int, store in ESP+4, and pop the stack
        FWAIT
        FLDCW   [ESP]              //restore the FPCW
        POP     ECX
        POP     EAX
        POP     EDX
end;

The exception happens during the actual fistp operation.

fistp qword ptr [esp+$04]

At the moment of this call, the ST0 register contains the same floating point value:

[screenshot: FPU register window before the fistp]

Note: The careful observer will note the value in the above screenshot doesn't match the first screenshot. That's because i took it on a different run. I'd rather not have to carefully redo all the constants in the question just to make them consistent - but trust me: it's the same when i reach the fistp instruction as it was after the fld instruction.

Leading up to it:

sub esp,$0c: I watch it push the stack down by 12 bytes
fstcw word ptr [esp]: i watch it store $027F at the current stack pointer
fldcw word ptr [cwChop]: i watch the floating point control flags change
fistp qword ptr [esp+$04]: and it's about to write the Int64 into the room it made on the stack

and then it crashes.

What can actually be going on here?

It happens with other values as well; it's not like there's something wrong with this particular floating point value. I even tried to set up the test case elsewhere.

Knowing that the 8-byte hex value of the float is: $41D5EA5E6702098C, i tried to contrive the setup:

var
    ns: Real;
    nsOverlay: Int64 absolute ns;
    v: Int64;
begin
   nsOverlay := $41d62866a2f270dc;
   v := Trunc(ns);
end;

Which gives:

nsOverlay := $41d62866a2f270dc;
mov [ebp-$08],$a2f270dc
mov [ebp-$04],$41d62866
v := Trunc(ns)
fld qword ptr [ebp-$08]
call @TRUNC

And at the point of the call to @trunc, the floating point register ST0 contains a value:

[screenshot: FPU register window at the call to @TRUNC]

But the call does not fail. It only fails, every time in this one section of my code.

What could be possibly happening that is causing the CPU to throw an invalid floating point exception?

What is the value of `cwChop` before it loads the control word?

The value of cwChop looks to be correct before the load control word, $1F32. But after the load, the actual control word is wrong:

[screenshot: FPU control word after the fldcw]

Bonus Chatter

The actual function that is failing is something to convert high-performance tick counts into nanoseconds:

function PerformanceTicksToNs(const HighPerformanceTickCount: Int64): Int64; 
//Convert high-performance ticks into nanoseconds
var
    ns: Real;
    v: Int64;
begin
    Result := 0;

    if HighPerformanceTickCount = 0 then
        Exit;

    if g_HighResolutionTimerFrequency = 0 then
        Exit;

    ns := ((HighPerformanceTickCount*1.0)/g_HighResolutionTimerFrequency) * 1000000000;

    v := Trunc(ns);
    Result := v;
end;

I created all the intermediate temporary variables to try to track down where the failure is.

I even tried to use that as a template to try to reproduce it:

var
    i1, i2: Int64;
    ns: Real;
    v: Int64;
    vOver: Int64 absolute ns;
begin
    i1 := 5060170;
    i2 := 3429541;
    ns := ((i1*1.0)/i2) * 1000000000;
    //vOver := $41d62866a2f270dc;
    v := Trunc(ns);

But it works fine. There's something about when it's called during a DUnit unit test.

Floating Point control word flags

Delphi's standard control word: $1332:

$1332 = 0001 00 11 00 110010
                           0 ;Don't allow invalid numbers
                          1  ;Allow denormals (very small numbers)
                         0   ;Don't allow divide by zero
                        0    ;Don't allow overflow
                       1     ;Allow underflow
                      1      ;Allow inexact precision
                    0        ;reserved exception mask
                   0         ;reserved  
                11           ;Precision Control - 11B (Double Extended Precision - 64 bits)
             00              ;Rounding control - 00B (round to nearest)
           1                 ;Infinity control - 1 (not used)

The Windows API required value: $027F

$027F = 0000 00 10 01 111111
                           1 ;Allow invalid numbers
                          1  ;Allow denormals (very small numbers)
                         1   ;Allow divide by zero
                        1    ;Allow overflow
                       1     ;Allow underflow
                      1      ;Allow inexact precision
                    1        ;reserved exception mask
                   0         ;reserved  
                10           ;Precision Control - 10B (double precision)
             00              ;Rounding control - 00B (round to nearest)
           0                 ;Infinity control - 0 (not used)

The cwChop control word: $1F32

$1F32 = 0001 11 11 00 110010
                           0 ;Don't allow invalid numbers
                          1  ;Allow denormals (very small numbers)
                         0   ;Don't allow divide by zero
                        0    ;Don't allow overflow
                       1     ;Allow underflow
                      1      ;Allow inexact precision
                    0        ;reserved exception mask
                   0         ;unused
                11           ;Precision Control - 11B (Double Extended Precision - 64 bits)
             11              ;Rounding control - 11B (truncate toward zero)
           1                 ;Infinity control - 1 (not used)
        000                ;unused

The CTRL flags after loading $1F32: $1F72

$1F72 = 0001 11 11 01 110010
                           0 ;Don't allow invalid numbers
                          1  ;Allow denormals (very small numbers)
                         0   ;Don't allow divide by zero
                        0    ;Don't allow overflow
                       1     ;Allow underflow
                      1      ;Allow inexact precision
                    1        ;reserved exception mask
                   0         ;unused
                11           ;Precision Control - 11B (Double Extended Precision - 64 bits)
             11              ;Rounding control - 11B (truncate toward zero)
           1                 ;Infinity control - 1 (not used)
        000                  ;unused

All the CPU is doing is turning on a reserved, unused, mask bit.
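The bit tables above can be checked mechanically. Here is a minimal sketch that splits a control word into the same fields; `DescribeControlWord` is a hypothetical helper written for this post, not part of the RTL, and the field layout is the standard x87 one (bits 0..5 masks, bits 8..9 precision, bits 10..11 rounding).

```pascal
program DecodeCW;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: prints the documented fields of an x87 control word.
procedure DescribeControlWord(CW: Word);
const
  MaskNames: array[0..5] of string =
    ('Invalid', 'Denormal', 'ZeroDivide', 'Overflow', 'Underflow', 'Precision');
  PrecisionNames: array[0..3] of string =
    ('single (24 bits)', 'reserved', 'double (53 bits)', 'extended (64 bits)');
  RoundingNames: array[0..3] of string =
    ('nearest', 'down', 'up', 'truncate');
var
  Bit: Integer;
begin
  Writeln('Control word $', IntToHex(CW, 4));
  // A set mask bit means the exception is masked (the CPU will not raise it).
  for Bit := 0 to 5 do
    if (CW and (1 shl Bit)) <> 0 then
      Writeln('  ', MaskNames[Bit], ': masked')
    else
      Writeln('  ', MaskNames[Bit], ': unmasked (raises)');
  Writeln('  Precision control: ', PrecisionNames[(CW shr 8) and 3]);
  Writeln('  Rounding control:  ', RoundingNames[(CW shr 10) and 3]);
end;

begin
  DescribeControlWord($1332); // Delphi's standard control word
  DescribeControlWord($027F); // the Windows value
  DescribeControlWord($1F32); // cwChop, as loaded by _TRUNC
end.
```

The only difference it should report between $1332 and $1F32 is the rounding field, which is the whole point of cwChop.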

RaiseLastFloatingPointError()

If you're going to develop programs for Windows, you really need to accept the fact that floating point exceptions should be masked by the CPU, meaning you have to watch for them yourself. Like Win32Check or RaiseLastWin32Error, we'd like a RaiseLastFPError. The best i can come up with is:

procedure RaiseLastFPError();
var
    statWord: Word;
const
    ERROR_InvalidOperation = $01;
//  ERROR_Denormalized = $02;
    ERROR_ZeroDivide = $04;
    ERROR_Overflow = $08;
//  ERROR_Underflow = $10;
//  ERROR_InexactResult = $20;
begin
    {
        Excellent reference of all the floating point instructions.
        (Intel's architecture manuals have no organization whatsoever)
        http://www.plantation-productions.com/Webster/www.artofasm.com/Linux/HTML/RealArithmetica2.html

        Bits 0:5 are exception flags (Mask = $3F)
            0: Invalid Operation
            1: Denormalized - CPU handles correctly without a problem. Do not throw
            2: Zero Divide
            3: Overflow
            4: Underflow - CPU handles as you'd expect. Do not throw.
            5: Precision - Extraordinarily common. CPU does what you'd want. Do not throw
    }
    asm
        fwait                   //Wait for pending operations
        fstsw statWord          //Store the FPU status word in statWord.
                                //Waits for pending operations. (Use FNSTSW to not wait.)
        fclex                   //Clear all exception bits, the stack fault bit,
                                //and the busy flag in the FPU status register
    end;

    if (statWord and $0D) <> 0 then
    begin
        //if (statWord and ERROR_InexactResult) <> 0 then raise EInexactResult.Create(SInexactResult)
        //else if (statWord and ERROR_Underflow) <> 0 then raise EUnderflow.Create(SUnderflow)}
        if (statWord and ERROR_Overflow) <> 0 then raise EOverflow.Create(SOverflow)
        else if (statWord and ERROR_ZeroDivide) <> 0 then raise EZeroDivide.Create(SZeroDivide)
        //else if (statWord and ERROR_Denormalized) <> 0 then raise EUnderflow.Create(SUnderflow)
        else if (statWord and ERROR_InvalidOperation) <> 0 then raise EInvalidOp.Create(SInvalidOp);
    end;
end;

A reproducible case!

I found a case that raises an invalid floating point exception under Delphi's default floating point control word (I never saw it before now because it was masked). Now that i'm seeing it, why is it happening? And it's reproducible:

procedure TForm1.Button1Click(Sender: TObject);
var
    d: Real;
    dover: Int64 absolute d;
begin
    d := 1.35715152325557E020;
//  dOver := $441d6db44ff62b68; //1.35715152325557E020
    d := Round(d); //<--floating point exception
    Self.Caption := FloatToStr(d);
end;

You can see that the ST0 register contains a valid floating point value. The floating point control word is $1372. The floating point exception flags are all clear:

[screenshot: FPU register window before the rounding instruction]

And then, as soon as it executes, it's an invalid operation:

[screenshot: FPU register window after the rounding instruction]

IE (Invalid operation) flag is set
ES (Exception) flag is set

I was tempted to ask this as another question, but it would be the exact same question - except this time calling Round().

Goatsbeard answered 5/3, 2014 at 21:57 Comment(8)

A possible culprit seems to be the value of cwChop. Could it have been corrupted by something leading up to the failing location? – Wirer 5/3, 2014 at 22:7

@500-InternalServerError I think i see what you're looking at. It is being instructed to load the correct flags, but afterwards the CTRL register is wrong. – Goatsbeard 5/3, 2014 at 22:20

You might want to take a look at this: wiert.me/2009/05/06/… – Wirer 5/3, 2014 at 22:20

Great question by the way. As usual. The screenshots had all I needed. Don't worry about the discrepancy between 1f32 and 1f72. The CTRL register always does that. It's the unused reserved part of the register. All as expected. – Sensitometer 5/3, 2014 at 22:33

Converting a float to Int64 (Round and Trunc), when the float is larger than maxInt64 will cause a floating point exception. – Bellman 6/3, 2014 at 7:17

"If you're going to develop programs for Windows, you really need to accept the fact that floating point exceptions should be masked by the CPU, meaning you have to watch for them yourself." That's not the conclusion that I draw. My 700kloc Windows program runs with exceptions unmasked. I just make sure that whenever I call a function that might change the control word, I change it back. Or when I call a function that wants exceptions masked, I mask them before calling, and restore on return. – Sensitometer 6/3, 2014 at 9:25

Sometimes you do not call code that could change the, or depend on the, masked exception flags. Code in your thread, that is not yours, could be invoked without your knowledge. And that code trusts that I have setup the thread correctly. – Goatsbeard 6/3, 2014 at 12:12

@Ian I agree that it's not easy however you slice it. But these issues have to be tackled whatever policy you adopt. You have to take control of the control word one way or another. It's actually possible to run an app with exceptions unmasked and have it working smoothly. I know from experience because I do just that. – Sensitometer 6/3, 2014 at 18:44

The problem occurs elsewhere. When your code enters Trunc the control word is set to $027F which is, IIRC, the default Windows control word. This has all exceptions masked. That's a problem because Delphi's RTL expects exceptions to be unmasked.

And look at the FPU window: sure enough, there are errors. Both IE and PE flags are set. It's IE that counts. That means that earlier in the code sequence there was a masked invalid operation.

Then you call Trunc which modifies the control word to unmask the exceptions. Look at your second FPU window screenshot. IE is 1 but IM is 0. So boom, the earlier exception is raised and you are led to think that it was the fault of Trunc. It was not.
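To make that sequence concrete, here is a minimal sketch of the same trap. It assumes a 32-bit compiler that generates x87 code and an RTL whose _TRUNC loads cwChop as quoted in the question; RTL versions that preserve the exception masks inside _TRUNC may behave differently. The variable names are made up for illustration.

```pascal
program DeferredFPException;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Zero, Bad, Value: Double;
  V: Int64;
begin
  Zero := 0.0;
  Value := 1.5;            // a perfectly harmless argument for Trunc

  Set8087CW($027F);        // mask all exceptions, like the Windows value
  Bad := Zero / Zero;      // invalid operation: IE gets set, nothing is raised

  try
    // With the old _TRUNC, loading cwChop unmasks IE while the flag is
    // still set, so the stale error is expected to surface here.
    V := Trunc(Value);
    Writeln('Trunc returned ', V);
  except
    on E: EInvalidOp do
      Writeln('EInvalidOp surfaced inside Trunc: ', E.Message);
  end;
end.
```

If the exception branch runs, the failing instruction is the fistp in _TRUNC even though the bad operation was the division two lines earlier.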

You'll need to trace back up the call stack to find out why the control word is not what it ought to be in a Delphi program. It ought to be $1332. Most likely you are calling into some third party library which modifies the control word and does not restore it. You'll have to locate the culprit and take charge whenever any calls to that function return.
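"Taking charge" can be as simple as a wrapper that saves the control word, masks for the foreign call, and restores afterwards. The following is only a sketch: `CallWithMaskedFPU`, `TForeignCall` and `PretendLibraryCall` are hypothetical names, and it assumes a Delphi version where `Get8087CW` (System) and `ClearExceptions` (Math) are available.

```pascal
program RestoreControlWord;
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

type
  TForeignCall = procedure;

var
  Zero, Junk: Double;

// Hypothetical wrapper: run a foreign routine with all FPU exceptions
// masked, then discard whatever flags it left behind and restore our state.
procedure CallWithMaskedFPU(ForeignCall: TForeignCall);
var
  SavedCW: Word;
begin
  SavedCW := Get8087CW;
  Set8087CW($027F);          // what the foreign code expects
  try
    ForeignCall;
  finally
    ClearExceptions(False);  // drop stale flags without raising them
    Set8087CW(SavedCW);      // back to the caller's control word
  end;
end;

// Stand-in for a third-party routine that leaves IE set behind it.
procedure PretendLibraryCall;
begin
  Junk := Zero / Zero;
end;

begin
  Zero := 0.0;
  CallWithMaskedFPU(PretendLibraryCall);
  Writeln('Control word after the call: $', IntToHex(Get8087CW, 4));
end.
```

Clearing the flags before restoring the control word matters: restoring first would unmask a stale flag and reproduce the very problem described above.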

Once you get the control word back under control you'll find the real cause of this exception. Clearly there is an illegal FP operation. Once the control word unmasks the exceptions, the error will be raised at the right point.

Note that there's nothing to worry about in the discrepancy between $1372 and $1332, or $1F72 and $1F32. That's just an oddity of the CTRL control word: some of the bits are reserved and ignore your exhortations to clear them.

Sensitometer answered 5/3, 2014 at 22:21 Comment(9)

Dammit. I'm intentionally doing Set8087CW($027F) at program startup because of the embedded TWebBrowser that drives the interface. Someone really needs to define a Windows calling convention and stick to it! – Goatsbeard 5/3, 2014 at 22:55

@Ian Yes, this issue has driven me mad over the years. It's a problem that just keeps on giving. Our code has lots of defense against 3rd party libraries changing the control word. With the web browser control I don't know if it is tractable to feed in 027F when you call, and restore later. There's a fair amount of async isn't there. Makes it tricky. – Sensitometer 5/3, 2014 at 23:1

That control sucks in that regard. Delphi's RTL doesn't help. Did you know that Set8087CW is not thread safe? I have a serious of QC reports on the issue of floating point culminating in a detailed proposal to redesign and fix the runtime. My program fixes the RTL. Can't see Emba ever doing it though. – Sensitometer 5/3, 2014 at 23:2

I mean series of rather than serious of! – Sensitometer 5/3, 2014 at 23:12

I assume that, unlike regular machine instructions (where the state of the flags represents the results of the last instruction), floating point flags are cumulative until reset. If i were programming in "another language" what is the mechanism to Check-Floating-Point-Flags-And-Reset-Them? In other words, is there a RaiseLastFloatingPointError()? For example, I see that the C standard library has _clearfp() – Goatsbeard 6/3, 2014 at 0:1

@Ian - There's ClearExceptions in math.pas. I checked D7 but don't know about D5. It doesn't return the status though, it's a proc. But you already seem to have found a way; call Trunc. :) – Vaporing 6/3, 2014 at 0:39

@Sertac, but calling ClearExceptions with RaisePending parameter in True will do what Ian describes as RaiseLastFloatingPointError. – Ecumenicalism 6/3, 2014 at 0:43

@Tlama ClearExceptions doesn't actually raise any exception. The RaisePending flag means "let any pending operation complete, to the point where it would raise an exception if masks were off". Is there a way to ask the Intel CPU to trigger the standard SIGTRAP; where it signals Windows, who signals me through the standard exception mechanism? Or do i have to be really clunky, and get the flags myself, and inspect them, and raise EZeroDivide, EOverflow, EUnderflow, etc myself? Ideally the CPU would have ThrowMaskedErrorsNowPlease instruction. – Goatsbeard 6/3, 2014 at 0:56

I traced back earlier in the code, using my newly created RaiseLastFPError to find the problem. And the problem code makes no sense, is easily reproducible, and uses the default control word flags. Figure that one out! – Goatsbeard 6/3, 2014 at 3:50

Your latest update essentially asks a different question. It asks about the exception raised by this code:

procedure foo;
var
  d: Real;
  i: Int64;
begin
  d := 1.35715152325557E020;
  i := Round(d);
end;

This code fails because the job of Round() is to round d to the nearest Int64 value. But your value of d (about 1.36E20) is greater than the largest possible value that can be stored in an Int64 (about 9.22E18) and hence the floating point unit traps.
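If out-of-range inputs are possible, the range can be checked before rounding rather than relying on the trap. A minimal sketch, where `TryRoundToInt64` is a hypothetical helper and NaN inputs are deliberately not handled:

```pascal
program SafeRoundDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: rounds only when the value is strictly inside
// (-2^63, 2^63), so the FPU never sees an unrepresentable conversion.
// Excluding exactly -2^63 is conservative but keeps the check symmetric.
function TryRoundToInt64(const Value: Double; out Rounded: Int64): Boolean;
const
  TwoPow63 = 9223372036854775808.0;
begin
  Result := (Value > -TwoPow63) and (Value < TwoPow63);
  if Result then
    Rounded := Round(Value)
  else
    Rounded := 0;
end;

var
  R: Int64;
begin
  if TryRoundToInt64(1.35715152325557E020, R) then
    Writeln('Rounded to ', R)
  else
    Writeln('Value does not fit in an Int64');
end.
```

For the value from the question the check fails, so Round() is never called and the second branch runs.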

Sensitometer answered 6/3, 2014 at 9:30 Comment(4)

Yeah, that'll happen when I'm sleep deprived. I needed a good whack over the head on that one. – Goatsbeard 6/3, 2014 at 12:6

Easy rep for me though!! ;-) – Sensitometer 6/3, 2014 at 12:7

So, you said you patched the RTL. Did you ever create a SafeRound() or SafeTrunc(), that do not unmask exceptions when altering the rounding mode, clear the flags, do the op, check the flags afterward, and raise the appropriate exception (or at the very least store the old masks, alter the rounding mode while unmasking, do the op, fwait, catch the exception, put the mask back, and rethrow)? i'm starting to write SafeTrunc and SafeRound, and then ideally i'd have a define to let them take over _TRUNC and _ROUND. But if it's already written, it would be easier for me. – Goatsbeard 6/3, 2014 at 14:32

I run my code with exceptions unmasked. Sounds like you want to take a different approach. Which means that my code won't help. I do have a better _TRUNC than the one in the RTL, but it assumes unmasked exceptions. It's just a better implementation than that in the RTL. My strategy is to continue with unmasked exceptions. – Sensitometer 6/3, 2014 at 15:27
