Double.IsNaN test 100 times faster?

Asked 20/6, 2014 at 16:13 Answered 20/6, 2014 at 16:48

I found this in the .NET Source Code: It claims to be 100 times faster than System.Double.IsNaN. Is there a reason to not use this function instead of System.Double.IsNaN?

[StructLayout(LayoutKind.Explicit)]
private struct NanUnion
{
    [FieldOffset(0)] internal double DoubleValue;
    [FieldOffset(0)] internal UInt64 UintValue;
}

// The standard CLR double.IsNaN() function is approximately 100 times slower than our own wrapper,
// so please make sure to use DoubleUtil.IsNaN() in performance sensitive code.
// PS item that tracks the CLR improvement is DevDiv Schedule : 26916.
// IEEE 754 : If the argument is any value in the range 0x7ff0000000000001L through 0x7fffffffffffffffL 
// or in the range 0xfff0000000000001L through 0xffffffffffffffffL, the result will be NaN.         
public static bool IsNaN(double value)
{
    NanUnion t = new NanUnion();
    t.DoubleValue = value;

    UInt64 exp = t.UintValue & 0xfff0000000000000;
    UInt64 man = t.UintValue & 0x000fffffffffffff;

    return (exp == 0x7ff0000000000000 || exp == 0xfff0000000000000) && (man != 0);
}

EDIT: Still according to the .NET Source Code, the code for System.Double.IsNaN is the following:

public unsafe static bool IsNaN(double d)
{
    return (*(UInt64*)(&d) & 0x7FFFFFFFFFFFFFFFL) > 0x7FF0000000000000L;
}

Stortz answered 20/6, 2014 at 16:13 Comment(8)

Well, the comment says its faster, so its probably faster... Have you benchmarked it? It is an interesting question though (especially the "any reason to use the slower one" part. – Elasticity 20/6, 2014 at 16:15

Without knowing anything about testing for NaN or the correctness of this method, I would recommending sticking with the .NET implementation on the assumption that there's a good reason for any performance difference. The .NET developers know what they're doing and have to account for all the edge cases, etc. to produce a robust library. – Billbillabong 20/6, 2014 at 16:16

Well, that class is internal so you would end up copying that source code and having to maintain it for future use. – Kashmir 20/6, 2014 at 16:16

@paqogomez Not really. It's an interesting question. – Cop 20/6, 2014 at 16:20

This seem to use the well known trick with explicit layout. The implementation seem interesting and the question is - what else the default implementation does so that it is 100 times slower? – Wireworm 20/6, 2014 at 16:21

@paqogomez The number of times I read this citation, I'm beginning to hate it... That's not the subject here. – Dibrin 20/6, 2014 at 16:24

I'm skeptical. 100 times for checking a bit pattern. That's like the difference between a normal method invocation and invoking via reflection. Enum.HasFlag isn't even that slow compared to checking a flag and it does a box and type check. – Geffner 20/6, 2014 at 16:25

@Elasticity I admit I should have benchmarked before asking this question. – Stortz 20/6, 2014 at 17:0

It claims to be 100 times faster than System.Double.IsNaN

Yes, that used to be true. You are missing the time-machine to know when this decision was made. Double.IsNaN() didn't used to look like that. From the SSCLI10 source code:

   public static bool IsNaN(double d)
   {
       // Comparisions of a NaN with another number is always false and hence both conditions will be false.
       if (d < 0d || d >= 0d) {
          return false;
       }
       return true;
   }

Which performs very poorly on the FPU in 32-bit code if d is NaN. Just an aspect of the chip design, it is treated as exceptional in the micro-code. The Intel processor manuals say very little about it, other than documenting a processor perf counter that tracks the number of "Floating Point assists" and noting that the micro-code sequencer comes into play for denormals and NaNs, "potentially costing hundreds of cycles". Not otherwise an issue in 64-bit code, it uses SSE2 instructions which don't have this perf hit.

Some code to play with to see this yourself:

using System;
using System.Diagnostics;

class Program {
    static void Main(string[] args) {
        double d = double.NaN;
        for (int test = 0; test < 10; ++test) {
            var sw1 = Stopwatch.StartNew();
            bool result1 = false;
            for (int ix = 0; ix < 1000 * 1000; ++ix) {
                result1 |= double.IsNaN(d);
            }
            sw1.Stop();
            var sw2 = Stopwatch.StartNew();
            bool result2 = false;
            for (int ix = 0; ix < 1000 * 1000; ++ix) {
                result2 |= IsNaN(d);
            }
            sw2.Stop();
            Console.WriteLine("{0} - {1} x {2}%", sw1.Elapsed, sw2.Elapsed, 100 * sw2.ElapsedTicks / sw1.ElapsedTicks, result1, result2);

        }
        Console.ReadLine();
    }
    public static bool IsNaN(double d) {
        // Comparisions of a NaN with another number is always false and hence both conditions will be false.
        if (d < 0d || d >= 0d) {
            return false;
        }
        return true;
    }
}

Which uses the version of Double.IsNaN() that got micro-optimized. Such micro-optimizations are not evil in a framework btw, the great burden of the Microsoft .NET programmers is that they can rarely guess when their code is in the critical path of an application.

Results on my machine when targeting 32-bit code (Haswell mobile core):

00:00:00.0027095 - 00:00:00.2427242 x 8957%
00:00:00.0025248 - 00:00:00.2191291 x 8678%
00:00:00.0024344 - 00:00:00.2209950 x 9077%
00:00:00.0024144 - 00:00:00.2321169 x 9613%
00:00:00.0024126 - 00:00:00.2173313 x 9008%
00:00:00.0025488 - 00:00:00.2237517 x 8778%
00:00:00.0026940 - 00:00:00.2231146 x 8281%
00:00:00.0025052 - 00:00:00.2145660 x 8564%
00:00:00.0025533 - 00:00:00.2200943 x 8619%
00:00:00.0024406 - 00:00:00.2135839 x 8751%

Nambypamby answered 20/6, 2014 at 16:48 Comment(6)

While reading this answer I was thinking, who the hell know this much stuff, and then I saw the poster name. +1 and Thank You for providing us with such wonderful answers – Peace 20/6, 2014 at 19:11

Just measured on Core 2 Duo in C, 64 bit, 2.4 GHz: The simplest method comparing (d != d) gives the correct result in 2.7 nanoseconds. Comparing ! (d < 0 || d >= 0) gives the correct result in 3.5 nanoseconds. The suggested bit pattern test takes 4.7 nanoseconds (all measured over 10 billion iterations). All measurements with optimisation turned off, otherwise all the comparisons are optimised away. – Ambroseambrosi 20/6, 2014 at 22:12

Hmm, "optimization turned off" just doesn't produce meaningful numbers. Note the weirdo usage of result1 and result2 in the snippet I posted, including using them in the Console.WriteLine() call without actually displaying them. Benchmarking is a tricky art :) – Nambypamby 20/6, 2014 at 22:40

We are talking about code that took between 6 and 12 clock cycles per loop. Plus the assumption that a floating point comparison was absurdly slow with NaNs involved. So this test is absolutely meaningful. A Core 2 Duo processor doesn't slow down when you compare NaNs, Clang doesn't optimise a comparison x == x away, and the test (x != x) is the fastest way to check if a number is a NaN. No tricks needed. – Ambroseambrosi 21/6, 2014 at 7:29

@Hans, to your snarky remark: I've practiced the art of benchmarking for 35 years, so there isn't much I don't know about it. – Ambroseambrosi 21/6, 2014 at 7:30

Sigh. Do not compare SSE2 with FPU code. I already noted that the x64 jitter in .NET doesn't have this problem. – Nambypamby 21/6, 2014 at 9:8

Here's a naive benchmark:

public static void Main()
{
    int iterations = 500 * 1000 * 1000;

    double nan = double.NaN;
    double notNan = 42;

    Stopwatch sw = Stopwatch.StartNew();

    bool isNan;
    for (int i = 0; i < iterations; i++)
    {
        isNan = IsNaN(nan);     // true
        isNan = IsNaN(notNan);  // false
    }

    sw.Stop();
    Console.WriteLine("IsNaN: {0}", sw.ElapsedMilliseconds);

    sw = Stopwatch.StartNew();

    for (int i = 0; i < iterations; i++)
    {
        isNan = double.IsNaN(nan);     // true
        isNan = double.IsNaN(notNan);  // false
    }

    sw.Stop();
    Console.WriteLine("double.IsNaN: {0}", sw.ElapsedMilliseconds);

    Console.Read();
}

Obviously they're wrong:

IsNaN: 15012

double.IsNaN: 6243

EDIT + NOTE: I'm sure the timing will change depending on input values, many other factors etc., but claiming that generally speaking this wrapper is 100x faster than the default implementation seems just wrong.

Dibrin answered 20/6, 2014 at 16:29 Comment(4)

This is what I am noticing too. Even after 1 million iterations, double.IsNaN is effectively twice as fast. I wonder if that comment is from .NET 1.1 or something. – Kashmir 20/6, 2014 at 16:32

@Kashmir I updated results with more iterations to reflect the "twice as fast" you noted. Also +1 for the comment about .NET 1.1, double.IsNan was probably not as fast in the past. – Dibrin 20/6, 2014 at 16:37

+1 I did the same thing. My testing (release mode, no debugger) shows that double.IsNan is close to 10 times as fast. – Maquette 20/6, 2014 at 16:40

Comments are not compiled. – Mayest 20/6, 2014 at 17:38

I call shenanigans. The "fast" version has a considerably larger number of ops and even performs more reads from memory, (stack, so in L1 but still slower than registers).

00007FFAC53D3D01  movups      xmmword ptr [rsp+8],xmm0  
00007FFAC53D3D06  sub         rsp,48h  
00007FFAC53D3D0A  mov         qword ptr [rsp+20h],0  
00007FFAC53D3D13  mov         qword ptr [rsp+28h],0  
00007FFAC53D3D1C  mov         qword ptr [rsp+30h],0  
00007FFAC53D3D25  mov         rax,7FFAC5423D40h  
00007FFAC53D3D2F  mov         eax,dword ptr [rax]  
00007FFAC53D3D31  test        eax,eax  
00007FFAC53D3D33  je          00007FFAC53D3D3A  
00007FFAC53D3D35  call        00007FFB24EE39F0  
00007FFAC53D3D3A  mov         r8d,8  
00007FFAC53D3D40  xor         edx,edx  
00007FFAC53D3D42  lea         rcx,[rsp+20h]  
00007FFAC53D3D47  call        00007FFB24A21680  
            t.DoubleValue = value;
00007FFAC53D3D4C  movsd       xmm5,mmword ptr [rsp+50h]  
00007FFAC53D3D52  movsd       mmword ptr [rsp+20h],xmm5  

            UInt64 exp = t.UintValue & 0xfff0000000000000;
00007FFAC53D3D58  mov         rax,qword ptr [rsp+20h]  
00007FFAC53D3D5D  mov         rcx,0FFF0000000000000h  
00007FFAC53D3D67  and         rax,rcx  
00007FFAC53D3D6A  mov         qword ptr [rsp+28h],rax  
            UInt64 man = t.UintValue & 0x000fffffffffffff;
00007FFAC53D3D6F  mov         rax,qword ptr [rsp+20h]  
00007FFAC53D3D74  mov         rcx,0FFFFFFFFFFFFFh  
00007FFAC53D3D7E  and         rax,rcx  
00007FFAC53D3D81  mov         qword ptr [rsp+30h],rax  

            return (exp == 0x7ff0000000000000 || exp == 0xfff0000000000000) && (man != 0);
00007FFAC53D3D86  mov         rax,7FF0000000000000h  
00007FFAC53D3D90  cmp         qword ptr [rsp+28h],rax  
00007FFAC53D3D95  je          00007FFAC53D3DA8  
00007FFAC53D3D97  mov         rax,0FFF0000000000000h  
00007FFAC53D3DA1  cmp         qword ptr [rsp+28h],rax  
00007FFAC53D3DA6  jne         00007FFAC53D3DBD  
00007FFAC53D3DA8  xor         eax,eax  
00007FFAC53D3DAA  cmp         qword ptr [rsp+30h],0  
00007FFAC53D3DB0  setne       al  
00007FFAC53D3DB3  mov         dword ptr [rsp+38h],eax  
00007FFAC53D3DB7  mov         al,byte ptr [rsp+38h]  
00007FFAC53D3DBB  jmp         00007FFAC53D3DC1  
00007FFAC53D3DBD  xor         eax,eax  
00007FFAC53D3DBF  jmp         00007FFAC53D3DC1  
00007FFAC53D3DC1  nop  
00007FFAC53D3DC2  add         rsp,48h  
00007FFAC53D3DC6  ret

Versus the .NET version:

            return (*(UInt64*)(&d) & 0x7FFFFFFFFFFFFFFFL) > 0x7FF0000000000000L;
00007FFAC53D3DE0  movsd       mmword ptr [rsp+8],xmm0  
00007FFAC53D3DE6  sub         rsp,38h  
00007FFAC53D3DEA  mov         rax,7FFAC5423D40h  
00007FFAC53D3DF4  mov         eax,dword ptr [rax]  
00007FFAC53D3DF6  test        eax,eax  
00007FFAC53D3DF8  je          00007FFAC53D3DFF  
00007FFAC53D3DFA  call        00007FFB24EE39F0  
00007FFAC53D3DFF  mov         rdx,qword ptr [rsp+40h]  
00007FFAC53D3E04  mov         rax,7FFFFFFFFFFFFFFFh  
00007FFAC53D3E0E  and         rdx,rax  
00007FFAC53D3E11  xor         ecx,ecx  
00007FFAC53D3E13  mov         rax,7FF0000000000000h  
00007FFAC53D3E1D  cmp         rdx,rax  
00007FFAC53D3E20  seta        cl  
00007FFAC53D3E23  mov         dword ptr [rsp+20h],ecx  
00007FFAC53D3E27  movzx       eax,byte ptr [rsp+20h]  
00007FFAC53D3E2C  jmp         00007FFAC53D3E2E  
00007FFAC53D3E2E  nop  
00007FFAC53D3E2F  add         rsp,38h  
00007FFAC53D3E33  ret

Thymic answered 20/6, 2014 at 16:42 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags