How is the CLR faster than me when calling a Windows API?

I tested different ways of generating a timestamp when I found something surprising (to me).

Calling Windows's GetSystemTimeAsFileTime through P/Invoke is about 3x slower than calling DateTime.UtcNow, which internally uses the CLR's wrapper for the same GetSystemTimeAsFileTime.

How can that be?

Here's DateTime.UtcNow's implementation:

public static DateTime UtcNow {
    get {
        long ticks = 0;
        ticks = GetSystemTimeAsFileTime();
        return new DateTime( ((UInt64)(ticks + FileTimeOffset)) | KindUtc);
    }
}

[MethodImplAttribute(MethodImplOptions.InternalCall)] // Implemented by the CLR
internal static extern long GetSystemTimeAsFileTime();

Core CLR's wrapper for GetSystemTimeAsFileTime:

FCIMPL0(INT64, SystemNative::__GetSystemTimeAsFileTime)
{
    FCALL_CONTRACT;

    INT64 timestamp;

    ::GetSystemTimeAsFileTime((FILETIME*)&timestamp);

#if BIGENDIAN
    timestamp = (INT64)(((UINT64)timestamp >> 32) | ((UINT64)timestamp << 32));
#endif

    return timestamp;
}
FCIMPLEND;

My test code utilizing BenchmarkDotNet:

public class Program
{
    static void Main() => BenchmarkRunner.Run<Program>();

    [Benchmark]
    public DateTime UtcNow() => DateTime.UtcNow;

    [Benchmark]
    public long GetSystemTimeAsFileTime()
    {
        long fileTime;
        GetSystemTimeAsFileTime(out fileTime);
        return fileTime;
    }

    [DllImport("kernel32.dll")]
    public static extern void GetSystemTimeAsFileTime(out long systemTimeAsFileTime);
}

And the results:

                  Method |     Median |    StdDev |
------------------------ |----------- |---------- |
 GetSystemTimeAsFileTime | 14.9161 ns | 1.0890 ns |
                  UtcNow |  4.9967 ns | 0.2788 ns |
Afflatus answered 18/6, 2016 at 15:22 Comment(8)
The CLR can call it directly. P/Invoke goes through the marshalling layer. - Caramel
@DavidHeffernan even when the parameters don't need marshalling? - Afflatus
@i3arnon: Something has to analyze them to prove that. - Ouellette
@BenVoigt where does that layer come in? Can I avoid it somehow? - Afflatus
C++/CLI emits assemblies using the internalcall calling convention just like the CLR implementation, which avoids p/invoke overhead by assuming that the callee is aware of .NET memory layout and will take care of things. - Ouellette
The one thing you might try is unsafe code with a pointer instead of an out parameter. With a pointer, your code is responsible for performing pinning, and you can skip it outright for a stack variable. - Ouellette
@BenVoigt tried it. It had no effect: gist.github.com/i3arnon/fc61ba3ef9553e0e048eb8d14aaa5dc2 - Afflatus
Microsoft has documented, as part of the CoreCLR project, what it takes to write unmanaged code that can be called directly from a managed program. The details are very important: it is far too easy to create a "GC hole", which is the kind of problem that the pinvoke marshaller solves for you at the cost of some overhead. You have to understand everything that this article says. - Singles

When managed code calls unmanaged code, the runtime performs a stack walk to make sure the calling code has the UnmanagedCode permission that allows such calls.

That stack walk happens at run time, on every call, and carries a substantial performance cost.

It's possible to remove the run-time check (there's still a JIT compile-time one) by using the SuppressUnmanagedCodeSecurity attribute:

[SuppressUnmanagedCodeSecurity]
[DllImport("kernel32.dll")]
public static extern void GetSystemTimeAsFileTime(out long systemTimeAsFileTime);

This brings my implementation about halfway to the CLR's:

                  Method |    Median |    StdDev |
------------------------ |---------- |---------- |
 GetSystemTimeAsFileTime | 9.0569 ns | 0.7950 ns |
                  UtcNow | 5.0191 ns | 0.2682 ns |

Keep in mind, though, that suppressing this check can be extremely risky security-wise.

Additionally passing a pointer from unsafe code, as Ben Voigt suggested, closes about half of the remaining gap:

                  Method |    Median |    StdDev |
------------------------ |---------- |---------- |
 GetSystemTimeAsFileTime | 6.9114 ns | 0.5432 ns |
                  UtcNow | 5.0226 ns | 0.0906 ns |
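
The declaration behind that last measurement is not shown above. A minimal sketch of how the two changes might be combined, assuming Windows and compilation with unsafe code enabled; the class name NativeClock and the wrapper GetFileTimeUtc are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

// Hypothetical helper type, not part of the original benchmark.
public static class NativeClock
{
    // SuppressUnmanagedCodeSecurity skips the run-time UnmanagedCode stack walk;
    // the raw pointer replaces the out parameter.
    [SuppressUnmanagedCodeSecurity]
    [DllImport("kernel32.dll")]
    private static extern unsafe void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

    // Returns the current UTC time as a FILETIME value
    // (100 ns intervals since 1601-01-01).
    public static unsafe long GetFileTimeUtc()
    {
        long fileTime;                        // stack local, never moved by the GC
        GetSystemTimeAsFileTime(&fileTime);   // the OS writes straight into the local
        return fileTime;
    }
}
```

If a DateTime is needed, the result can be converted with DateTime.FromFileTimeUtc(NativeClock.GetFileTimeUtc()), although that conversion adds its own cost to the measurement.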
Afflatus answered 18/6, 2016 at 19:1 Comment(1)
Thanks, the combination of SuppressUnmanagedCodeSecurity with passing a pointer (unsafe) is a real winner. On my system it makes the call twice as fast as DateTime.UtcNow (2.3 ns vs 5.3 ns) in 64-bit mode, or roughly four times as fast as only suppressing the stack walk (8.9 ns). Strangely, the difference is less pronounced in 32-bit mode where the difference is 3.2 ns vs 4.3 ns (yes, DateTime.UtcNow is faster under WOW64 than in native 64-bit mode). Kudos for coming up with this winning combo! - Gonfalonier

The CLR almost certainly passes a pointer to a local (automatic, stack) variable to receive the result. The stack doesn't get compacted or relocated, so there's no need to pin memory, etc, and when using a native compiler, such things aren't supported anyway so there's no overhead to account for them.

In C# though, the p/invoke declaration is compatible with passing a member of a managed class instance living in the garbage-collected heap. P/invoke has to pin that instance or else risk having the output buffer move during/before the OS function writes to it. Even though you do pass a variable stored on the stack, p/invoke still must test and see whether the pointer is into the garbage collected heap before it can branch around the pinning code, so there's non-zero overhead even for the identical case.

It's possible that you could get better results using

[DllImport("kernel32.dll")]
public unsafe static extern void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

With the out parameter eliminated, p/invoke no longer has to deal with aliasing and heap compaction; that is now entirely the responsibility of your code that sets up the pointer.
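
To illustrate what "the responsibility of your code" means at the call site, here is a minimal sketch, assuming Windows and compilation with unsafe code enabled. The type TimestampHolder and its members are hypothetical and exist only for this example. A stack local needs no pinning, while a field of a heap object has to be pinned with a fixed statement for the duration of the call:

```csharp
using System.Runtime.InteropServices;

// Hypothetical type used only to contrast the two call sites.
public class TimestampHolder
{
    private long _lastFileTime;   // field of a heap object, can move during a GC

    [DllImport("kernel32.dll")]
    private static extern unsafe void GetSystemTimeAsFileTime(long* pSystemTimeAsFileTime);

    // Stack local: the address is stable, so no pinning is required.
    public static unsafe long ReadIntoLocal()
    {
        long fileTime;
        GetSystemTimeAsFileTime(&fileTime);
        return fileTime;
    }

    // Heap field: the caller must pin it so the GC cannot relocate the
    // object while the OS function writes through the pointer.
    public unsafe long ReadIntoField()
    {
        fixed (long* p = &_lastFileTime)
        {
            GetSystemTimeAsFileTime(p);
        }
        return _lastFileTime;
    }
}
```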

Ouellette answered 18/6, 2016 at 15:35 Comment(0)
