Why is Calli Faster Than a Delegate Call?

I was playing around with Reflection.Emit and found out about the little-used EmitCalli. Intrigued, I wondered whether it's any different from a regular method call, so I whipped up the code below:

using System;
using System.Diagnostics;
using System.Reflection.Emit;
using System.Runtime.InteropServices;
using System.Security;

[SuppressUnmanagedCodeSecurity]
static class Program
{
    const long COUNT = 1 << 22;
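    // Raw machine code for "return a * b":
    //   x86: mov eax, [esp+4]; imul eax, [esp+8]; ret
    //   x64: imul ecx, edx; mov eax, ecx; ret
    static readonly byte[] multiply = IntPtr.Size == sizeof(int) ?
      new byte[] { 0x8B, 0x44, 0x24, 0x04, 0x0F, 0xAF, 0x44, 0x24, 0x08, 0xC3 }
    : new byte[] { 0x0f, 0xaf, 0xca, 0x8b, 0xc1, 0xc3 };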

    static void Main()
    {
        var handle = GCHandle.Alloc(multiply, GCHandleType.Pinned);
        try
        {
            //Make the native method executable (0x40 = PAGE_EXECUTE_READWRITE)
            uint old;
            VirtualProtect(handle.AddrOfPinnedObject(),
                (IntPtr)multiply.Length, 0x40, out old);
            var mulDelegate = (BinaryOp)Marshal.GetDelegateForFunctionPointer(
                handle.AddrOfPinnedObject(), typeof(BinaryOp));

            var T = typeof(uint); //To avoid redundant typing

            //Generate the method
            var method = new DynamicMethod("Mul", T,
                new Type[] { T, T }, T.Module);
            var gen = method.GetILGenerator();
            gen.Emit(OpCodes.Ldarg_0);
            gen.Emit(OpCodes.Ldarg_1);
            gen.Emit(OpCodes.Ldc_I8, (long)handle.AddrOfPinnedObject());
            gen.Emit(OpCodes.Conv_I);
            gen.EmitCalli(OpCodes.Calli, CallingConvention.StdCall,
                T, new Type[] { T, T });
            gen.Emit(OpCodes.Ret);

            var mulCalli = (BinaryOp)method.CreateDelegate(typeof(BinaryOp));

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < COUNT; i++) { mulDelegate(2, 3); }
            Console.WriteLine("Delegate: {0:N0}", sw.ElapsedMilliseconds);
            sw.Reset();

            sw.Start();
            for (int i = 0; i < COUNT; i++) { mulCalli(2, 3); }
            Console.WriteLine("Calli:    {0:N0}", sw.ElapsedMilliseconds);
        }
        finally { handle.Free(); }
    }

    delegate uint BinaryOp(uint a, uint b);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualProtect(
        IntPtr address, IntPtr size, uint protect, out uint oldProtect);
}

I ran the code in x86 mode and x64 mode. The results?

32-bit:

  • Delegate version: 994
  • Calli version: 46

64-bit:

  • Delegate version: 326
  • Calli version: 83

I guess the question's obvious by now... why is there such a huge speed difference?


Update:

I created a 64-bit P/Invoke version as well:

  • Delegate version: 284
  • Calli version: 77
  • P/Invoke version: 31

Apparently, P/Invoke is faster... is this a problem with my benchmarking, or is there something going on I don't understand? (I'm in release mode, by the way.)
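On the benchmarking concern: one way to reduce noise is to warm each delegate up before timing it, so that JIT compilation and one-time stub generation are not counted, and to take the best of several rounds. Below is a minimal sketch of such a harness. The names `BenchSketch`, `TimeBest` and `ManagedMul` are hypothetical, and a plain managed multiply stands in for the native code so that the sketch runs anywhere; it does not reproduce the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class BenchSketch
{
    public delegate uint BinaryOp(uint a, uint b);

    // Hypothetical stand-in for the method under test.
    public static uint ManagedMul(uint a, uint b) { return a * b; }

    // Warm up, then return the smallest elapsed tick count over `rounds` runs.
    public static long TimeBest(BinaryOp op, int iterations, int rounds)
    {
        uint sink = 0;

        // Warm-up: forces JIT compilation and any lazily created stubs.
        for (int i = 0; i < 1000; i++) { sink += op(2, 3); }

        long best = long.MaxValue;
        for (int r = 0; r < rounds; r++)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) { sink += op(2, 3); }
            sw.Stop();
            if (sw.ElapsedTicks < best) { best = sw.ElapsedTicks; }
        }

        // Keep the accumulated result observable so the calls are not dead code.
        GC.KeepAlive(sink);
        return best;
    }

    public static void Main()
    {
        long ticks = TimeBest(ManagedMul, 1 << 22, 5);
        Console.WriteLine("Best of 5: {0:N0} ticks ({1:N2} ms)",
            ticks, ticks * 1000.0 / Stopwatch.Frequency);
    }
}
```

Passing `mulDelegate` and `mulCalli` from the code above to a helper like this would time both through the same loop, which at least rules out differences in how the two loops were compiled.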

Hematite answered 5/5, 2011 at 5:22 Comment(2)
Very interesting question. I tried it on my machine too, and there is a large speed difference. I am also curious to know the exact reasons behind it. – Muzz
I'm actually beginning to suspect that my benchmarking might be wrong -- there might be instructions in the middle that I'm not noticing, that are messing up the results. Right now I can't think of much, though... – Hematite

Given your performance numbers, I assume you must be using the 2.0 framework, or something similar? The numbers are much better in 4.0, but the "Marshal.GetDelegate" version is still slower.

The thing is that not all delegates are created equal.

Delegates for managed code functions are essentially just a straight function call (on x86, that's a __fastcall), with the addition of a little "switcheroo" if you're calling a static function (but that's just 3 or 4 instructions on x86).
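The static-versus-instance distinction can be observed from managed code through `Delegate.Target`. The following is a minimal sketch with hypothetical type and method names; it only shows which kind of delegate you hold, not the extra instructions the runtime executes for the static case.

```csharp
using System;

public static class DelegateKinds
{
    public delegate uint BinaryOp(uint a, uint b);

    public sealed class Multiplier
    {
        public uint Mul(uint a, uint b) { return a * b; }
    }

    public static uint StaticMul(uint a, uint b) { return a * b; }

    public static void Main()
    {
        BinaryOp overInstance = new Multiplier().Mul; // bound to an object
        BinaryOp overStatic = StaticMul;              // no object to bind to

        // Target is the bound instance for an instance method
        // and null for a static method.
        Console.WriteLine("instance delegate has target: {0}", overInstance.Target != null);
        Console.WriteLine("static delegate has target:   {0}", overStatic.Target != null);
    }
}
```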

Delegates created by "Marshal.GetDelegateForFunctionPointer", on the other hand, are a straight function call into a "stub" function, which adds a little overhead (marshalling and whatnot) before calling the unmanaged function. In this case there's very little marshalling, and the marshalling for this call appears to be pretty much optimized out in 4.0 (but most likely still goes through the ML interpreter on 2.0). Even in 4.0, though, there's a stack walk demanding unmanaged code permissions that isn't part of your calli delegate.

I've generally found that, short of knowing someone on the .NET dev team, your best bet on figuring out what's going on w/ managed/unmanaged interop is to do a little digging with WinDbg and SOS.

Index answered 13/2, 2012 at 14:11 Comment(0)

Difficult to answer :) Anyway I will try.

The EmitCalli version is faster because it is a raw bytecode call. I suspect SuppressUnmanagedCodeSecurity will also disable some checks, for instance stack overrun or array out-of-bounds index checks. So the code is not safe and runs at full speed.

The delegate version will have some compiled code to check typing, and will also do a de-reference call (because the delegate is like a typed-function pointer).
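To make "raw call" concrete, here is a minimal sketch of `calli` on its own: the instruction pops a function pointer from the evaluation stack and jumps to it, with no delegate object involved in the inner call. Note that this swaps in a managed target and the managed calling convention (`CallingConventions.Standard`) so that it runs without native code; it is not the question's unmanaged setup, and the names `ManagedCalliSketch`, `Mul` and `BuildCalli` are hypothetical.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ManagedCalliSketch
{
    public delegate uint BinaryOp(uint a, uint b);

    // Hypothetical managed stand-in for the question's native multiply.
    public static uint Mul(uint a, uint b) { return a * b; }

    public static BinaryOp BuildCalli()
    {
        Type T = typeof(uint);
        MethodInfo target = typeof(ManagedCalliSketch).GetMethod("Mul");

        // Address of the method's entry point as a raw pointer.
        IntPtr fn = target.MethodHandle.GetFunctionPointer();

        var method = new DynamicMethod("MulViaCalli", T,
            new Type[] { T, T }, typeof(ManagedCalliSketch).Module);
        ILGenerator gen = method.GetILGenerator();
        gen.Emit(OpCodes.Ldarg_0);
        gen.Emit(OpCodes.Ldarg_1);
        gen.Emit(OpCodes.Ldc_I8, fn.ToInt64()); // push the pointer as a constant
        gen.Emit(OpCodes.Conv_I);               // narrow/widen to native int
        gen.EmitCalli(OpCodes.Calli, CallingConventions.Standard,
            T, new Type[] { T, T }, null);      // managed signature, no varargs
        gen.Emit(OpCodes.Ret);

        return (BinaryOp)method.CreateDelegate(typeof(BinaryOp));
    }

    public static void Main()
    {
        BinaryOp viaCalli = BuildCalli();
        BinaryOp direct = Mul;
        Console.WriteLine("calli result matches direct call: {0}",
            viaCalli(2, 3) == direct(2, 3));
    }
}
```

The outer `BinaryOp` here is still a delegate, just as in the question; only the call inside the dynamic method is a bare `calli`.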

My two cents!

Coaly answered 5/5, 2011 at 7:52 Comment(5)
Hm... I'm a bit confused: note that both versions use a delegate; the only difference is how the delegate was made. So shouldn't both be the same in that regard? – Hematite
Also, SuppressUnmanagedCodeSecurity does not disable type checking safety, and it does a stack walk from a managed-to-unmanaged transition. It's for the unmanaged code privilege; I put it there so as to eliminate unrelated bottlenecks. – Hematite
@mehrdad: You are right, but the speed is affected: we read from msdn.microsoft.com/en-us/library/… "This attribute can be applied to methods that want to call into native code without incurring the performance loss of a run-time security check when doing so." – Coaly
But if it disables the check for all transitions in my code, then why would it make a difference? – Hematite
@mehrdad: Good point. I do not know. As I said, the compiled delegate version should involve some more indirect calls (for instance, it would require at least one pointer dereference). It will be slower, but not by so much. The mystery is still here! – Coaly
