Performance of "direct" virtual call vs. interface call in C#

Asked 29/8, 2011 at 1:14 Answered 29/9, 2013 at 14:18

Solved c# .net performance language-design

This benchmark appears to show that calling a virtual method directly on an object reference is faster than calling it on a reference to the interface this object implements.

In other words:

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {}
}

void Benchmark() {
    Foo f = new Foo();
    IFoo f2 = f;
    f.Bar(); // This is faster.
    f2.Bar();    
}

Coming from the C++ world, I would have expected that both of these calls would be implemented identically (as a simple virtual table lookup) and have the same performance. How does C# implement virtual calls and what is this "extra" work that apparently gets done when calling through an interface?

--- EDIT ---

OK, answers/comments I got so far imply that there is a double-pointer-dereference for virtual call through interface versus just one dereference for virtual call through object.

So could somebody please explain why that is necessary? What is the structure of the virtual table in C#? Is it "flat" (as is typical for C++) or not? What were the design tradeoffs that were made in C# language design that led to this? I'm not saying this is a "bad" design, I'm simply curious as to why it was necessary.

In a nutshell, I'd like to understand what my tool does under the hood so I can use it more effectively. And I would appreciate if I didn't get any more "you shouldn't know that" or "use another language" types of answers.

--- EDIT 2 ---

Just to make it clear that we are not dealing with some compiler or JIT optimization here that removes the dynamic dispatch: I modified the benchmark mentioned in the original question to instantiate one class or the other randomly at run-time. Since the instantiation happens after compilation and after assembly loading/JITing, there is no way to avoid dynamic dispatch in both cases:

using System;
using System.Diagnostics;

interface IFoo {
    void Bar();
}

class Foo : IFoo {
    public virtual void Bar() {
    }
}

class Foo2 : Foo {
    public override void Bar() {
    }
}

class Program {

    static Foo GetFoo() {
        if ((new Random()).Next(2) % 2 == 0)
            return new Foo();
        return new Foo2();
    }

    static void Main(string[] args) {

        var f = GetFoo();
        IFoo f2 = f;

        Console.WriteLine(f.GetType());

        // JIT warm-up
        f.Bar();
        f2.Bar();

        int N = 10000000;
        Stopwatch sw = new Stopwatch();

        sw.Start();
        for (int i = 0; i < N; i++) {
            f.Bar();
        }
        sw.Stop();
        Console.WriteLine("Direct call: {0:F2}", sw.Elapsed.TotalMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < N; i++) {
            f2.Bar();
        }
        sw.Stop();
        Console.WriteLine("Through interface: {0:F2}", sw.Elapsed.TotalMilliseconds);

        // Results:
        // Direct call: 24.19
        // Through interface: 40.18

    }

}

--- EDIT 3 ---

If anyone is interested, here is how my Visual C++ 2010 lays out an instance of a class that multiply-inherits other classes:

Code:

#include <iostream>

class IA {
public:
    virtual void a() = 0;
};

class IB {
public:
    virtual void b() = 0;
};

class C : public IA, public IB {
public:
    virtual void a() override {
        std::cout << "a" << std::endl;
    }
    virtual void b() override {
        std::cout << "b" << std::endl;
    }
};

Debugger:

c   {...}   C
    IA  {...}   IA
        __vfptr 0x00157754 const C::`vftable'{for `IA'} *
            [0] 0x00151163 C::a(void)   *
    IB  {...}   IB
        __vfptr 0x00157748 const C::`vftable'{for `IB'} *
            [0] 0x0015121c C::b(void)   *

Multiple virtual table pointers are clearly visible, and sizeof(C) == 8 (in 32-bit build).

The...

C c;
std::cout << static_cast<IA*>(&c) << std::endl;
std::cout << static_cast<IB*>(&c) << std::endl;

...prints...

0027F778
0027F77C

...indicating that pointers to different interfaces within the same object actually point to different parts of that object (i.e. they contain different physical addresses).

Talent answered 29/8, 2011 at 1:14 Comment(15)

C++ doesn't necessarily force a virtual lookup. If the dynamic type can be determined at compile time, the correct function can be called directly. – Outlive 29/8, 2011 at 1:18

@Kerrek, the equivalent C++ example would likely be optimized by the C++ compiler (yet both calls would still remain equally "quick"), that much is true. It is also immaterial to the question - I'm much more interested in what happens in real-life cases that cannot be optimized away. – Talent 29/8, 2011 at 1:22

An interface method call requires a double pointer dereference. C# ought perhaps not be your language of choice if you count nanoseconds. C and C++ are languages that are optimized for that. – Gadoid 29/8, 2011 at 1:27

@Hans, the fact that I asked the question does not mean I'm "counting nanoseconds" on any concrete project. Can't I just be curious? – Talent 29/8, 2011 at 1:46

Your question doesn't express that interest well. – Gadoid 29/8, 2011 at 1:49

I think Hans is right; the focus of your question is definitely about small performance gains. C#, perhaps by being a CLR language, employs several layers of abstraction that would be impractical to optimize away. – Fowler 4/10, 2011 at 14:48

@Jeremy A ~60% decrease in performance for "simple" calls is something that will be drowned out by other aspects of performance in most situations, I agree. However, I don't agree it will be insignificant in all situations, so I think a discerning coder should be aware of it. – Talent 4/10, 2011 at 15:3

Aware of, certainly. Certain tools are more suited to certain tasks. The abstractions available in most CLR languages are attractive to developers due to the lower learning curve and flexibility/reusability while not extending the development time. In cases where the performance is the most critical factor, other technologies become more appropriate. – Fowler 4/10, 2011 at 15:15

@Jeremy You said: "In cases where the performance is the most critical factor, other technologies become more appropriate". Not necessarily. Have you ever implemented a suffix tree in both C++ and C#? You'd be amazed how much performance can be squeezed out of C# (in my case, it actually ended up being slightly faster). – Talent 4/10, 2011 at 17:4

I'll rephrase: In cases where the performance is the most critical factor, other technologies may become more appropriate. Basically I am suggesting to use the right tool for the job, and that there is no such thing as one tool that can do everything the best. – Fowler 4/10, 2011 at 17:8

Good rephrase @Jeremy, I fully agree! – Talent 4/10, 2011 at 18:14

possible duplicate of is it better performance wise to use the concrete type rather than the interface – Mayonnaise 11/4, 2013 at 12:38

Don't forget that if you're making a virtual call to a method in a DLL, the performance profile is very different, with C#/Java doing polymorphic inline caching and C++ doing hash lookups. – Sussex 27/6, 2015 at 10:8

@Sussex Ummm... Native DLL functions (is that what you mean?) are not dynamically dispatched when called from managed code, as far as I know. You could, of course, have dynamic dispatch if both caller and callee are native C++, but that was really not the thrust of my question. Also, what do you mean by "polymorphic inline caching" and "hash lookups"? – Talent 29/6, 2015 at 4:22

No, C++ to C++. DLL virtual calls are much slower than in Java/C# (MS CLR; Mono doesn't have polymorphic inline caching). There is lots of material on polymorphic inline caches. – Sussex 1/10, 2015 at 6:11

I think the article "Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects" will answer your questions. In particular, see the section "Interface Vtable Map and Interface Map", and the following section on Virtual Dispatch.

It's probably possible for the JIT compiler to figure things out and optimize the code for your simple case, but not in the general case. For example, if you have:

IFoo f2 = GetAFoo();

and GetAFoo is defined as returning an IFoo, then the JIT compiler wouldn't be able to optimize the call.

Simpleminded answered 27/9, 2011 at 17:14 Comment(5)

While this is a fairly old article (.NET 1.1), I can imagine a lot of it is still relevant today and to my question. Apparently, C# never ever stores more than one (equivalent of) virtual table pointer per object, even when inherited from multiple interfaces. So, the caller cannot simply use its specific "pre-cooked" vtable pointer (as in typical C++) - instead it must go through a process of finding the correct virtual "subtable", which costs some performance. Fascinating read altogether, thank you for the link! – Talent 28/9, 2011 at 22:56

Note the article above is the May 2005 issue of MSDN magazine. The current link to the download of the issue is here: download.microsoft.com/download/3/a/7/… – Cole 15/12, 2015 at 22:17

As an addition to the answer, here's the link that describes the reasoning behind why there is no multiple inheritance in .NET (IMHO it is the reason why there is only one virtual table pointer): MSDN blog – Bradway 17/3, 2017 at 11:2

The links are dead. The archive Tom linked to is corrupt. Here's the Internet Archive: Wayback Machine link for right after this answer was posted. Btw, the article's title is "JIT and Run: Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects". – Indulgent 10/10, 2017 at 15:29

You can also download the May 2005 issue as a CHM file (must unblock before viewing) from msdn.microsoft.com/magazine/msdn-magazine-issues. – Witha 9/7, 2019 at 19:4

Here is what the disassembly looks like (Hans is correct):

            f.Bar(); // This is faster.
00000062  mov         rax,qword ptr [rsp+20h]
00000067  mov         rax,qword ptr [rax]
0000006a  mov         rcx,qword ptr [rsp+20h]
0000006f  call        qword ptr [rax+60h]
            f2.Bar();
00000072  mov         r11,7FF000400A0h
0000007c  mov         qword ptr [rsp+38h],r11
00000081  mov         rax,qword ptr [rsp+28h]
00000086  cmp         byte ptr [rax],0
00000089  mov         rcx,qword ptr [rsp+28h]
0000008e  mov         r11,qword ptr [rsp+38h]
00000093  mov         rax,qword ptr [rsp+38h]
00000098  call        qword ptr [rax]

Stellastellar answered 29/8, 2011 at 1:54 Comment(4)

Thanks for the answer, but I'm really more interested in a more "high level" answer, or "rationale" for such behavior. – Talent 29/8, 2011 at 2:12

When accessing an object through an interface, the interface function must be "matched up" to the actual object's function. This takes more time and more code. Unless you are writing a compiler, I wouldn't spend a lot of time on this. There are 75 million other things to learn. – Stellastellar 29/8, 2011 at 4:8

The virtual table mechanism in C++ is pretty straightforward and useful to know even if I'm not "writing a compiler". It surprised me that C# does things differently and I got curious, that's all. BTW, this came up in the context of another question: Practical advantage of generics vs interfaces in C# (#7225175) – Talent 29/8, 2011 at 9:38

Could you give more explanation on WHY it is this way? Thanks! – Included 4/10, 2011 at 4:13

I tried your test and on my machine, in a particular context, the result is actually the other way around.

I am running Windows 7 x64 and I have created a Visual Studio 2010 Console Application project into which I have copied your code. If I compile the project in Debug mode with x86 as the platform target, the output is the following:

Direct call: 48.38
Through interface: 42.43

Actually, every run of the application gives slightly different results, but the interface calls are always faster. I assume that since the application is compiled as x86, it will be run by the OS through WoW64.

For a complete reference, below are the results for the rest of compilation configuration and target combinations.

Release mode and x86 target
Direct call: 23.02
Through interface: 32.73

Debug mode and x64 target
Direct call: 49.49
Through interface: 56.97

Release mode and x64 target
Direct call: 19.60
Through interface: 26.45

All of the above tests were made with .NET 4.0 as the target platform for the compiler. When switching to 3.5 and repeating the above tests, the calls through the interface were always longer than the direct calls.

So, the above tests rather complicate things since it seems that the behavior you spotted is not always happening.

In the end, at the risk of upsetting you, I would like to add a few thoughts. Many people added comments that the performance differences are quite small and that in real-world programming you should not care about them, and I agree with this point of view. There are two main reasons for it.

The first and the most advertised one is that .NET was built on a higher level in order to enable developers to focus on the higher levels of applications. A database or an external service call is thousands or sometimes millions of times slower than a virtual method call. Having a good high-level architecture and focusing on the big performance consumers will always bring better results in modern applications than avoiding double-pointer-dereferences.

The second and more obscure one is that the .NET team, by building the framework on a higher level, has actually introduced a series of abstraction levels which the just-in-time compiler would be able to use for optimizations on different platforms. The more access they would give to the underlying layers, the more developers would be able to optimize for a specific platform, but the less the runtime compiler would be able to do for the others. That is the theory at least, and that is why things are not as well documented as in C++ regarding this particular matter.

Fabi answered 3/10, 2011 at 22:39 Comment(2)

Thank you for the answer and I actually agree with your "philosophical" points. I fully understand that choosing the right data structures and algorithms and not relying on undocumented behavior is far more important than micro-optimizing; this was just an itch that my programming psyche compelled me to scratch ;) BTW, I apologize for not specifying that all my benchmarks were made in Release configuration (I assumed there isn't much point in benchmarking a Debug build that will never be used in production). In light of that, my results actually agree with yours. – Talent 3/10, 2011 at 22:57

@Branko, sorry for not being able to bring much value to what you are actually searching for. My answer is actually more a collection of observations into which I wanted to include the purpose of the higher level approach in .Net, so that other people reaching this page will also get that view on things. On the other hand, the mixed results based on the compilation mode show just how variable .Net can be related to these low level aspects. – Fabi 3/10, 2011 at 23:15

The general rule is: Classes are fast. Interfaces are slow.

That's one of the reasons for the recommendation "Build hierarchies with classes and use interfaces for intra-hierarchy behavior".

For virtual methods, the difference might be slight (like 10%). But for non-virtual methods and fields the difference is huge. Consider this program.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace InterfaceFieldConsoleApplication
{
    class Program
    {
        public abstract class A
        {
            public int Counter;
        }

        public interface IA
        {
            int Counter { get; set; }
        }

        public class B : A, IA
        {
            public new int Counter { get { return base.Counter; } set { base.Counter = value; } }
        }

        static void Main(string[] args)
        {
            var b = new B();
            A a = b;
            IA ia = b;
            const long LoopCount = (int) (100*10e6);
            var stopWatch = new Stopwatch();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                a.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("a.Counter: {0}", stopWatch.ElapsedMilliseconds);
            stopWatch.Reset();
            stopWatch.Start();
            for (int i = 0; i < LoopCount; i++)
                ia.Counter = i;
            stopWatch.Stop();
            Console.WriteLine("ia.Counter: {0}", stopWatch.ElapsedMilliseconds);
            Console.ReadKey();
        }
    }
}

Output:

a.Counter: 1560
ia.Counter: 4587

Drysalter answered 29/9, 2013 at 14:18 Comment(2)

That's fine, but I know that already (as is plainly obvious from the formulation of the question). What I was interested in were the technical reasons for such behavior, and I think Jim Mischel provided an answer. – Talent 30/9, 2013 at 13:15

Just wanted to add material to this subject, which there are not that many discussions about. – Drysalter 30/9, 2013 at 14:8

I think the pure virtual function case can use a simple virtual function table, as any derived class of Foo implementing Bar would just change the virtual function pointer to Bar.

On the other hand, calling an interface function IFoo.Bar couldn't do a lookup in something like IFoo's virtual function table, because an implementation of IFoo does not necessarily implement the other functions or interfaces that Foo does. So the virtual function table entry position for Bar in another class Fubar : IFoo need not match the virtual function table entry position of Bar in class Foo : IFoo.

Thus a pure virtual function call can rely on the same index of the function pointer inside the virtual function table in every derived class, while the interface call has to look up this index first.

Concoction answered 4/10, 2011 at 10:40 Comment(3)

You are correct that interfaces have to have their own virtual table entries and this is true for both C++ and C#. However, there are ways to implement interface calls with same efficiency as "direct" calls (see my discussion with Alan and --- EDIT 3 --- in my question). My question was really about why these "more efficient ways" were not used in C# (and I'm not saying they are actually better as an overall architecture, just when it comes to the call itself). – Talent 4/10, 2011 at 10:54

I am not quite sure how to optimise this in general. The actual method call depends (if no obvious shortcut is detected) on the interface used and the current object's class. However, the interface cannot provide a general way to select an individual vtable in any object because there is no 'fixed slot' for an interface method to be implemented. Your C++ example can select one of many vtables of the object because the cast's source and target class are known: the combination 'C' and 'IA' uniquely identifies one vtable inside C. For an interface pointer like "IFoo" that is not the case. – Concoction 7/12, 2011 at 20:23

Virtual table pointers in a given object always point to vtables that are specific to that object's implementations of given interfaces. In other words, there is no such thing as an "interface" vtable - there is only an "interface as implemented by class" vtable. The interface pointer always points to the "right" part of the object, and therefore the right vtptr - if we performed our casts correctly. Essentially, the information necessary to "select an individual vtable" is encoded in the physical address contained in the pointer. – Talent 8/12, 2011 at 18:48
