What is the performance cost of having a virtual method in a C++ class?

Having at least one virtual method in a C++ class (or any of its parent classes) means that the class will have a virtual table, and every instance will have a virtual pointer.

So the memory cost is quite clear. The most important part is the per-instance cost, especially if the instances are small (for example, if they are just meant to contain an integer: having a virtual pointer in every instance might then double their size). As for the memory space used up by the virtual tables themselves, I guess it is usually negligible compared to the space used up by the actual method code.
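
For a rough illustration of that per-instance cost, here is a minimal sketch (the type names are made up; exact sizes depend on the platform and ABI):

#include <cstdio>

struct Plain   { int value; };                        // no vtable, no vptr
struct Virtual { int value; virtual ~Virtual() {} };  // gains a hidden vptr

int main() {
    // On a typical 64-bit platform this prints 4 and 16: the 8-byte vptr
    // plus alignment padding more than doubles the size of the instance.
    std::printf("sizeof(Plain)   = %zu\n", sizeof(Plain));
    std::printf("sizeof(Virtual) = %zu\n", sizeof(Virtual));
}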

This brings me to my question: is there a measurable performance cost (i.e. speed impact) to making a method virtual? There will be a lookup in the virtual table at runtime on every call to this method, so if the method is called very frequently and is very short, could there be a measurable performance hit? I guess it depends on the platform, but has anyone run some benchmarks?

The reason I am asking is that I came across a bug that happened to be due to a programmer forgetting to declare a method virtual. This is not the first time I have seen this kind of mistake. And I thought: why do we add the virtual keyword when needed instead of removing it when we are absolutely sure that it is not needed? If the performance cost is low, I think I will simply recommend the following to my team: make every method virtual by default, including the destructor, in every class, and only remove it when you need to. Does that sound crazy to you?

Pernick asked 20/3, 2009 at 19:30 Comment(11)
See also the Stack Overflow question: AI Applications in C++: How costly are virtual functions? What are the possible optimizations?Melainemelamed
Comparing virtual to non-virtual calls is not meaningful. They provide different functionality. If you want to compare virtual function calls against the C equivalent, you need to add the cost of the code that implements the equivalent feature of the virtual function.Significs
Which is either a switch statement or a big if statement. If you were clever you could re-implement using a function pointer table but the probabilities of getting it wrong are much higher.Significs
See #156757Godmother
The question is about function calls that don't need to be virtual, so the comparison is meaningful.Sleek
@Mark Ransom: yes, exactly, thx. I am roughly saying "what if we defined ALL functions as virtual, even those that do not really need to be, and later removed the virtual keyword when it is 100% sure that it is not needed". By doing so, would we avoid a bunch of bugs and lose only negligible perf?Pernick
If you're using Visual C++, consider using "override" which is a non-standard extension: msdn.microsoft.com/en-us/library/z8ew2153.aspx. It's very good for detecting these kinds of bugs at compile time.Francoisefrancolin
possible duplicate of Performance penalty for working with interfaces in C++?Saxtuba
Making everything virtual by default until someone justifies why it can/should be non-virtual is an abominable policy, yes.Mestizo
Related: in my answer to #46580250, I showed a case where the cost of looping over a vector of pointers calling a virtual method is maybe 24x slower than looping over a vector of objects directly, where the compiler can auto-vectorize after inlining the non-virtual function call. So if you have a choice between keeping multiple containers for different kinds of objects vs. keeping an array of pointers to mixed objects, letting the compiler inline small functions is very good.Passacaglia
@Jon: Old question, but override is the solution to OPs underlying problem imo. It's in the StdLib by now: en.cppreference.com/w/cpp/language/overrideMarxismleninism
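
(As the last two comments suggest, C++11's override catches exactly this class of bug at compile time. A minimal sketch, with made-up class names:)

struct Base {
    virtual void save() {}   // intended extension point
    void write(int) {}       // not virtual
};

struct Derived : Base {
    void save() override {}          // OK: overrides Base::save
    // void write(int) override {}   // compile error: Base::write is not
    //                               // virtual, so there is nothing to override
};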

I ran some timings on a 3 GHz in-order PowerPC processor. On that architecture, a virtual function call costs 7 nanoseconds longer than a direct (non-virtual) function call.

So, not really worth worrying about the cost unless the function is something like a trivial Get()/Set() accessor, in which case anything other than inlining is kind of wasteful. A 7ns overhead on a function that inlines to 0.5ns is severe; a 7ns overhead on a function that takes 500ms to execute is meaningless.

The big cost of virtual functions isn't really the lookup of a function pointer in the vtable (that's usually just a single cycle), but that the indirect jump usually cannot be branch-predicted. This can cause a large pipeline bubble as the processor cannot fetch any instructions until the indirect jump (the call through the function pointer) has retired and a new instruction pointer computed. So, the cost of a virtual function call is much bigger than it might seem from looking at the assembly... but still only 7 nanoseconds.
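
To make the indirection concrete, here is a rough sketch of what the two call forms typically compile to on x86-64 (approximate, optimizer-generated code under the common Itanium C++ ABI; the exact instructions vary by compiler and flags):

struct Base { virtual int f() const; };   // declarations only, defined elsewhere
int free_function(const Base& b);

int call_virtual(const Base& b) { return b.f(); }            // indirect call
int call_direct (const Base& b) { return free_function(b); } // direct call

// Approximate x86-64 output (no devirtualization possible):
//
//   call_virtual:
//       mov  rax, [rdi]          ; load the object's vptr
//       jmp  QWORD PTR [rax]     ; indirect (tail-)call through vtable slot 0
//
//   call_direct:
//       jmp  free_function       ; direct (tail-)call, target known statically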

Edit: Andrew, Not Sure, and others also raise the very good point that a virtual function call may cause an instruction cache miss: if you jump to a code address that is not in cache then the whole program comes to a dead halt while the instructions are fetched from main memory. This is always a significant stall: on Xenon, about 650 cycles (by my tests).

However this isn't a problem specific to virtual functions because even a direct function call will cause a miss if you jump to instructions that aren't in cache. What matters is whether the function has been run before recently (making it more likely to be in cache), and whether your architecture can predict static (not virtual) branches and fetch those instructions into cache ahead of time. My PPC does not, but maybe Intel's most recent hardware does.

My timings control for the influence of icache misses on execution (deliberately, since I was trying to examine the CPU pipeline in isolation), so they discount that cost.

Sethsethi answered 20/3, 2009 at 19:43 Comment(6)
The cost in cycles is roughly equal to the number of pipeline stages between fetch and the end of the branch-retire. It's not an insignificant cost, and it can add up, but unless you're trying to write a tight high-performance loop there are probably bigger perf fish for you to fry.Sethsethi
7 nanoseconds longer than what? If a normal call is 1 nanosecond, that is significant; if a normal call is 70 nanoseconds, then it is not.Significs
If you look at the timings, I found that for a function that cost 0.66ns inline, the differential overhead of a direct function call was 4.8ns and a virtual function 12.3ns (compared to the inline). You make the good point that if the function itself cost a millisecond, then 7 ns means nothing.Sethsethi
Your timings and test overlook the cost of cache misses that virtual functions will incur in large programs, and that is very misleading. If memory serves, a Xenon processor stalls for about 900 cycles on a cache miss.Nannie
More like 600 cycles, but it's a good point. I left it out of the timings because I was interested in just the overhead due to the pipeline bubble and prolog/epilog. The icache miss happens just as easily for a direct function call (Xenon has no icache branch predictor).Sethsethi
Minor detail, but regarding "However this isn't a problem specific to..." it's a tad worse for virtual dispatch as there's an extra page (or two if it happens to fall across a page boundary) that has to be in cache - for the class's Virtual Dispatch Table.Naseberry

There is definitely measurable overhead when calling a virtual function - the call must use the vtable to resolve the address of the function for that type of object. The extra instructions are the least of your worries. Not only do vtables prevent many potential compiler optimizations (since the type is polymorphic, the compiler can't know at compile time which function will be called), they can also thrash your I-cache.

Of course whether these penalties are significant or not depends on your application, how often those code paths are executed, and your inheritance patterns.

In my opinion though, having everything as virtual by default is a blanket solution to a problem you could solve in other ways.

Perhaps you could look at how classes are designed/documented/written. Generally the header for a class should make quite clear which functions can be overridden by derived classes and how they are called. Having programmers write this documentation is helpful in ensuring they are marked correctly as virtual.

I would also say that declaring every function as virtual could lead to more bugs than just forgetting to mark something as virtual. If all functions are virtual, everything can be replaced by subclasses - public, protected, private - everything becomes fair game. By accident or intention, subclasses could then change the behavior of functions in ways that cause problems when they are used by the base implementation.
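
A contrived sketch of that failure mode (class and member names are invented for illustration):

class Counter {
public:
    virtual ~Counter() {}
    void addTwice(int x) { add(x); add(x); }   // relies on add() doing its job
    int total() const { return total_; }
protected:
    virtual void add(int x) { total_ += x; }   // virtual only "because everything is"
private:
    int total_ = 0;
};

class LoggingCounter : public Counter {
protected:
    // Accidentally drops the behaviour the base implementation depends on:
    void add(int) override { /* log only; forgot to call Counter::add() */ }
};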

Rousseau answered 20/3, 2009 at 19:44 Comment(8)
The biggest lost optimization is inlining, especially if the virtual function is often small or empty.Calia
@Andrew: interesting point of view. I somewhat disagree with your last paragraph, though: if a base class has a function save that relies on a specific implementation of a function write in the base class, then it seems to me that either save is poorly coded, or write should be private.Pernick
Just because write is private doesn't prevent it from being overridden. This is another argument for not making things virtual by default. In any case I was thinking of the opposite - a generic and well written implementation is replaced by something that has specific and non-compatible behavior.Rousseau
Voted up on the caching - on any large object-oriented code base, if you're not following code-locality performance practices, it is very easy for your virtual calls to cause cache misses and cause a stall.Nannie
And an icache stall can be really serious: 600 cycles in my tests.Sethsethi
@Zan - a virtual function can still be inlined, if the object type is known at compile time. I.e. object.method() instead of pobject->method(), and object is not a reference. But I agree that probably doesn't happen often.Sleek
The comments in this thread are just as interesting as the answers themselves: thanks everyone! :-)Pernick
All methods are virtual in Java, that's where I got the idea. But they do have two protections against bugs that could come up if a derived class redefines a method that the base class relies on: 1) the final keyword for methods, and 2) private methods are automatically final.Pernick

It depends. :) (Had you expected anything else?)

Once a class gets a virtual function, it can no longer be a POD datatype (it may not have been one before either, in which case this won't make a difference), and that makes a whole range of optimizations impossible.

std::copy() on plain POD types can resort to a simple memcpy routine, but non-POD types have to be handled more carefully.
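
A small sketch of that distinction using C++11 type traits (struct names are made up):

#include <type_traits>

struct Pod       { int x; };
struct HasVtable { int x; virtual ~HasVtable() {} };

static_assert(std::is_trivially_copyable<Pod>::value,
              "plain data: std::copy can degenerate to memcpy/memmove");
static_assert(!std::is_trivially_copyable<HasVtable>::value,
              "polymorphic type: must be copied element by element");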

Construction becomes slower because the virtual table pointer has to be initialized. In the worst case, the difference in performance between POD and non-POD datatypes can be significant.

In the worst case, you may see 5x slower execution (that number is taken from a university project I did recently to reimplement a few standard library classes. Our container took roughly 5x as long to construct as soon as the data type it stored got a vtable)

Of course, in most cases you're unlikely to see any measurable performance difference; this is simply to point out that in some border cases it can be costly.

However, performance shouldn't be your primary consideration here. Making everything virtual is not a perfect solution for other reasons.

Allowing everything to be overridden in derived classes makes it much harder to maintain class invariants. How does a class guarantee that it stays in a consistent state when any one of its methods could be redefined at any time?

Making everything virtual may eliminate a few potential bugs, but it also introduces new ones.

Shapely answered 21/3, 2009 at 1:44 Comment(0)

If you need the functionality of virtual dispatch, you have to pay the price. The advantage of C++ is that you can use a very efficient implementation of virtual dispatch provided by the compiler, rather than a possibly inefficient version you implement yourself.

However, lumbering yourself with the overhead if you don't need it is possibly going a bit too far. And most classes are not designed to be inherited from - creating a good base class requires more than making its functions virtual.

Finalist answered 20/3, 2009 at 19:34 Comment(1)
Good answer but, IMO, not emphatic enough in the 2nd half: lumbering yourself with the overhead if you don't need it is, quite frankly, nuts - especially when using this language whose mantra is "don't pay for what you don't use." Making everything virtual by default until someone justifies why it can/should be non-virtual is an abominable policy.Mestizo

Virtual dispatch is an order of magnitude slower than some alternatives - not due to the indirection so much as the prevention of inlining. Below, I illustrate that by contrasting virtual dispatch with an implementation that embeds a "type(-identifying) number" in the objects and uses a switch statement to select the type-specific code. This avoids function call overhead completely - it just does a local jump. There is a potential cost in maintainability, recompilation dependencies etc. due to the forced localisation (in the switch) of the type-specific functionality.


IMPLEMENTATION

#include <iostream>
#include <vector>
#include <time.h>   // clock_gettime (POSIX; link with -lrt on older glibc)

// virtual dispatch model...

struct Base
{
    virtual int f() const { return 1; }
};

struct Derived : Base
{
    virtual int f() const { return 2; }
};

// alternative: member variable encodes runtime type...

struct Type
{
    Type(int type) : type_(type) { }
    int type_;
};

struct A : Type
{
    A() : Type(1) { }
    int f() const { return 1; }
};

struct B : Type
{
    B() : Type(2) { }
    int f() const { return 2; }
};

struct Timer
{
    Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
    struct timespec from;
    double elapsed() const
    {
        struct timespec to;
        clock_gettime(CLOCK_MONOTONIC, &to);
        return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
    }
};

int main()
{
  for (int j = 0; j < 3; ++j)
  {
    typedef std::vector<Base*> V;
    V v;

    for (int i = 0; i < 1000; ++i)
        v.push_back(i % 2 ? new Base : (Base*)new Derived);

    int total = 0;

    Timer tv;

    for (int i = 0; i < 100000; ++i)
        for (V::const_iterator i = v.begin(); i != v.end(); ++i)
            total += (*i)->f();

    double tve = tv.elapsed();

    std::cout << "virtual dispatch: " << total << ' ' << tve << '\n';

    // ----------------------------

    typedef std::vector<Type*> W;
    W w;

    for (int i = 0; i < 1000; ++i)
        w.push_back(i % 2 ? (Type*)new A : (Type*)new B);

    total = 0;

    Timer tw;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
        {
            if ((*i)->type_ == 1)
                total += ((A*)(*i))->f();
            else
                total += ((B*)(*i))->f();
        }

    double twe = tw.elapsed();

    std::cout << "switched: " << total << ' ' << twe << '\n';

    // ----------------------------

    total = 0;

    Timer tw2;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
            total += (*i)->type_;

    double tw2e = tw2.elapsed();

    std::cout << "overheads: " << total << ' ' << tw2e << '\n';
  }
}

PERFORMANCE RESULTS

On my Linux system:

~/dev  g++ -O2 -o vdt vdt.cc -lrt
~/dev  ./vdt                     
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726

This suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast. Of course, that's specific to the exact system tested / compiler flags & version etc., but generally indicative.


COMMENTS RE VIRTUAL DISPATCH

It must be said though that virtual function call overheads are something that's rarely significant, and then only for oft-called trivial functions (like getters and setters). Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost. People worry about virtual dispatch way too much - so do the profiling before reaching for awkward alternatives. The main issue with them is that they perform an out-of-line function call, though they also delocalise the code executed, which changes the cache utilisation patterns (for better or (more often) worse).

Naseberry answered 26/1, 2011 at 7:27 Comment(2)
I asked a question regarding your code because I have some "strange" results using g++ / clang and -lrt. I thought it was worth mentioning here for future readers.Cogent
@Holt: good question given the mystifying results! I'll give it a closer look in the few days if I get half a chance. Cheers.Naseberry

The extra cost is virtually nothing in most scenarios (pardon the pun). ejac has already posted sensible relative measures.

The biggest thing you give up is the optimizations made possible by inlining. They can be especially good when the function is called with constant parameters. This rarely makes a real difference, but in a few cases it can be huge.
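
A minimal sketch of what inlining with constant parameters buys you (names are invented; whether the first call is actually devirtualized depends on the compiler and optimization level):

struct Shape {
    virtual int area(int s) const { return s * s; }
};

int known_type() {
    Shape sh;
    return sh.area(3);   // dynamic type is known here: the compiler can
                         // devirtualize, inline, and fold this to `return 9;`
}

int unknown_type(const Shape& sh) {
    return sh.area(3);   // dynamic type unknown: an indirect call remains,
                         // and the constant 3 cannot be folded into the callee
}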


Regarding optimizations:
It is important to know and consider the relative cost of the constructs of your language. Big O notation is only half of the story - how your application scales. The other half is the constant factor in front of it.

As a rule of thumb, I wouldn't go out of my way to avoid virtual functions unless there are clear and specific indications that they are a bottleneck. A clean design always comes first - but it is only one stakeholder, and it should not unduly hurt the others.


Contrived Example: An empty virtual destructor on an array of one million small elements may plow through at least 4MB of data, thrashing your cache. If that destructor can be inlined away, the data won't be touched.
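
One way that scenario can materialize is when elements are destroyed through base pointers, so every delete must read the element's vptr even though the destructor body is empty - a minimal sketch, with invented names:

#include <vector>

struct Elem {
    int v;
    virtual ~Elem() {}   // empty, but virtual
};

void destroy(std::vector<Elem*>& elems) {
    for (Elem* e : elems)
        delete e;   // each delete loads e's vptr to find the (empty)
                    // destructor: one cache line touched per element,
                    // none of which would happen with a trivial destructor
}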

When writing library code, such considerations are far from premature. You never know how many loops will be put around your function.

Amatory answered 20/3, 2009 at 20:19 Comment(1)
+1 for mentioning inlining. I can imagine a situation where a non-virtual function called in a loop can be inlined and then, for example, the whole loop vectorized. Then the difference can be significantDestalinization

While everyone else is correct about the performance of virtual methods and such, I think the real problem is whether the team knows about the definition of the virtual keyword in C++.

Consider this code; what is the output?

#include <stdio.h>

class A
{
public:
    void Foo()
    {
        printf("A::Foo()\n");
    }
};

class B : public A
{
public:
    void Foo()
    {
        printf("B::Foo()\n");
    }
};

int main(int argc, char** argv)
{    
    A* a = new A();
    a->Foo();

    B* b = new B();
    b->Foo();

    A* a2 = new B();
    a2->Foo();

    return 0;
}

Nothing surprising here:

A::Foo()
B::Foo()
A::Foo()

As expected, since nothing is virtual. If the virtual keyword is added in front of Foo in both the A and B classes, we get this output:

A::Foo()
B::Foo()
B::Foo()

Pretty much what everyone expects.

Now, you mentioned that there are bugs because someone forgot to add a virtual keyword. So consider this code (where the virtual keyword is added to class A, but not to class B). What is the output then?

#include <stdio.h>

class A
{
public:
    virtual void Foo()
    {
        printf("A::Foo()\n");
    }
};

class B : public A
{
public:
    void Foo()
    {
        printf("B::Foo()\n");
    }
};

int main(int argc, char** argv)
{    
    A* a = new A();
    a->Foo();

    B* b = new B();
    b->Foo();

    A* a2 = new B();
    a2->Foo();

    return 0;
}

Answer: the same as if the virtual keyword had been added to B. The reason is that the signature of B::Foo exactly matches that of A::Foo(), and because A's Foo is virtual, so is B's.

Now consider the case where B's Foo is virtual and A's is not. What is the output then? In this case, the output is

A::Foo()
B::Foo()
A::Foo()

The virtual keyword works downwards in the hierarchy, not upwards. It never makes the base class's methods virtual. Polymorphism begins at the first point in the hierarchy where a method is declared virtual; there is no way for a derived class to make a base class's methods virtual.

Don't forget that virtual methods mean that this class is giving future classes the ability to override/change some of its behaviors.

So if you have a rule to remove the virtual keyword, it may not have the intended effect.

The virtual keyword in C++ is a powerful concept. You should make sure each member of the team really knows this concept so that it can be used as designed.

Snoddy answered 22/3, 2009 at 17:8 Comment(2)
Hi Tommy, thanks for the tutorial. The bug we had was due to a missing "virtual" keyword in a method of the base class. BTW, I am saying make all functions virtual (not the opposite), then, when clearly it is not needed, remove the "virtual" keyword.Pernick
@MiniQuark: Tommy Hui is saying that if you make all functions virtual, a programmer may end up removing the keyword in a derived class, not realizing that it has no effect. You would need some way to ensure that removal of the virtual keyword always occurs at the base class.Decalogue

Depending on your platform, the overhead of a virtual call can be very undesirable. By declaring every function virtual you're essentially calling them all through a function pointer. At the very least this is an extra dereference, but on some PPC platforms it will use microcoded or otherwise slow instructions to accomplish this.

I'd recommend against your suggestion for this reason, but if it helps you prevent bugs then it may be worth the trade off. I can't help but think that there must be some middle ground that is worth finding, though.

Andre answered 20/3, 2009 at 19:38 Comment(0)

It requires just a couple of extra asm instructions to call a virtual method.

But I don't think you worry that fun(int a, int b) has a couple of extra 'push' instructions compared to fun(). So don't worry about virtual calls either, until you are in a special situation and see that they really lead to problems.

P.S. If you have a virtual method, make sure you have a virtual destructor. This way you'll avoid possible problems.
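
A minimal sketch of the problem that advice prevents (class names are invented):

#include <string>

struct Base {
    virtual void f() {}
    ~Base() {}                    // non-virtual destructor
};

struct Derived : Base {
    std::string name{"leaks"};
};

int main() {
    Base* p = new Derived;
    delete p;   // undefined behaviour: only ~Base() is considered, so
                // ~Derived() (and the std::string's destructor) never runs.
                // Declaring ~Base() virtual makes this delete well-defined.
}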


In response to the comments from 'xtofl' and 'Tom', I did small tests with 3 functions:

  1. Virtual
  2. Normal
  3. Normal with 3 int parameters

My test was a simple iteration:

for(int it = 0; it < 100000000; it ++) {
    test.Method();
}

And here the results:

  1. 3.913 sec
  2. 3.873 sec
  3. 3.970 sec

It was compiled by VC++ in debug mode. I did only 5 tests per method and computed the mean value (so results may be pretty inaccurate)... Anyway, the values are almost equal over 100 million calls. And the method with 3 extra push/pop instructions was slower.

The main point is: if you don't like the push/pop analogy, think of an extra if/else in your code. Do you think about the CPU pipeline when you add an extra if/else? ;-) Also, you never know what CPU the code will be running on... A typical compiler can generate code that is more optimal for one CPU and less optimal for another (e.g. the Intel C++ Compiler).

Irremissible answered 20/3, 2009 at 19:55 Comment(5)
the extra asm might just trigger a page fault (that wouldn't be there for non-virtual functions) - I think you hugely oversimplify the problem.Illgotten
+1 to xtofl's comment. Virtual functions introduce indirection, which introduce pipeline "bubbles" and affect caching behavior.Seng
Timing anything in debug mode is meaningless. MSVC makes very slow code in debug mode, and loop overhead probably hides most of the difference. If you're aiming for high performance, yes you should think about minimizing if/else branches in the fast path. See agner.org/optimize for more about low-level x86 performance optimization. (Also some other links in the x86 tag wikiPassacaglia
@Tom: the key point here is that non-virtual functions can inline, but virtual can't (unless the compiler can devirtualize, e.g. if you used final in your override and you have a pointer to the derived type, rather than the base type). This test called the same virtual function every time, so it predicted perfectly; no pipeline bubbles other except from limited call throughput. And that indirect call may be a couple more uops. Branch prediction works well even for indirect branches, especially if they're always to the same destination.Passacaglia
This falls into the common trap of microbenchmarks: it looks fast when branch-predictors are hot and nothing else is going on. The mispredict overhead is higher for indirect call than for a direct call. (And yes, normal call instructions need prediction too. The fetch stage has to know the next address to fetch before this block is decoded, so it has to predict the next fetch block based on the current block address, rather than instruction address. As well as predict where in this block there's a branch instruction...)Passacaglia
