Performance of dynamic_cast?

Before reading the question:
This question is not about how useful it is to use dynamic_cast. It's just about its performance.

I've recently developed a design where dynamic_cast is used a lot.
When discussing it with co-workers, almost everyone said that dynamic_cast shouldn't be used because of its bad performance. (These are co-workers who have different backgrounds and in some cases do not know each other; I'm working in a huge company.)

I decided to test the performance of this method instead of just believing them.

The following code was used:

ptime firstValue( microsec_clock::local_time() );

ChildObject* castedObject = dynamic_cast<ChildObject*>(parentObject);

ptime secondValue( microsec_clock::local_time() );
time_duration diff = secondValue - firstValue;
std::cout << "Cast1 lasts:\t" << diff.fractional_seconds() << " microsec" << std::endl;

The above code uses methods from boost::date_time on Linux to get usable values.
I've done 3 dynamic_casts in one execution; the code for measuring them is the same.

The results of 1 execution were the following:
Cast1 lasts: 74 microsec
Cast2 lasts: 2 microsec
Cast3 lasts: 1 microsec

The first cast always took 74-111 microseconds; the following casts in the same execution took 1-3 microseconds.

So finally my questions:
Is dynamic_cast really performing badly?
According to the test results it's not. Is my test code correct?
Why do so many developers think that it is slow if it isn't?

Reimer answered 29/10, 2010 at 10:14 Comment(9)
Am I missing something? I can't see any code for cast2 or cast3.Empower
Who can say what's bad? Does your program perform well enough overall? If so, then the performance isn't bad. Is the total time in dynamic casts a big percentage of your execution time? If not, then worry about other things first. More generally, 74 microseconds is terribly slow for some applications - in my last job, I'd have received and parsed an entire update record from the stock exchange, updated the database and told client apps about it in half the time. If you're interested, then compare it to other ways to get the same behaviour.Untimely
Having lots of dynamic_casts in the code is a sure indicator of design problems.Miru
@awoodland: You're missing something :) The code is the same, as stated in my text.Reimer
It would be helpful to see a complete "minimal working example" of what you ran so we can repeat and modify your tests.Empower
My amazing ability to read your mind and understand how you generated the times for Cast2 and Cast3 allows me to deduce that it will rain herring in Iceland tonight. Compilable code is king. PS. Most casting would be implicit (passing a child object to a function that takes a parent (pointer/reference)). PPS. What are you comparing it against?Keare
Reading this in 2018 and still have no idea what "Cast 2" and "Cast 3" refer to.Longfaced
While rereading it I agree that it's not easy to understand; some comprehension skills are needed. It's about the following casts: essentially, Cast2 is the same test performed a second time, and Cast3 is the same test performed a third time.Reimer
Made a benchmark test program hereGranophyre
65

Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will be dominated by the resolution of the timer. Try e.g. 1 million+, in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent but without the dynamic casting.

Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).
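
For instance, a minimal harness along those lines might look like the following (a sketch using std::chrono rather than the question's boost::date_time calls; Base and Derived are placeholder names, not the OP's classes):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

struct Base    { virtual ~Base() = default; };
struct Derived : Base {};

int main()
{
    // A mix of Base and Derived objects, so the compiler cannot fold
    // every dynamic_cast into one statically known result.
    std::vector<std::unique_ptr<Base>> objects;
    for (int i = 0; i < 1000; ++i)
    {
        if (i % 2 == 0)
            objects.push_back(std::make_unique<Derived>());
        else
            objects.push_back(std::make_unique<Base>());
    }

    std::size_t hits = 0;
    const auto start = std::chrono::steady_clock::now();

    for (int iter = 0; iter < 10000; ++iter)        // ~10 million casts in total
        for (const auto& obj : objects)
            if (dynamic_cast<Derived*>(obj.get()) != nullptr)
                ++hits;                             // use the result so it can't be optimised away

    const auto stop = std::chrono::steady_clock::now();
    const auto ns   = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();

    std::cout << "hits: " << hits << ", ns per cast: "
              << static_cast<double>(ns) / (10000.0 * objects.size()) << '\n';
}

Running the same loop again without the cast (just the pointer access and the counter) gives you the baseline to subtract.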

Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object, and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.
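
As a minimal, self-contained illustration of that check (placeholder names; the reference form of the cast, not mentioned above, throws std::bad_cast instead of returning a null pointer):

#include <iostream>
#include <typeinfo>   // for std::bad_cast

struct Base    { virtual ~Base() = default; };
struct Derived : Base { void hello() const { std::cout << "I'm a Derived\n"; } };

void use(Base& b)
{
    // Pointer form: a failed cast yields nullptr, which has to be checked.
    if (auto* d = dynamic_cast<Derived*>(&b))
        d->hello();
    else
        std::cout << "not a Derived\n";

    // Reference form: a failed cast throws std::bad_cast instead.
    try {
        dynamic_cast<Derived&>(b).hello();
    } catch (const std::bad_cast&) {
        std::cout << "bad_cast caught\n";
    }
}

int main()
{
    Base    b;
    Derived d;
    use(b);  // both checks report failure
    use(d);  // both casts succeed
}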

I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...

Cabrales answered 29/10, 2010 at 10:20 Comment(5)
+1, but 10K iterations is likely not enough. Something like 100 million is better.Chestonchest
@Oliver Charlesworth "... in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL" An analogous version of the check you are mentioning is present in every method of finding the runtime type of an object so this is not an argument.Volunteer
it's fortunate that you say "probably" because the implementation of clang itself is full of dynamic_cast. And it's not wrong, it's just how one works with a generic AST full of heterogeneous node types. Not everyone uses inheritance as in the Liskov principle, yet it still makes sense.Alonsoalonzo
Hi, I know that it was 10 years ago, but can you please tell me what the alternative to dynamic_cast is? I'm writing an interpreter just for fun, where everything is an object. How can I test the type of, for example, arguments without dynamic_cast? (an enum won't help)Intelligence
@Intelligence take a look at std::type_infoMonaco
36

Performance is meaningless without comparing equivalent functionality. Most people say dynamic_cast is slow without comparing to equivalent behavior. Call them out on this. Put another way:

If 'works' isn't a requirement, I can write code that fails faster than yours.

There are various ways to implement dynamic_cast, and some are faster than others. Stroustrup published a paper about using primes to improve dynamic_cast, for example. Unfortunately, you rarely get to control how your compiler implements the cast, but if performance really matters to you, then you do have control over which compiler you use.

However, not using dynamic_cast will always be faster than using it — but if you don't actually need dynamic_cast, then don't use it! If you do need dynamic lookup, then there will be some overhead, and you can then compare various strategies.
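
For example, one strategy that is often compared against dynamic_cast is double dispatch via a visitor. Here is a minimal sketch with invented Shape/Circle/Square names, just to show the shape of the comparison:

#include <iostream>

struct Circle;
struct Square;

struct ShapeVisitor
{
    virtual ~ShapeVisitor() = default;
    virtual void visit(Circle&) = 0;
    virtual void visit(Square&) = 0;
};

struct Shape
{
    virtual ~Shape() = default;
    virtual void accept(ShapeVisitor& v) = 0;   // dispatch on the dynamic type
};

struct Circle : Shape { void accept(ShapeVisitor& v) override { v.visit(*this); } };
struct Square : Shape { void accept(ShapeVisitor& v) override { v.visit(*this); } };

// Instead of dynamic_cast<Circle*>(shape), the type-specific code lives in a visitor.
struct TypePrinter : ShapeVisitor
{
    void visit(Circle&) override { std::cout << "circle\n"; }
    void visit(Square&) override { std::cout << "square\n"; }
};

int main()
{
    Circle c;
    Square s;
    TypePrinter p;
    Shape* shapes[] = { &c, &s };
    for (Shape* sh : shapes)
        sh->accept(p);      // two virtual calls, no RTTI lookup
}

Whether this beats dynamic_cast depends on your workload; the point is that only such like-for-like comparisons are meaningful.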

Cornelia answered 29/10, 2010 at 10:22 Comment(2)
+1. Yes, btw every living person eventually dies. Which doesn't mean it is a bad idea to be alive.Chestonchest
"if performance really matters to you, then you do have control over which compiler you use." :::cries console video game developer tears:::Mistaken
29

Here are a few benchmarks:
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/
http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html

According to them, dynamic_cast is 5-30 times slower than reinterpret_cast, and the best alternative performs almost the same as reinterpret_cast.

I'll quote the conclusion from the first article:

  • dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
  • the inheritance level has a big impact on dynamic_cast
  • member variable + reinterpret_cast is the fastest reliable way to determine type; however, that has a lot higher maintenance overhead when coding

Absolute numbers are on the order of 100 ns for a single cast. Values like 74 msec don't seem close to reality.
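
A rough sketch of that "member variable" technique (here using static_cast and invented names; the article itself used reinterpret_cast):

#include <iostream>

enum class NodeKind { Base, Leaf };

struct Node
{
    explicit Node(NodeKind k) : kind(k) {}
    virtual ~Node() = default;
    NodeKind kind;              // type tag stored once at construction
};

struct Leaf : Node
{
    Leaf() : Node(NodeKind::Leaf) {}
    int value = 42;
};

// Reads one member and does a zero-cost cast: no RTTI traversal involved.
Leaf* as_leaf(Node* n)
{
    return n->kind == NodeKind::Leaf ? static_cast<Leaf*>(n) : nullptr;
}

int main()
{
    Leaf leaf;
    Node base(NodeKind::Base);
    std::cout << (as_leaf(&leaf) ? "leaf" : "not leaf") << '\n';
    std::cout << (as_leaf(&base) ? "leaf" : "not leaf") << '\n';
}

As the article notes, the price is maintenance: every class has to keep its tag consistent by hand.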

Hazing answered 28/9, 2011 at 6:40 Comment(2)
The value he was getting was 74 usec (microseconds), not 74 msec (milliseconds). Even so, it still doesn't sound realistic.Natalia
The comparison "slower than reinterpret_cast" doesn't make sense, as reinterpret_cast is a compile-time feature (which does not translate to any machine code) and dynamic_cast is a runtime feature. Compared to a zero-cost operation, everything is infinitely slower. The actual comparison was against the benchmark loop itself. And clearly, the result depends on the work done in the benchmark loop. The question now becomes: What did the benchmark loop do? Reading the source code, it repeatedly did the same cast on the same object (not very realistic IMO), called a virtual function, and added a number.Opprobrium
15

Your mileage may vary, to understate the situation.

The performance of dynamic_cast depends a great deal on what you are doing, and can depend on what the names of classes are (and comparing time relative to reinterpret_cast seems odd, since in most cases that takes zero instructions for practical purposes, as does e.g. a cast from unsigned to int).

I've been looking into how it works in clang/g++. Assuming that you are dynamic_casting from a B* to a D*, where B is a (direct or indirect) base of D, and disregarding multiple-base-class complications, it seems to work by calling a library function which does something like this:

// pseudocode for dynamic_cast<D*>(p), where p is a B*

type_info const* curr_typ = &typeid(*p);   // start at the object's dynamic type
while (true) {
    if (*curr_typ == typeid(D)) return static_cast<D*>(p);  // success
    if (*curr_typ == typeid(B)) return nullptr;             // reached B without finding D: failed
    curr_typ = get_direct_base_type_of(*curr_typ);          // magic internal operation
}

So, yes, it's pretty fast when *p is actually a D; just one successful type_info compare. The worst case is when the cast fails, and there are a lot of steps from the actual type to B; in this case there are a lot of failed type comparisons.

How long does a type comparison take? It does this, on clang/g++:

bool compare_eq(type_info const& a, type_info const& b) {
    if (a.name() == b.name()) return true;       // same string object (pointer comparison)
    return strcmp(a.name(), b.name()) == 0;      // same mangled name
}

The strcmp is needed since it's possible to have two distinct string objects providing the type_info.name() for the same type (although I'm pretty sure this only happens when one is in a shared library, and the other is not in that library). But, in most cases, when types are actually equal, they reference the same type name string; thus most successful type comparisons are very fast.

The name() method just returns a pointer to a fixed string containing the mangled name of the class. So there's another factor: if many of the classes on the way from D to B have names starting with MyAppNameSpace::AbstractSyntaxNode<, then the failing compares are going to take longer than usual; the strcmp won't fail until it reaches a difference in the mangled type names.
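
A small demonstration of both points (the exact strings returned by name() are implementation-defined; on GCC/Clang they are the mangled names):

#include <iostream>
#include <typeinfo>

namespace MyAppNameSpace
{
    template <typename T>
    struct AbstractSyntaxNode { virtual ~AbstractSyntaxNode() = default; };
}

int main()
{
    using A = MyAppNameSpace::AbstractSyntaxNode<int>;
    using B = MyAppNameSpace::AbstractSyntaxNode<long>;

    // Long, nearly identical mangled prefixes: a failing comparison has to
    // walk deep into the string before strcmp can return non-zero.
    std::cout << typeid(A).name() << '\n';
    std::cout << typeid(B).name() << '\n';

    std::cout << std::boolalpha
              << (typeid(A) == typeid(B)) << '\n'   // false
              << (typeid(A) == typeid(A)) << '\n';  // true
}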

And, of course, since the operation as a whole is traversing a bunch of linked data structures representing the type hierarchy, the time will depend on whether those things are fresh in the cache or not. So the same cast done repeatedly is likely to show an average time which doesn't necessarily represent the typical performance for that cast.

Piles answered 1/11, 2019 at 2:53 Comment(3)
I find this answer a great addition to this nine-year-old question :)Hereditament
@JoelBodenmann I think things have changed; about 10 years ago I was working on something that used dynamic casts, and they broke when some of the code went into a shared lib; I had to do various things to make sure that the typeinfos in the shared lib didn't get replicated in other code. So it looks like the strcmp compare was added to better support that. In the years since then, I have not had occasion to encounter the intersection of dynamic casts and shared libs, so I wouldn't have noticed when this change happened.Piles
Could someone elaborate on how the type_info lookup is implemented in clang/g++?Ileac
7

Sorry to say this, but your test is virtually useless for determining whether the cast is slow or not. Microsecond resolution is nowhere near good enough. We're talking about an operation that, even in the worst case scenario, shouldn't take more than, say, 100 clock ticks, or less than 50 nanoseconds on a typical PC.

There's no doubt that the dynamic cast will be slower than a static cast or a reinterpret cast, because, on the assembly level, the latter two will amount to an assignment (really fast, order of 1 clock tick), and the dynamic cast requires the code to go and inspect the object to determine its real type.

I can't say off-hand how slow it really is; that would probably vary from compiler to compiler, and I'd need to see the assembly code generated for that line of code. But, like I said, 50 nanoseconds per call is the upper limit of what I'd expect to be reasonable.
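
If you do want to look, a minimal pair of functions like the following (invented names) can be compiled and disassembled to compare the two casts side by side:

// Compile with e.g. `g++ -O2 -c casts.cpp` and inspect with objdump,
// or paste it into Compiler Explorer.

struct Base    { virtual ~Base() = default; };
struct Derived : Base { int extra = 0; };

// Typically compiles down to little more than handing the pointer back:
// the programmer asserts the type, there is no runtime check.
Derived* via_static(Base* b)
{
    return static_cast<Derived*>(b);
}

// Emits a call into the C++ runtime (e.g. __dynamic_cast in the Itanium ABI)
// that inspects the object's RTTI before returning the pointer or nullptr.
Derived* via_dynamic(Base* b)
{
    return dynamic_cast<Derived*>(b);
}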

Unfolded answered 29/10, 2010 at 10:23 Comment(1)
dynamic_cast needs to access RTTI; this will take cycles.Herne
6

The question doesn't mention the alternative. Prior to RTTI being widely available, or simply to avoid using RTTI, the traditional method was to use a virtual method to check the type of the class, and then static_cast as appropriate. This has the disadvantage that it doesn't work for multiple inheritance, but has the advantage that it doesn't have to spend time checking a multiple inheritance hierarchy either!

In my tests:

  • dynamic_cast runs at about 14.4953 nanoseconds.
  • Checking a virtual method and static_casting runs at about twice the speed, 6.55936 nanoseconds.

This is for testing with a 1:1 ratio of valid:invalid casts, using the following code with optimisations disabled. I used Windows for performance checking.

#include <iostream>
#include <windows.h>


struct BaseClass
{
    // Virtual destructor so that deleting a DerivedClass through a
    // BaseClass* (as main() does below) is well-defined.
    virtual ~BaseClass() = default;

    virtual int GetClass() volatile
    { return 0; }
};

struct DerivedClass final : public BaseClass
{
    virtual int GetClass() volatile final override
    { return 1; }
};


volatile DerivedClass *ManualCast(volatile BaseClass *lp)
{
    if (lp->GetClass() == 1)
    {
        return static_cast<volatile DerivedClass *>(lp);
    }

    return nullptr;
}

LARGE_INTEGER perfFreq;
LARGE_INTEGER startTime;
LARGE_INTEGER endTime;

void PrintTime()
{
    float seconds = static_cast<float>(endTime.LowPart - startTime.LowPart) / static_cast<float>(perfFreq.LowPart);
    std::cout << "T=" << seconds << std::endl;
}

BaseClass *Make()
{
    return new BaseClass();
}

BaseClass *Make2()
{
    return new DerivedClass();
}


int main()
{
    volatile BaseClass *base = Make();
    volatile BaseClass *derived = Make2();
    int unused = 0;
    const int t = 1000000000;

    QueryPerformanceFrequency(&perfFreq);
    QueryPerformanceCounter(&startTime);

    for (int n = 0; n < t; ++n)
    {
        volatile DerivedClass *alpha = dynamic_cast<volatile DerivedClass *>(base);
        volatile DerivedClass *beta = dynamic_cast<volatile DerivedClass *>(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }


    QueryPerformanceCounter(&endTime);
    PrintTime();
    QueryPerformanceCounter(&startTime);

    for (int n = 0; n < t; ++n)
    {
        volatile DerivedClass *alpha = ManualCast(base);
        volatile DerivedClass *beta = ManualCast(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }

    QueryPerformanceCounter(&endTime);
    PrintTime();

    std::cout << unused;

    delete base;
    delete derived;
}
Bodega answered 2/6, 2021 at 9:24 Comment(0)
