How expensive is RTTI?
Asked Answered
L

13

173

I understand that there is a resource hit from using RTTI, but how big is it? Everywhere I've looked just says that "RTTI is expensive," but none of them actually give any benchmarks or quantitative data reguarding memory, processor time, or speed.

So, just how expensive is RTTI? I might use it on an embedded system where I have only 4MB of RAM, so every bit counts.

Edit: As per S. Lott's answer, it would be better if I include what I'm actually doing. I am using a class to pass in data of different lengths and that can perform different actions, so it would be difficult to do this using only virtual functions. It seems that using a few dynamic_casts could remedy this problem by allowing the different derived classes to be passed through the different levels yet still allow them to act completely differently.

From my understanding, dynamic_cast uses RTTI, so I was wondering how feasable it would be to use on a limited system.

Luvenialuwana answered 23/2, 2009 at 23:46 Comment(3)
Following from your edit - very often when I find myself doing several dynamic casts I realise that using the Visitor pattern straightens things out again. Could that work for you?Subjunctive
I'll put it this way -- I just started using dynamic_cast in C++, and now, 9 out of 10 times when I "break" the program with the debugger, it breaks inside the internal dynamic-cast function. It's damn slow.Sonde
RTTI = "run-time type information", by the way.Eyrie
T
125

Regardless of compiler, you can always save on runtime if you can afford to do

if (typeid(a) == typeid(b)) {
  B* ba = static_cast<B*>(&a);
  etc;
}

instead of

B* ba = dynamic_cast<B*>(&a);
if (ba) {
  etc;
}

The former involves only one comparison of std::type_info; the latter necessarily involves traversing an inheritance tree plus comparisons.

Past that ... like everyone says, the resource usage is implementation specific.

I agree with everyone else's comments that the submitter should avoid RTTI for design reasons. However, there are good reasons to use RTTI (mainly because of boost::any). That in mind, it's useful to know its actual resource usage in common implementations.

I recently did a bunch of research into RTTI in GCC.

tl;dr: RTTI in GCC uses negligible space and typeid(a) == typeid(b) is very fast, on many platforms (Linux, BSD and maybe embedded platforms, but not mingw32). If you know you'll always be on a blessed platform, RTTI is very close to free.

Gritty details:

GCC prefers to use a particular "vendor-neutral" C++ ABI[1], and always uses this ABI for Linux and BSD targets[2]. For platforms that support this ABI and also weak linkage, typeid() returns a consistent and unique object for each type, even across dynamic linking boundaries. You can test &typeid(a) == &typeid(b), or just rely on the fact that the portable test typeid(a) == typeid(b) does actually just compare a pointer internally.

In GCC's preferred ABI, a class vtable always holds a pointer to a per-type RTTI structure, though it might not be used. So a typeid() call itself should only cost as much as any other vtable lookup (the same as calling a virtual member function), and RTTI support shouldn't use any extra space for each object.

From what I can make out, the RTTI structures used by GCC (these are all the subclasses of std::type_info) only hold a few bytes for each type, aside from the name. It isn't clear to me whether the names are present in the output code even with -fno-rtti. Either way, the change in size of the compiled binary should reflect the change in runtime memory usage.

A quick experiment (using GCC 4.4.3 on Ubuntu 10.04 64-bit) shows that -fno-rtti actually increases the binary size of a simple test program by a few hundred bytes. This happens consistently across combinations of -g and -O3. I'm not sure why the size would increase; one possibility is that GCC's STL code behaves differently without RTTI (since exceptions won't work).

[1] Known as the Itanium C++ ABI, documented at http://www.codesourcery.com/public/cxx-abi/abi.html. The names are horribly confusing: the name refers to the original development architecture, though the ABI specification works on lots of architectures including i686/x86_64. Comments in GCC's internal source and STL code refer to Itanium as the "new" ABI in contrast to the "old" one they used before. Worse, the "new"/Itanium ABI refers to all versions available through -fabi-version; the "old" ABI predated this versioning. GCC adopted the Itanium/versioned/"new" ABI in version 3.0; the "old" ABI was used in 2.95 and earlier, if I am reading their changelogs correctly.

[2] I couldn't find any resource listing std::type_info object stability by platform. For compilers I had access to, I used the following: echo "#include <typeinfo>" | gcc -E -dM -x c++ -c - | grep GXX_MERGED_TYPEINFO_NAMES. This macro controls the behavior of operator== for std::type_info in GCC's STL, as of GCC 3.0. I did find that mingw32-gcc obeys the Windows C++ ABI, where std::type_info objects aren't unique for a type across DLLs; typeid(a) == typeid(b) calls strcmp under the covers. I speculate that on single-program embedded targets like AVR, where there is no code to link against, std::type_info objects are always stable.

Tanagra answered 2/12, 2010 at 11:25 Comment(13)
Exceptions work without RTTI. (You're allowed to throw an int and there's no vtable in that :) )Savoie
@BillyONeal: Exceptions work without vtables, yes. Without RTTI they cannot work, sorry to disappoint you.Dislodge
@Deduplicator: And yet, when I turn off RTTI in my compiler, they work just fine. Sorry to disappoint you.Savoie
@Deduplicator: (Which again, makes sense, because you can throw things like int, which cannot participate in RTTI because they have no vtable)Savoie
@BillyONeal: When you disable RTTI, only parts of it are really disabled. Specifically, everything thrown and their base-classes where applicable (recursively) still get their RTTI generated.Dislodge
@Deduplicator: That doesn't make sense for int which is not a base class nor has a base class. A compiler can implement exception handling using some RTTI features, but by no means is it required to do so.Savoie
The exception-handling mechanism must be able to work with any type fullfilling some few basic requirements. You are free to suggest how to handle throwing and catching exceptions of arbitrary type across module boundaries without RTTI. Please consider that up- and down-casting is required.Dislodge
Hold on, what is meant by "you can always save on runtime if you can afford to do..."? If version one is faster than version two, what is the cost of switching? What is it that I must "afford"? (Just want to make sure I am not missing some downsides, like instability.)Deutzia
typeid(a) == typeid(b) is NOT the same as B* ba = dynamic_cast<B*>(&a). Try it on objects with multiple inheritance as a random level on the derived class tree and you will find typeid()==typeid() will not yield a positive. dynamic_cast is the only way to search the inheritance tree for real. Stop thinking about potential savings by disabling RTTI and just use it. If you are over capacity then optimise your code bloat. Try to avoid using dynamic_cast inside inner loops or any other performance critical code and you'll be fine.Percussion
@mcoder That's why the article explicitly states that the latter necessarily involves traversing an inheritance tree plus comparisons. @CoryB You can "afford" to do it when you don't need to support casting from the entire inheritance tree. For example if you want to find all items of type X in a collection, but not those that derive from X, then what you should use is the former. If you need to also find all derived instances, you'll have to use the latter.Koziara
@Percussion indeed. or consider polymorphic_downcast<>Krucik
@Percussion thanks. I also doubt about the correctness. why the up vote is still so high for such wrong answer ...Auten
@Auten there is a (maybe no so clear) premise: "you can always save on runtime if you can afford to do", there are cases where it does what you need, aka you can afford to use the first, otherwise the premise does not apply. I would agree that it is misunderstandable, but not necessarily wrongIncisor
F
57

Perhaps these figures would help.

I was doing a quick test using this:

  • GCC Clock() + XCode's Profiler.
  • 100,000,000 loop iterations.
  • 2 x 2.66 GHz Dual-Core Intel Xeon.
  • The class in question is derived from a single base class.
  • typeid().name() returns "N12fastdelegate13FastDelegate1IivEE"

5 Cases were tested:

1) dynamic_cast< FireType* >( mDelegate )
2) typeid( *iDelegate ) == typeid( *mDelegate )
3) typeid( *iDelegate ).name() == typeid( *mDelegate ).name()
4) &typeid( *iDelegate ) == &typeid( *mDelegate )
5) { 
       fastdelegate::FastDelegateBase *iDelegate;
       iDelegate = new fastdelegate::FastDelegate1< t1 >;
       typeid( *iDelegate ) == typeid( *mDelegate )
   }

5 is just my actual code, as I needed to create an object of that type before checking if it is similar to one I already have.

Without Optimisation

For which the results were (I've averaged a few runs):

1)  1,840,000 Ticks (~2  Seconds) - dynamic_cast
2)    870,000 Ticks (~1  Second)  - typeid()
3)    890,000 Ticks (~1  Second)  - typeid().name()
4)    615,000 Ticks (~1  Second)  - &typeid()
5) 14,261,000 Ticks (~23 Seconds) - typeid() with extra variable allocations.

So the conclusion would be:

  • For simple cast cases without optimisation typeid() is more than twice faster than dyncamic_cast.
  • On a modern machine the difference between the two is about 1 nanosecond (a millionth of a millisecond).

With Optimisation (-Os)

1)  1,356,000 Ticks - dynamic_cast
2)     76,000 Ticks - typeid()
3)     76,000 Ticks - typeid().name()
4)     75,000 Ticks - &typeid()
5)     75,000 Ticks - typeid() with extra variable allocations.

So the conclusion would be:

  • For simple cast cases with optimisation, typeid() is nearly x20 faster than dyncamic_cast.

Chart

enter image description here

The Code

As requested in the comments, the code is below (a bit messy, but works). 'FastDelegate.h' is available from here.

#include <iostream>
#include "FastDelegate.h"
#include "cycle.h"
#include "time.h"

// Undefine for typeid checks
#define CAST

class ZoomManager
{
public:
    template < class Observer, class t1 >
    void Subscribe( void *aObj, void (Observer::*func )( t1 a1 ) )
    {
        mDelegate = new fastdelegate::FastDelegate1< t1 >;
        
        std::cout << "Subscribe\n";
        Fire( true );
    }
    
    template< class t1 >
    void Fire( t1 a1 )
    {
        fastdelegate::FastDelegateBase *iDelegate;
        iDelegate = new fastdelegate::FastDelegate1< t1 >;
        
        int t = 0;
        ticks start = getticks();
        
        clock_t iStart, iEnd;
        
        iStart = clock();
        
        typedef fastdelegate::FastDelegate1< t1 > FireType;
        
        for ( int i = 0; i < 100000000; i++ ) {
        
#ifdef CAST
                if ( dynamic_cast< FireType* >( mDelegate ) )
#else
                // Change this line for comparisons .name() and & comparisons
                if ( typeid( *iDelegate ) == typeid( *mDelegate ) )
#endif
                {
                    t++;
                } else {
                    t--;
                }
        }
        
        iEnd = clock();
        printf("Clock ticks: %i,\n", iEnd - iStart );
        
        std::cout << typeid( *mDelegate ).name()<<"\n";
        
        ticks end = getticks();
        double e = elapsed(start, end);
        std::cout << "Elasped: " << e;
    }
    
    template< class t1, class t2 >
    void Fire( t1 a1, t2 a2 )
    {
        std::cout << "Fire\n";
    }
    
    fastdelegate::FastDelegateBase *mDelegate;
};

class Scaler
{
public:
    Scaler( ZoomManager *aZoomManager ) :
        mZoomManager( aZoomManager ) { }
    
    void Sub()
    {
        mZoomManager->Subscribe( this, &Scaler::OnSizeChanged );
    }
    
    void OnSizeChanged( int X  )
    {
        std::cout << "Yey!\n";        
    }
private:
    ZoomManager *mZoomManager;
};

int main(int argc, const char * argv[])
{
    ZoomManager *iZoomManager = new ZoomManager();
    
    Scaler iScaler( iZoomManager );
    iScaler.Sub();
        
    delete iZoomManager;

    return 0;
}
Fourflush answered 15/12, 2012 at 18:0 Comment(9)
Of course, the dynamic cast is more general -- it works if the item is more derived. E.g. class a {}; class b : public a {}; class c : public b {}; when the target is an instance of c will work fine when testing for class b with dynamic_cast, but not with the typeid solution. Still reasonable though, +1Savoie
Can you post your code on http://ideone.com/ or something similar? I want to test something.Uno
This benchmark is entirely bogus with optimizations: the typeid check is loop-invariant and is moved out of the loop. It's not interesting at all, it's a basic benchmarking no-no.Masuria
@Kuba: Then the benchmark is bogus. That's not a reason to benchmark with optimizations off; that's a reason to write better benchmarks.Savoie
yet again, this is a failure. "For simple cast cases with optimisation, typeid() is nearly x20 faster than dyncamic_cast." they DONT do the same thing. There is a reason dynamic_cast is slower.Percussion
@KubaOber : total +1. this is so classic. and it should be obvious from the look of the cycles number that this happened.Krucik
I wonder why the dynamic_cast didn't get optimized out of the loop too, though.Delimitate
@Delimitate could be due to strict aliasing of the mDelegate pointer.Indignant
@Delimitate Most likely because mDelegate is not a const pointer. Since the pointer may change and could potentially point to another subtype, the dynamic_cast must be recalculated each iteration.Sigvard
T
41

It depends on the scale of things. For the most part it's just a couple checks and a few pointer dereferences. In most implementations, at the top of every object that has virtual functions, there is a pointer to a vtable that holds a list of pointers to all the implementations of the virtual function on that class. I would guess that most implementations would use this to either store another pointer to the type_info structure for the class.

For example in pseudo-c++:

struct Base
{
    virtual ~Base() {}
};

struct Derived
{
    virtual ~Derived() {}
};


int main()
{
    Base *d = new Derived();
    const char *name = typeid(*d).name(); // C++ way

    // faked up way (this won't actually work, but gives an idea of what might be happening in some implementations).
    const vtable *vt = reinterpret_cast<vtable *>(d);
    type_info *ti = vt->typeinfo;
    const char *name = ProcessRawName(ti->name);       
}

In general the real argument against RTTI is the unmaintainability of having to modify code everywhere every time you add a new derived class. Instead of switch statements everywhere, factor those into virtual functions. This moves all the code that is different between classes into the classes themselves, so that a new derivation just needs to override all the virtual functions to become a fully functioning class. If you've ever had to hunt through a large code base for every time someone checks the type of a class and does something different, you'll quickly learn to stay away from that style of programming.

If your compiler lets you totally turn off RTTI, the final resulting code size savings can be significant though, with such a small RAM space. The compiler needs to generate a type_info structure for every single class with a virtual function. If you turn off RTTI, all these structures do not need to be included in the executable image.

Trubow answered 23/2, 2009 at 23:56 Comment(2)
+1 for actually explaining why using RTTI is considered a bad design decision, that wasn't quite clear to me before.Rimple
This answer is a low level understanding of the power of C++. "In general" and "In most implementations" used liberally means you're not thinking about how to use the languages features well. Virtual functions and re-implementing RTTI is not the answer. RTTI is the answer. Sometimes you just want to know if an object is a certain type. That's why it's there! So you lose a few KB of RAM to some type_info structs. Gee...Percussion
U
19

Well, the profiler never lies.

Since I have a pretty stable hierarchy of 18-20 types that is not changing very much, I wondered if just using a simple enum'd member would do the trick and avoid the purportedly "high" cost of RTTI. I was skeptical if RTTI was in fact more expensive than just the if statement it introduces. Boy oh boy, is it.

It turns out that RTTI is expensive, much more expensive than an equivalent if statement or a simple switch on a primitive variable in C++. So S.Lott's answer is not completely correct, there is extra cost for RTTI, and it's not due to just having an if statement in the mix. It's due to that RTTI is very expensive.

This test was done on the Apple LLVM 5.0 compiler, with stock optimizations turned on (default release mode settings).

So, I have below 2 functions, each of which figures out the concrete type of an object either via 1) RTTI or 2) a simple switch. It does so 50,000,000 times. Without further ado, I present to you the relative runtimes for 50,000,000 runs.

enter image description here

That's right, the dynamicCasts took 94% of runtime. While the regularSwitch block only took 3.3%.

Long story short: If you can afford the energy to hook-in an enum'd type as I did below, I'd probably recommend it, if you need to do RTTI and performance is paramount. It only takes setting the member once (make sure to get it via all constructors), and be sure to never write it afterward.

That said, doing this should not mess up your OOP practices.. it's only meant to be used when type information simply isn't available and you find yourself cornered into using RTTI.

#include <stdio.h>
#include <vector>
using namespace std;

enum AnimalClassTypeTag
{
  TypeAnimal=1,
  TypeCat=1<<2,TypeBigCat=1<<3,TypeDog=1<<4
} ;

struct Animal
{
  int typeTag ;// really AnimalClassTypeTag, but it will complain at the |= if
               // at the |='s if not int
  Animal() {
    typeTag=TypeAnimal; // start just base Animal.
    // subclass ctors will |= in other types
  }
  virtual ~Animal(){}//make it polymorphic too
} ;

struct Cat : public Animal
{
  Cat(){
    typeTag|=TypeCat; //bitwise OR in the type
  }
} ;

struct BigCat : public Cat
{
  BigCat(){
    typeTag|=TypeBigCat;
  }
} ;

struct Dog : public Animal
{
  Dog(){
    typeTag|=TypeDog;
  }
} ;

typedef unsigned long long ULONGLONG;

void dynamicCasts(vector<Animal*> &zoo, ULONGLONG tests)
{
  ULONGLONG animals=0,cats=0,bigcats=0,dogs=0;
  for( ULONGLONG i = 0 ; i < tests ; i++ )
  {
    for( Animal* an : zoo )
    {
      if( dynamic_cast<Dog*>( an ) )
        dogs++;
      else if( dynamic_cast<BigCat*>( an ) )
        bigcats++;
      else if( dynamic_cast<Cat*>( an ) )
        cats++;
      else //if( dynamic_cast<Animal*>( an ) )
        animals++;
    }
  }

  printf( "%lld animals, %lld cats, %lld bigcats, %lld dogs\n", animals,cats,bigcats,dogs ) ;

}

//*NOTE: I changed from switch to if/else if chain
void regularSwitch(vector<Animal*> &zoo, ULONGLONG tests)
{
  ULONGLONG animals=0,cats=0,bigcats=0,dogs=0;
  for( ULONGLONG i = 0 ; i < tests ; i++ )
  {
    for( Animal* an : zoo )
    {
      if( an->typeTag & TypeDog )
        dogs++;
      else if( an->typeTag & TypeBigCat )
        bigcats++;
      else if( an->typeTag & TypeCat )
        cats++;
      else
        animals++;
    }
  }
  printf( "%lld animals, %lld cats, %lld bigcats, %lld dogs\n", animals,cats,bigcats,dogs ) ;  

}

int main(int argc, const char * argv[])
{
  vector<Animal*> zoo ;

  zoo.push_back( new Animal ) ;
  zoo.push_back( new Cat ) ;
  zoo.push_back( new BigCat ) ;
  zoo.push_back( new Dog ) ;

  ULONGLONG tests=50000000;

  dynamicCasts( zoo, tests ) ;
  regularSwitch( zoo, tests ) ;
}
Uno answered 29/10, 2013 at 2:12 Comment(1)
This is the approach I go when avoiding RTTI. But I put the types in a virtual function getter which returns the type directly. This is essentially static program memory and doesn't take up memory for every instance.Thrashing
D
15

The standard way:

cout << (typeid(Base) == typeid(Derived)) << endl;

Standard RTTI is expensive because it relies on doing a underlying string compare and thus the speed of RTTI can vary depending on the class name length.

The reason why string compares are used is to make it work consistently across library/DLL boundaries. If you build your application statically and/or you are using certain compilers then you can probably use:

cout << (typeid(Base).name() == typeid(Derived).name()) << endl;

Which is not guaranteed to work (will never give a false positive, but may give false negatives) but can be up to 15 times faster. This relies on the implementation of typeid() to work in a certain way and all you are doing is comparing an internal char pointer. This is also sometimes equivalent to:

cout << (&typeid(Base) == &typeid(Derived)) << endl;

You can however use a hybrid safely which will be very fast if the types match, and will be worst case for unmatched types:

cout << ( typeid(Base).name() == typeid(Derived).name() || 
          typeid(Base) == typeid(Derived) ) << endl;

To understand whether you need to optimize this you need to see how much of your time you spend getting a new packet, compared to the time it takes to process the packet. In most cases a string compare will probably not be a large overhead. (depending on your class or namespace::class name length)

The safest way to optimize this is to implement your own typeid as an int (or a enum Type : int ) as part of your Base class and use that to determine the type of the class, and then just use static_cast<> or reinterpret_cast<>

For me the difference is roughly 15 times on unoptimized MS VS 2005 C++ SP1.

Duvall answered 23/9, 2009 at 21:19 Comment(1)
"Standard RTTI is expensive because it relies on doing a underlying string compare" - no, there's nothing "Standard" about this; it's just how your implementation's typeid::operators work. GCC on a supported platform, for instance, already uses comparisons of char *s, without us forcing it - gcc.gnu.org/onlinedocs/gcc-4.6.3/libstdc++/api/… . Sure, your way makes MSVC behave a lot better than default on your platform, so kudos, and I don't know what the "some targets" that use pointers natively are... but my point is MSVC's behaviour is not in any way "Standard".Overhear
L
9

For a simple check, RTTI can be as cheap as a pointer comparison. For inheritance checking, it can be as expensive as a strcmp for every type in an inheritance tree if you are dynamic_cast-ing from the top to the bottom in one implementation out there.

You can also reduce the overhead by not using dynamic_cast and instead checking the type explicitly via &typeid(...)==&typeid(type). While that doesn't necessarily work for .dlls or other dynamically loaded code, it can be quite fast for things that are statically linked.

Although at that point it's like using a switch statement, so there you go.

Loftin answered 24/2, 2009 at 0:0 Comment(6)
Do you have any referencs for the strcmp version? It seems extremely inefficient and inaccurate to use strcmp for a type check.Olsen
In a poor implementation that could have multiple type_info objects per type, it could implement bool type_info::operator==(const type_info &x) const as "!strcmp(name(), x.name())"Locality
Step into the disassembly of either dynamic_cast or typeid().operator== for MSVC and you'll hit a strcmp in there. I assume its there for the horrible case where you are comparing against a type compiled in another .dll. And it uses the mangled name, so at least it's correct given the same compiler.Loftin
you are supposed to do "typeid(...)==typeid(type)" and not compare the addressSiesta
My point is that you can do &typeid(...)==&typeid(blah) as an early out and will be safe. It may not actually do anything useful since typeid(...) could be generated on the stack, but if their addresses are equal, then their types are equal.Loftin
Why can't you use typeid in ddls?Granniah
C
6

It's always best to measure things. In the following code, under g++, the use of hand coded type identification seems to be about three times faster than RTTI. I'm sure that a more realistic hand coded implementtaion using strings instead of chars would be slower, bringing the timings close together..

#include <iostream>
using namespace std;

struct Base {
    virtual ~Base() {}
    virtual char Type() const = 0;
};

struct A : public Base {
    char Type() const {
        return 'A';
    }
};

struct B : public Base {;
    char Type() const {
        return 'B';
    }
};

int main() {
    Base * bp = new A;
    int n = 0;
    for ( int i = 0; i < 10000000; i++ ) {
#ifdef RTTI
        if ( A * a = dynamic_cast <A*> ( bp ) ) {
            n++;
        }
#else
        if ( bp->Type() == 'A' ) {
            A * a = static_cast <A*>(bp);
            n++;
        }
#endif
    }
    cout << n << endl;
}
Conterminous answered 24/2, 2009 at 15:11 Comment(7)
try not to do it with dynamic_cast, but with typeid. it could speed up performance.Siesta
but using dynamic_cast is more realistic, at least looking at my codeConterminous
it does a different thing: it checks also whether bp points to a type derived from A. your == 'A' checks whether it points exactly to an 'A'. i also think the test is unfair somewhat: the compiler can easily see bp cannot point to anything different than A. but i think it doesn't optimize here.Siesta
anyway, i've tested your code. and it gives me "0.016s" for RTTI and "0.044s" for the virtual function calls. (using -O2)Siesta
though changing it to use typeid does not make any difference here (still 0.016s)Siesta
How about memory consumption?Despite
@cristian in either case the amounts use used for type info would be tinyConterminous
B
4

A while ago I measured the time costs for RTTI in the specific cases of MSVC and GCC for a 3ghz PowerPC. In the tests I ran (a fairly large C++ app with a deep class tree), each dynamic_cast<> cost between 0.8μs and 2μs, depending on whether it hit or missed.

Boodle answered 2/12, 2010 at 11:32 Comment(0)
O
2

So, just how expensive is RTTI?

That depends entirely on the compiler you're using. I understand that some use string comparisons, and others use real algorithms.

Your only hope is to write a sample program and see what your compiler does (or at least determine how much time it takes to execute a million dynamic_casts or a million typeids).

Ocasio answered 24/2, 2009 at 1:12 Comment(0)
P
2

RTTI can be cheap and doesn't necessarly need a strcmp. The compiler limits the test to perform the actual hierarchy, in reverse order. So if you have a class C that is a child of class B which is a child of class A, dynamic_cast from a A* ptr to a C* ptr imply only one pointer comparison and not two (BTW, only the vptr table pointer is compared). The test is like "if (vptr_of_obj == vptr_of_C) return (C*)obj"

Another example, if we try to dynamic_cast from A* to B*. In that case, the compiler will check both case (obj being a C, and obj being a B) in turns. This can also be simplified to a single test (most of times), as virtual function table is a made as an aggregation, so the test resume to "if (offset_of(vptr_of_obj, B) == vptr_of_B)" with

offset_of = return sizeof(vptr_table) >= sizeof(vptr_of_B) ? vptr_of_new_methods_in_B : 0

The memory layout of

vptr_of_C = [ vptr_of_A | vptr_of_new_methods_in_B | vptr_of_new_methods_in_C ]

How does the compiler know of to optimize this at compile time ?

At compile time, the compiler knows the current hierarchy of objects, so it refuse to compile different type hierarchy dynamic_casting. Then it just has to handle the hierarchy depth, and add the invert amount of tests to match such depth.

For example, this doesn't compile:

void * something = [...]; 
// Compile time error: Can't convert from something to MyClass, no hierarchy relation
MyClass * c = dynamic_cast<MyClass*>(something);  
Puca answered 26/8, 2010 at 13:27 Comment(0)
F
0

I have recently been examining the effects of disabling RTTI on a program that I maintain at work. Disabling RTTI dropped the size of the executable about 2%.

The other cost is that dynamic_cast() and similar approaches tend to encourage memory allocation strategies that do not have the best performance. Take the animal example provided by bobobobo. In this case we have a vector of pointers to dynamically allocated memory. Accessing each object involves an indirection and if the memory location of the objects is not predictable this can have sub-optimal performance. If we know the complete set of types, we can use std::variant or similar to avoid the indirection and a better memory layout. In the below experiment on quickbench I get that using std::variant is 13x faster than using dynamic_cast() and 2x faster than using the type tag and switch. If you do not need to preserve the order of the objects, you can use a tuple of vectors to get even better performance. There is a helpful talk on this Using Modern C++ to Eliminate Virtual Functions - Jonathan Gopel - CppCon 2022

https://quick-bench.com/q/RJca23JT9mFjGMXdfOfURd-tgXA

#include <benchmark/benchmark.h>

#include <random>
#include <variant>
#include <vector>

struct counts {
    std::size_t animals = 0;
    std::size_t cats = 0;
    std::size_t bigcats = 0;
    std::size_t dogs = 0;
};

namespace rt {
using namespace std;

enum AnimalClassTypeTag {
    TypeAnimal = 1,
    TypeCat = 1 << 2,
    TypeBigCat = 1 << 3,
    TypeDog = 1 << 4
};

struct Animal {
    int typeTag;  // really AnimalClassTypeTag, but it will complain at the |=
                  // if at the |='s if not int
    Animal() {
        typeTag = TypeAnimal;  // start just base Animal.
                               // subclass ctors will |= in other types
    }
    virtual ~Animal() {}  // make it polymorphic too
};

struct Cat : public Animal {
    Cat() {
        typeTag |= TypeCat;  // bitwise OR in the type
    }
};

struct BigCat : public Cat {
    BigCat() { typeTag |= TypeBigCat; }
};

struct Dog : public Animal {
    Dog() { typeTag |= TypeDog; }
};

counts dynamicCasts(vector<Animal*>& zoo) {
    counts r;
    for (Animal* an : zoo) {
        if (dynamic_cast<Dog*>(an))
            r.dogs++;
        else if (dynamic_cast<BigCat*>(an))
            r.bigcats++;
        else if (dynamic_cast<Cat*>(an))
            r.cats++;
        else  // if( dynamic_cast<Animal*>( an ) )
            r.animals++;
    }
    return r;
}

counts regularSwitch(vector<Animal*>& zoo) {
    counts r;
    for (Animal* an : zoo) {
        if (an->typeTag & TypeDog)
            r.dogs++;
        else if (an->typeTag & TypeBigCat)
            r.bigcats++;
        else if (an->typeTag & TypeCat)
            r.cats++;
        else
            r.animals++;
    }
    return r;
}

std::vector<Animal*> construct(std::size_t n) {
    std::vector<Animal*> zoo;
    for (std::size_t i = 0; i < n; ++i) {
        zoo.push_back(new Dog());
        zoo.push_back(new BigCat());
        zoo.push_back(new Cat());
        zoo.push_back(new Animal());
    }

    std::mt19937 g;
    std::shuffle(zoo.begin(), zoo.end(), g);

    return zoo;
}

}  // namespace rt

namespace generic {
struct Animal {};
struct Cat : Animal {};
struct BigCat : Cat {};
struct Dog : Animal {};

using var = std::variant<Cat, BigCat, Dog, Animal>;

counts use_visit(std::vector<var> const& zoo) {
    counts r;
    for (auto const& v : zoo) {
        std::visit(
            [&r]<class T>(T const& an) {
                if constexpr (std::is_base_of_v<Dog, T>) {
                    r.dogs++;
                } else if constexpr (std::is_base_of_v<BigCat, T>) {
                    r.bigcats++;
                } else if constexpr (std::is_base_of_v<Cat, T>) {
                    r.cats++;
                } else {
                    r.animals++;
                }
            },
            v);
    }
    return r;
}

std::vector<var> construct(std::size_t n) {
    std::vector<var> zoo;
    for (std::size_t i = 0; i < n; ++i) {
        zoo.emplace_back(Dog());
        zoo.emplace_back(BigCat());
        zoo.emplace_back(Cat());
        zoo.emplace_back(Animal());
    }

    std::mt19937 g;
    std::shuffle(zoo.begin(), zoo.end(), g);

    return zoo;
}

}  // namespace generic

int n = 50000;

static void use_dynamic_cast(benchmark::State& state) {
    using namespace rt;
    auto zoo = construct(n);
    // Code inside this loop is measured repeatedly
    for (auto _ : state) {
        auto counts = dynamicCasts(zoo);
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(counts);
    }
}

static void use_switch(benchmark::State& state) {
    using namespace rt;
    auto zoo = construct(n);
    // Code inside this loop is measured repeatedly
    for (auto _ : state) {
        auto counts = regularSwitch(zoo);
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(counts);
    }
}

static void use_variant(benchmark::State& state) {
    using namespace generic;
    auto zoo = construct(n);
    // Code inside this loop is measured repeatedly
    for (auto _ : state) {
        auto counts = use_visit(zoo);
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(counts);
    }
}

BENCHMARK(use_dynamic_cast);
BENCHMARK(use_switch);
BENCHMARK(use_variant);

BENCHMARK_MAIN();
Firmin answered 1/6, 2023 at 2:17 Comment(0)
L
0

Blast from the past. There was a time when RTTI was expensive. There was a time when you deliberately set compiler options to actually turn off chunks of the exception match code so it would compile to something reasonable. I have this in my 1993 era compiler documentation. Nobody cares anymore.

Today the only part of RTTI that isn't dirt cheap is the bit that asks for printable class names; and that's only overhead if the compiler has to generate it for every single class rather than for every class that actually uses it (because you dynamically linked everything so the compiler can't prove it's not used for most of the classes).

But even then, most programmers don't have to care. Disk and RAM are still cheap unless you working on tiny little embedded CPUs with single digit MB RAM sizes and similar flash sizes.

Landgravine answered 1/6, 2023 at 2:41 Comment(0)
L
-4

RTTI can be "expensive" because you've added an if-statement every single time you do the RTTI comparison. In deeply nested iterations, this can be pricey. In something that never gets executed in a loop it's essentially free.

The choice is to use proper polymorphic design, eliminating the if-statement. In deeply nested loops, this is essential for performance. Otherwise, it doesn't matter very much.

RTTI is also expensive because it can obscure the subclass hierarchy (if there even is one). It can have the side-effect of removing the "object oriented" from "object oriented programming".

Luxury answered 23/2, 2009 at 23:56 Comment(4)
Not necessarily - I was going to use it indirectly via dynamic_cast, and keep the hierarchy in place, because I need to downcast because each subtype needs to have different (variably sized) data which must be applied differently, hence dynamic_cast.Despite
@Cristián Romo: Please update your question with these new facts. dynamic_cast is a (sometimes) necessary evil in C++. Asking about RTTI performance when you are forced to do it doesn't make a lot of sense.Luxury
@S.Lott: Updated. Sorry about the confusion.Despite
I did an experiment about this just now -- it turns out RTTI is significantly more expensive than the if statement you introduce when you check runtime type information this way.Uno

© 2022 - 2024 — McMap. All rights reserved.