Why is 'pure polymorphism' preferable over using RTTI?
Almost every C++ resource I've seen that discusses this kind of thing tells me that I should prefer polymorphic approaches to using RTTI (run-time type identification). In general, I take this kind of advice seriously, and will try and understand the rationale -- after all, C++ is a mighty beast and hard to understand in its full depth. However, for this particular question, I'm drawing a blank and would like to see what kind of advice the internet can offer. First, let me summarize what I've learned so far, by listing the common reasons that are quoted why RTTI is "considered harmful":

Some compilers don't use it / RTTI is not always enabled

I really don't buy this argument. It's like saying I shouldn't use C++14 features, because there are compilers out there that don't support it. And yet, no one would discourage me from using C++14 features. The majority of projects will have influence over the compiler they're using, and how it's configured. Even quoting the gcc manpage:

-fno-rtti

Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features (dynamic_cast and typeid). If you don't use those parts of the language, you can save some space by using this flag. Note that exception handling uses the same information, but G++ generates it as needed. The dynamic_cast operator can still be used for casts that do not require run-time type information, i.e. casts to "void *" or to unambiguous base classes.

What this tells me is that if I'm not using RTTI, I can disable it. That's like saying, if you're not using Boost, you don't have to link to it. I don't have to plan for the case where someone is compiling with -fno-rtti. Plus, the compiler will fail loud and clear in this case.

It costs extra memory / Can be slow

Whenever I'm tempted to use RTTI, that means I need to access some kind of type information or trait of my class. If I implement a solution that does not use RTTI, this usually means I will have to add some fields to my classes to store this information, so the memory argument is kind of void (I'll give an example of this further down).

A dynamic_cast can be slow, indeed. There are usually ways to avoid having to use it in speed-critical situations, though. And I don't quite see the alternative. This SO answer suggests using an enum, defined in the base class, to store the type. That only works if you know all your derived classes a priori. That's quite a big "if"!

From that answer, it seems also that the cost of RTTI is not clear, either. Different people measure different stuff.

Elegant polymorphic designs will make RTTI unnecessary

This is the kind of advice I take seriously. In this case, I simply can't come up with good non-RTTI solutions that cover my RTTI use case. Let me provide an example:

Say I'm writing a library to handle graphs of some kind of objects. I want to allow users to generate their own types when using my library (so the enum method is not available). I have a base class for my node:

class node_base
{
  public:
    node_base();
    virtual ~node_base();

    std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();
};

Now, my nodes can be of different types. How about these:

class red_node : virtual public node_base
{
  public:
    red_node();
    virtual ~red_node();

    void get_redness();
};

class yellow_node : virtual public node_base
{
  public:
    yellow_node();
    virtual ~yellow_node();

    void set_yellowness(int);
};

Hell, why not even one of these:

class orange_node : public red_node, public yellow_node
{
  public:
    orange_node();
    virtual ~orange_node();

    void poke();
    void poke_adjacent_oranges();
};

The last function is interesting. Here's a way to write it:

void orange_node::poke_adjacent_oranges()
{
    auto adj_nodes = get_adjacent_nodes();
    for (const auto& node : adj_nodes) {
        // In this case, typeid() and static_cast might be faster.
        // std::dynamic_pointer_cast (from <memory>) yields an empty
        // pointer if node is not an orange_node.
        std::shared_ptr<orange_node> o_node = std::dynamic_pointer_cast<orange_node>(node);
        if (o_node) {
             o_node->poke();
        }
    }
}

This all seems clear and clean. I don't have to define attributes or methods where I don't need them, the base node class can stay lean and mean. Without RTTI, where do I start? Maybe I can add a node_type attribute to the base class:

class node_base
{
  public:
    node_base();
    virtual ~node_base();

    std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();

  private:
    std::string my_type;
};

Is std::string a good idea for a type? Maybe not, but what else can I use? Make up a number and hope no one else is using it yet? Also, in the case of my orange_node, what if I want to use the methods from red_node and yellow_node? Would I have to store multiple types per node? That seems complicated.

Conclusion

This example doesn't seem overly complex or unusual (I'm working on something similar in my day job, where the nodes represent actual hardware that gets controlled through the software, and which do very different things depending on what they are). Yet I wouldn't know a clean way of doing this with templates or other methods. Please note that I'm trying to understand the problem, not defend my example. My reading of pages such as the SO answer I linked above and this page on Wikibooks seems to suggest I'm misusing RTTI, but I would like to learn why.

So, back to my original question: Why is 'pure polymorphism' preferable over using RTTI?

Vide answered 3/3, 2016 at 6:40 Comment(11)
What you're "missing" (as a language feature) to solve your poke oranges example would be multiple dispatch ("multimethods"). Thus, looking for ways to emulate that could be an alternative. Usually, the visitor pattern is used for that.Collocation
Using a string as type isn't very helpful. Using a pointer to an instance of some "type" class would make this faster. But then you're basically doing manually what RTTI would be doing.Collocation
Wouldn't the use of SFINAE and possibly the CRTP help in avoiding the use of dynamic_cast?Seamy
@MargaretBloom No, it won't, RTTI stands for Runtime Type Information while CRTP is only for templates -- static types, so.Ingather
@Vide : all engineering processes are bound by some rules; programming is not an exception. The rules can be split into two buckets: soft rules (SHOULD) and hard rules (MUST). (There's also an advice/option bucket (COULD), so to speak.) Read how the C/C++ standard (or any other eng. standard, for that matter) defines those. I guess your problem comes from the fact you've mistaken "don't use RTTI" as a hard rule ("you MUSTN'T use RTTI"). It's actually a soft rule ("you SHOULDN'T use RTTI"), meaning that you should avoid it whenever possible - and just use it when you can't avoid doing soLarochelle
to reiterate: since pure polymorphism is an obvious high-level abstraction and RTTI is a low-level option, your question is parallel to "why is objective/functional programming preferred over structural/low-level programming" - there ain't any real answer, because the very question is flawed. You usually don't program FPGAs/uCs with Haskell, you usually don't code your neural networks in pure assembly. The answer is we use higher abstractions because using a higher abstraction level simplifies the engineering process, turning n cognitive width into log n cognitive depth.Larochelle
@vaxquis: If "you should avoid it whenever possible", as you say, then that's basically equivalent to "you should never use it". It is always possible to avoid RTTI. The key point is that you need to use human judgment. And in order to use human judgment, you need to understand why. So mbr0wn is doing absolutely the right thing: asking why pure polymorphism is considered better than RTTI, so he can make the right decisions in difficult cases. Your assumption that he must have misunderstood a "SHOULD" as a "MUST" is, therefore, both condescending and misguided.Quimper
I note lots of answers don't note the idea that your example suggests node_base is part of a library and users will make their own node types. Then they can't modify node_base to allow another solution, so maybe RTTI becomes their best option then. On the other hand, there are other ways to design such a library so that new node types can fit in much more elegantly without needing to use RTTI (and other ways to design the new node types, too).Convergence
@Quimper I don't think so; I even see a s*tstorm coming over closing/reopening this question, mostly for the reason I described. Currently, all the answers to this question here are based on reasoning, logic & experience - not on hard, verifiable facts - because there usually ain't any hard verifiable facts when it comes to design decisions. OP expects us to give him some - he won't get them, because they don't exist. The question is a "why ...?" - when it should be "when ...?" instead! BTW it's always possible to replace a computational engine with a big enough amount of monkeysLarochelle
@Quimper again, to reiterate: OP is right (and has noticed that by himself) there are cases where poly gives no real perceivable advantage over RTTI - he even says that he wants to understand - yet, AFAIK, he understands RTTI perfectly, though he doesn't understand the very need for poly abstraction itself. He says That seems complicated., and he is right - still, with enough experience, it's almost universally easier to understand complex (albeit properly & clearly constructed) abstractions, than to memorize all low-level details needed for implementation. IMO that's his real problem hereLarochelle
@MatthewWalton: I've tried to address that issue in my answer.Gulf
A
70

An interface describes what one needs to know in order to interact in a given situation in code. Once you extend the interface with "your entire type hierarchy", your interface "surface area" becomes huge, which makes reasoning about it harder.

As an example, your "poke adjacent oranges" means that I, as a 3rd party, cannot emulate being an orange! You privately declared an orange type, then used RTTI to make your code behave specially when interacting with that type. If I want to "be orange", I must be within your private garden.

Now everyone who couples with "orangeness" couples with your entire orange type, and implicitly with your entire private garden, instead of with a defined interface.

While at first glance this looks like a great way to extend the limited interface without having to change all clients (adding am_I_orange), what tends to happen instead is it ossifies the code base, and prevents further extension. The special orangeness becomes inherent to the functioning of the system, and prevents you from creating a "tangerine" replacement for orange that is implemented differently and maybe removes a dependency or solves some other problem elegantly.

This does mean your interface has to be sufficient to solve your problem. From that perspective, why do you need to only poke oranges, and if so why was orangeness unavailable in the interface? If you need some fuzzy set of tags that can be added ad-hoc, you could add that to your type:

class node_base {
  public:
    bool has_tag(tag_name);
    // ... rest of the interface
};

This provides a similar massive broadening of your interface from narrowly specified to broad tag-based. Except instead of doing it through RTTI and implementation details (aka, "how are you implemented? With the orange type? Ok you pass."), it does so with something easily emulated through a completely different implementation.

This can even be extended to dynamic methods, if you need that. "Do you support being Foo'd with arguments Baz, Tom and Alice? Ok, Fooing you." In a big sense, this is less intrusive than a dynamic cast to get at the fact the other object is a type you know.

Now tangerine objects can have the orange tag and play along, while being implementation-decoupled.
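To make the tag idea a bit more concrete, here is a minimal sketch. Everything beyond has_tag is an assumption made for illustration: tag_name is taken to be a plain std::string, poke() is assumed to be part of the base interface, and tangerine_node, plain_node and poke_oranges are hypothetical names.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

using tag_name = std::string;  // assumption: tags are plain strings

class node_base {
  public:
    virtual ~node_base() = default;

    // True if this node advertises the given tag.
    bool has_tag(const tag_name& tag) const { return tags_.count(tag) != 0; }

    // Assumption: poking is part of the interface; the default ignores it.
    virtual void poke() {}

  protected:
    void add_tag(tag_name tag) { tags_.insert(std::move(tag)); }

  private:
    std::set<tag_name> tags_;
};

class orange_node : public node_base {
  public:
    orange_node() { add_tag("orange"); }
    void poke() override { ++pokes; }
    int pokes = 0;
};

// Unrelated implementation that still "plays along" as orange.
class tangerine_node : public node_base {
  public:
    tangerine_node() { add_tag("orange"); }
    void poke() override { ++pokes; }
    int pokes = 0;
};

class plain_node : public node_base {};  // carries no tags

// Pokes every node carrying the "orange" tag; returns how many were poked.
int poke_oranges(const std::vector<std::shared_ptr<node_base>>& nodes) {
    int poked = 0;
    for (const auto& n : nodes) {
        if (n->has_tag("orange")) {
            n->poke();
            ++poked;
        }
    }
    return poked;
}
```

The caller never names orange_node or tangerine_node, so a third party can add another "orange" implementation without touching poke_oranges.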

It can still lead to a huge mess, but it is at least a mess of messages and data, not implementation hierarchies.

Abstraction is a game of decoupling and hiding irrelevancies. It makes code easier to reason about locally. RTTI is boring a hole straight through the abstraction into implementation details. This can make solving a problem easier, but it has the cost of locking you into one specific implementation really easily.

Appellant answered 3/3, 2016 at 9:22 Comment(9)
+1 for the very last paragraph; not only because I agree with you, but because it's the hammer-on-the-nail here.Larochelle
How does one get at particular functionality once one knows that an object is tagged as supporting that functionality? Either this involves casting, or there is a God class with every possible member function. The first possibility is either unchecked casting, in which case the tagging is just one's own very fallible dynamic type checking scheme, or it's checked dynamic_cast (RTTI), in which case the tags are redundant. The second possibility, a God class, is abhorrent. Summing up, this answer has many words that I think sound nice to Java programmers, but the actual content is meaningless.Slosberg
@Cheersandhth.-Alf One way would be to have a dispatch method in the interface. You will then have two methods: canDo( methodName ) and do( methodName, arguments ), and the class can implement the do method to call the right function depending on methodName and argumentsBregma
@Falco: That is (one variant of) the first possibility that I mentioned, unchecked casting based on the tag. Here the tagging is one's own very brittle and very fallible dynamic type checking scheme. Any little client code misbehavior, and in C++ one is off in UB-land. You don't get exceptions, as you might get in Java, but Undefined Behavior, like crashes, and/or incorrect results. In addition to being extremely unreliable and dangerous it's also extremely inefficient, compared to more sane C++ code. IOW., it's very very ungood; extremely so.Slosberg
@Cheersandhth.-Alf but this is a completely safe and reliable way without any casting at all! The do Method will have a switch-case with all possible behaviours for my class or direct the call to the super-class otherwise or throw a user defined exception. This is without casting and most likely faster than dynamic typing.Bregma
Uhm. :) Arguments types?Slosberg
"RTTI is boring a hole straight through the abstraction into implementation details." Why? I would just check if object x is of type t. I don't know anything about the implementation of t. (Maybe others can add custom t's?) I like C++ but I did work with Java, too. There are (big) projects in Java which make use of RTTI in their basic concept. I read such sources. It makes the code easier to understand and improves the level of abstraction. Yes, I could use tags like written by @Falco. But why implement it myself when I could write a single line and let RTTI do the work for me?Isolating
@JojOatXGME: Because "polymorphism" means being able to work with a variety of types. If you have to check if it is a particular type, beyond the already existing type check you used to get the pointer/reference to begin with, then you're looking behind polymorphism. You're not working with a variety of types; you're working with a specific type. Yes, there are "(big) projects in Java" that do this. But that's Java; the language only allows dynamic polymorphism. C++ has static polymorphism as well. Also, just because someone "big" does it doesn't make it a good idea.Gulf
@NicolBolas I agree that there are better possibilities in most scenarios. Especially with the power of templates in C++. If someone thinks it is a good idea to use RTTI in his scenario, maybe he should think about it twice. But I don't think that it can be generalized. I saw several libraries in C++ which use an enum and a class member to identify the type of the class. Mostly it was used for something similar to events. Something like this is RTTI implemented by the library itself. Why add this unnecessary line per derived class? It is error prone, less flexible and more effort.Isolating
D
34

Most of the moral suasion against this or that feature typically originates from the observation that there are a number of misconceived uses of that feature.

Where moralists fail is that they presume ALL the usages are misconceived, while in fact features exist for a reason.

They have what I used to call the "plumber complex": they think all taps are malfunctioning because all the taps they are called to repair are. The reality is that most taps work well: you simply don't call a plumber for them!

A crazy thing that can happen is when, to avoid using a given feature, programmers write a lot of boilerplate code actually privately re-implementing exactly that feature. (Have you ever met classes that don't use RTTI nor virtual calls, but have a value to track which actual derived type they are? That's no more than RTTI reinvention in disguise.)

There is a general way to think about polymorphism: IF(selection) CALL(something) WITH(parameters). (Sorry, but programming, when disregarding abstraction, is all about that)

The use of design-time (concepts), compile-time (template-deduction based), run-time (inheritance and virtual function-based) or data-driven (RTTI and switching) polymorphism depends on how many of the decisions are known at each stage of production and how variable they are in each context.

The idea is that:

the more you can anticipate, the better the chance of catching errors and avoiding bugs that affect the end-user.

If everything is constant (including the data) you can do everything with template meta-programming. After compilation occurred on actualized constants, the entire program boils down to just a return statement that spits out the result.

If there are a number of cases that are all known at compile time, but you don't know about the actual data they have to act on, then compile-time polymorphism (mainly CRTP or similar) can be a solution.
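As a rough illustration of that compile-time case, here is a small CRTP sketch. The names shape, circle, square and name_impl are made up for the example; the point is only that the call is resolved through the template parameter, with no virtual functions and no RTTI involved.

```cpp
#include <string>

// CRTP base: the derived type is a template parameter, so the call to
// name_impl() is resolved at compile time, with no vtable and no RTTI.
template <typename Derived>
class shape {
  public:
    std::string describe() const {
        return "a " + static_cast<const Derived&>(*this).name_impl();
    }
};

class circle : public shape<circle> {
  public:
    std::string name_impl() const { return "circle"; }
};

class square : public shape<square> {
  public:
    std::string name_impl() const { return "square"; }
};

// Works for any shape<T>; the concrete T is known at each call site.
template <typename T>
std::string describe(const shape<T>& s) {
    return s.describe();
}
```

This only works because the set of cases is fixed when the code is compiled; circle and square share no common runtime base, so they cannot sit together in one container.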

If the selection of the cases depends on the data (not compile-time known values) and the switching is mono-dimensional (what to do can be reduced to one value only) then virtual function based dispatch (or in general "function pointer tables") is needed.

If the switching is multidimensional, since no native multiple runtime dispatch exists in C++, then you have to either:

  • Reduce to one dimension by Goedelization: that's where virtual bases and multiple inheritance, with diamonds and stacked parallelograms, are, but this requires the number of possible combinations to be known and to be relatively small.
  • Chain the dimensions one into the other (like in the composite-visitors pattern, but this requires all classes to be aware of their other siblings, thus it cannot "scale" out from the place it has been conceived)
  • Dispatch calls based on multiple values. That's exactly what RTTI is for.
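One possible shape for that last option is a table keyed on the dynamic types of both arguments. This is only a sketch under assumptions: shape, circle, square, collide and the table layout are hypothetical, while typeid and std::type_index are the standard RTTI facilities.

```cpp
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct shape { virtual ~shape() = default; };  // polymorphic, so typeid is dynamic
struct circle : shape {};
struct square : shape {};

using key = std::pair<std::type_index, std::type_index>;
using handler = std::function<std::string(const shape&, const shape&)>;

template <typename A, typename B>
key make_key() {
    return key(std::type_index(typeid(A)), std::type_index(typeid(B)));
}

// Builds the dispatch table; new pairs can be registered without
// touching the classes themselves.
std::map<key, handler> make_table() {
    std::map<key, handler> t;
    t.emplace(make_key<circle, circle>(),
              [](const shape&, const shape&) { return std::string("circle/circle"); });
    t.emplace(make_key<circle, square>(),
              [](const shape&, const shape&) { return std::string("circle/square"); });
    t.emplace(make_key<square, circle>(),
              [](const shape&, const shape&) { return std::string("square/circle"); });
    return t;
}

// Selects the handler from the runtime types of both arguments.
std::string collide(const shape& a, const shape& b) {
    static const std::map<key, handler> table = make_table();
    auto it = table.find(key(std::type_index(typeid(a)), std::type_index(typeid(b))));
    if (it == table.end()) return std::string("unhandled");
    return it->second(a, b);
}
```

Note that an unregistered pair (square/square here) falls through to "unhandled" at run time; nothing checks the table for completeness at compile time.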

If not just the switching, but even the actions are not compile time known, then scripting & parsing is required: the data themselves must describe the action to be taken on them.

Now, since each of the cases I enumerated can be seen as a particular case of what follows it, you can solve every problem by abusing the bottom-most solution also for problems affordable with the top-most.

That's what moralization actually pushes to avoid. But that does not mean that problems living in the bottom-most domains don't exist!

Bashing RTTI just to bash it, is like bashing goto just to bash it. Things for parrots, not programmers.

Downspout answered 3/3, 2016 at 8:55 Comment(6)
A good account of the levels at which each approach is applicable. I haven't heard of "Goedelization" though -- is it also known by some other name? Could you perhaps add a link or more explanation? Thanks :)Gruchot
@j_random_hacker: I too am curious about this use of Godelization. One normally thinks of Godelization as first, mapping from some string to some integer, and second, using this technique to produce self-referential statements in formal languages. I am unfamiliar with this term in the context of virtual dispatch and would love to learn more.Leonteen
In fact I'm abusing the term: according to Gödel, since every integer corresponds to an integer n-tuple (the powers of its prime factors) and every n-tuple corresponds to an integer, every discrete n-dimensional indexing problem can be reduced to a mono-dimensional one. That does not mean that this is the one and only way of doing it: it's just a way to say "it is possible". All you need is a "divide and conquer" mechanism. virtual functions are the "divide" and multiple inheritance is the "conquer".Downspout
... When all that happens inside a finite field (a range), linear combinations are more effective (the classical i = r*C+c getting the index in an array of the cell of a matrix). In this case, the divide is the "visitor" and the conquer is the "composite". Since linear algebra is involved, the technique in this case corresponds to "diagonalization"Downspout
Don't think of all these as techniques. They are just analogiesDownspout
"A crazy thing that can happen is when, to avoid using a given feature, programmers write a lot of boilerplate code actually privately re-implementing exactly that feature. " [mashes upvote button] "Bashing RTTI just to bash it, is like bashing goto just to bash it. Things for parrots, not programmers." [continues mashing, futilely] That aside, superb summary of this pattern, its applicability to RTTI, and the alternatives. To those who've been taught to hate hammers, every nail is... um, the hammer goes... oh, you know what I mean.Customable
L
23

It looks kind of neat in a small example, but in real life you will soon end up with a long set of types that can poke each other, some of them perhaps only in one direction.

What about dark_orange_node, or black_and_orange_striped_node, or dotted_node? Can it have dots of different colors? What if most dots are orange, can it be poked then?

And each time you have to add a new rule, you will have to revisit all the poke_adjacent functions and add more if-statements.


As always, it is hard to create generic examples, I'll give you that.

But if I were to do this specific example, I would add a poke() member to all the classes and let some of them ignore the call (void poke() {}) if they are not interested.

Surely that would be even less expensive than comparing the typeids.
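Roughly what I have in mind, as a sketch only (poke_count and poke_all are names invented for the example):

```cpp
#include <memory>
#include <vector>

class node_base {
  public:
    virtual ~node_base() = default;
    // Default: not interested in pokes, so do nothing.
    virtual void poke() {}
};

class orange_node : public node_base {
  public:
    void poke() override { ++poke_count; }
    int poke_count = 0;
};

class red_node : public node_base {};  // inherits the no-op poke()

// "Send the message" to everyone; each receiver decides whether it cares.
void poke_all(const std::vector<std::shared_ptr<node_base>>& nodes) {
    for (const auto& n : nodes) {
        n->poke();
    }
}
```

The sender needs no type checks at all, and adding a new pokable node type does not require touching poke_all.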

Liechtenstein answered 3/3, 2016 at 8:30 Comment(9)
You say "surely", but what makes you so certain? That's really what I'm trying to figure out. Let's say I rename orange_node to pokable_node, and they're the only ones I can call poke() on. That means my interface will need to implement a poke() method that, say, throws an exception ("this node is not pokable"). That seems more expensive.Vide
Why would he need to throw an exception? If you cared whether or not the interface is "poke-able" just add a function "isPokeable" and call it first before calling the poke function. Or just do what he says and "do nothing, in non-pokeable classes".Batwing
@mbr0wn: The better question is why you want pokable and nonpokable nodes to share the same base class.Gulf
@Vide - My idea was just that instead of using complicated code to decide if the other node is interested in being poked, just "send the message" and let the receiver decide if he is interested. For those totally uninterested, that would require no logic at all at the receiving end.Liechtenstein
@NicolBolas Why would you want friendly and hostile monsters to share the same base class, or focusable and non-focusable UI elements, or keyboards with a numpad and keyboards without a numpad?Syncopate
@Vide This sounds like behaviour-pattern. The base interface has two methods supportsBehaviour and invokeBehaviour and each class can have a List of behaviours. One behaviour would be Poke and could be added to the list of supported Behaviours by all classes who want to be pokeable.Bregma
@BoPersson In the case of events you can do something like this. But software might be more complex than a class that can handle everything about itself. Maybe you have an object model. And you have independent algorithms which work on some objects of this model. Do you want to mess up the interface of all the objects because some of them need it? What about extensibility? Yes you could avoid it like written by Falco, but it would be less intuitive and need more code. This would make it difficult to understand and would be error prone since it can not be checked at compile time like when using RTTI.Isolating
@Joj - IMO the messed up part is to have a single base class for every object, and then try to figure out the objects' type anyway. If you have an algorithm that handles some objects, perhaps those objects should have a common base class. In the specific case of poke(), the code to figure out if the object is "pokable" is much more complicated than the code ignoring the pokes.Liechtenstein
@BoPersson Yes, in most scenarios it will be enough to use an empty implementation. But like already written, it could be more complex. Maybe an algorithm needs 5 functions and should ignore objects of types which do not support them. Of course, you could add all the functions to the base class. But since you have to modify the base to add such algorithms, you would lose extensibility. And even if you mess up only the interface of the base, the reference documentation is much more difficult to understand. And you will get no compile-time error if you implement an interface partially.Isolating
G
20

Some compilers don't use it / RTTI is not always enabled

I believe you have misunderstood such arguments.

There are a number of C++ coding places where RTTI is not to be used. Where compiler switches are used to forcibly disable RTTI. If you are coding within such a paradigm... then you almost certainly have already been informed of this restriction.

The problem therefore is with libraries. That is, if you're writing a library that depends on RTTI, then your library cannot be used by users who turn off RTTI. If you want your library to be used by those people, then it cannot use RTTI, even if your library also gets used by people who can use RTTI. Equally importantly, if you can't use RTTI, you have to shop around a little harder for libraries, since RTTI use is a deal-breaker for you.

It costs extra memory / Can be slow

There are many things you don't do in hot loops. You don't allocate memory. You don't go iterating through linked lists. And so forth. RTTI certainly can be another one of those "don't do this here" things.

However, consider all of your RTTI examples. In all cases, you have one or more objects of an indeterminate type, and you want to perform some operation on them which may not be possible for some of them.

That's something you have to work around at a design level. You can write containers that don't allocate memory which fit into the "STL" paradigm. You can avoid linked list data structures, or limit their use. You can reorganize arrays of structs into structs of arrays or whatever. It changes some things, but you can keep it compartmentalized.

Changing a complex RTTI operation into a regular virtual function call? That's a design issue. If you have to change that, then it's something that requires changes to every derived class. It changes how lots of code interacts with various classes. The scope of such a change extends far beyond the performance-critical sections of code.

So... why did you write it the wrong way to begin with?

I don't have to define attributes or methods where I don't need them, the base node class can stay lean and mean.

To what end?

You say that the base class is "lean and mean". But really... it's nonexistent. It doesn't actually do anything.

Just look at your example: node_base. What is it? It seems to be a thing which has adjacent other things. This is a Java interface (pre-generics Java at that): a class that exists solely to be something that users can cast to the real type. Maybe you add some basic feature like adjacency (Java adds toString), but that's it.

There's a difference between "lean and mean" and "transparent".

As Yakk said, such programming styles limit themselves in interoperability, because if all of the functionality is in a derived class, then users outside of that system, with no access to that derived class, cannot interoperate with the system. They can't override virtual functions and add new behaviors. They can't even call those functions.

But what they also do is make it a major pain to actually do new stuff, even within the system. Consider your poke_adjacent_oranges function. What happens if someone wants a lime_node type which can be poked just like orange_nodes? Well, we can't derive lime_node from orange_node; that makes no sense.

Instead, we have to add a new lime_node derived from node_base. Then change the name of poke_adjacent_oranges to poke_adjacent_pokables. And then, try casting to orange_node and lime_node; whichever cast works is the one we poke.

However, lime_node needs its own poke_adjacent_pokables. And this function needs to do the same casting checks.

And if we add a third type, we have to not only add its own function, but we must change the functions in the other two classes.

Obviously, now you make poke_adjacent_pokables a free function, so that it works for all of them. But what do you suppose happens if someone adds a fourth type and forgets to add it to that function?

Hello, silent breakage. The program appears to work more or less OK, but it isn't. Had poke been an actual virtual function, the compiler would have failed when you didn't override the pure virtual function from node_base.

With your way, you have no such compiler checks. Oh sure, the compiler won't check for non-pure virtuals, but at least you have protection in cases where protection is possible (ie: there is no default operation).
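A small sketch of that protection, with hypothetical class bodies. Strictly speaking, the error shows up at the point where someone tries to create an object of the incomplete type; std::is_abstract is used here only to make that visible without breaking the build.

```cpp
#include <type_traits>

class node_base {
  public:
    virtual ~node_base() = default;
    virtual void poke() = 0;  // no default: every node type must decide
};

class orange_node : public node_base {
  public:
    void poke() override { ++pokes; }
    int pokes = 0;
};

// A new node type whose author forgot to override poke().
class lime_node : public node_base {};

static_assert(!std::is_abstract<orange_node>::value,
              "orange_node overrides poke(), so it can be instantiated");
static_assert(std::is_abstract<lime_node>::value,
              "lime_node is still abstract");

// lime_node n;  // would not compile: lime_node is an abstract class
```

With the cast-based version, the equivalent mistake (forgetting to add lime_node to the list of casts) compiles cleanly and just silently does nothing.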

The use of transparent base classes with RTTI leads to a maintenance nightmare. Indeed, most uses of RTTI lead to maintenance headaches. That doesn't mean that RTTI isn't useful (it's vital for making boost::any work, for example). But it is a very specialized tool for very specialized needs.

In that way, it is "harmful" in the same way as goto. It's a useful tool that shouldn't be done away with. But its use should be rare within your code.


So, if you can't use transparent base classes and dynamic casting, how do you avoid fat interfaces? How do you keep every function you might want to call on a type from bubbling up to the base class?

The answer depends on what the base class is for.

Transparent base classes like node_base are just using the wrong tool for the problem. Linked lists are best handled by templates. The node type and adjacency would be provided by a template type. If you want to put a polymorphic type in the list, you can. Just use BaseClass* as T in the template argument. Or your preferred smart pointer.
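As a sketch of what that could look like (the node template, its member names and the orange payload are all assumptions made for illustration, not a prescribed design):

```cpp
#include <utility>
#include <vector>

// Adjacency lives in the template; the payload type T is up to the user.
template <typename T>
class node {
  public:
    explicit node(T value) : value_(std::move(value)) {}

    T& value() { return value_; }
    const T& value() const { return value_; }

    void connect(node& other) { adjacent_.push_back(&other); }
    const std::vector<node*>& adjacent() const { return adjacent_; }

  private:
    T value_;
    std::vector<node*> adjacent_;  // non-owning links to neighbours
};

// Hypothetical payload type with the operation we care about.
struct orange {
    int pokes = 0;
    void poke() { ++pokes; }
};

// The payload type is known statically, so no cast is needed.
void poke_adjacent(node<orange>& n) {
    for (node<orange>* other : n.adjacent()) {
        other->value().poke();
    }
}
```

If the graph really has to hold mixed types, T can be a pointer (or smart pointer) to a polymorphic base that declares whatever the graph's users need to call.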

But there are other scenarios. One is a type that does a lot of things, but has some optional parts. A particular instance might implement certain functions, while another wouldn't. However, the design of such types usually offers a proper answer.

The "entity" class is a perfect example of this. This class has long plagued game developers. Conceptually, it has a gigantic interface, living at the intersection of nearly a dozen, entirely disparate systems. And different entities have different properties. Some entities don't have any visual representation, so their rendering functions do nothing. And this is all determined at runtime.

The modern solution for this is a component-style system. Entity is merely a container of a set of components, with some glue between them. Some components are optional; an entity that has no visual representation does not have the "graphics" component. An entity with no AI has no "controller" component. And so forth.

Entities in such a system are just pointers to components, with most of their interface being provided by accessing the components directly.

Developing such a component system requires recognizing, at the design stage, that certain functions are conceptually grouped together, such that all types that implement one will implement them all. This allows you to extract the class from the prospective base class and make it a separate component.

This also helps follow the Single Responsibility Principle. Such a componentized class only has the responsibility of being a holder of components.
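
A rough sketch of what such an entity might look like follows. Every name in it (graphics_component, controller_component, entity, render, update_ai) is hypothetical, and real engines are considerably more elaborate. The point is only that an absent component is a null pointer, so no system needs a cast or a do-nothing virtual function:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Functions that conceptually belong together are grouped into a component.
struct graphics_component {
    std::string sprite;
    std::string draw() const { return "draw " + sprite; }
};

struct controller_component {
    int aggression = 0;
    std::string think() const {
        assert(aggression >= 0);  // invariant assumed by this sketch
        return aggression > 5 ? "attack" : "wander";
    }
};

// The entity's only responsibility is to hold its components.
// A missing component is simply a null pointer.
struct entity {
    std::unique_ptr<graphics_component> graphics;
    std::unique_ptr<controller_component> controller;
};

// Each system talks to the one component it cares about and skips
// entities that do not have it.
std::string render(entity const& e) {
    return e.graphics ? e.graphics->draw() : std::string{};
}

std::string update_ai(entity const& e) {
    return e.controller ? e.controller->think() : std::string{};
}
```

An invisible trigger volume would be an entity with both pointers left null, and a monster would fill in both.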


From Matthew Walton:

I note lots of answers don't note the idea that your example suggests node_base is part of a library and users will make their own node types. Then they can't modify node_base to allow another solution, so maybe RTTI becomes their best option then.

OK, let's explore that.

For this to make sense, what you would have to have is a situation where some library L provides a container or other structured holder of data. The user gets to add data to this container, iterate over its contents, etc. However, the library doesn't really do anything with this data; it simply manages its existence.

But it doesn't even manage its existence so much as its destruction. The reason being that, if you're expected to use RTTI for such purposes, then you are creating classes that L is ignorant of. This means that your code allocates the object and hands it off to L for management.

Now, there are cases where something like this is a legitimate design. Event signaling/message passing, thread-safe work queues, etc. The general pattern here is this: someone is performing a service between two pieces of code that is appropriate for any type, but the service need not be aware of the specific types involved.

In C, this pattern is spelled void*, and its use requires a great deal of care to avoid being broken. In C++, this pattern is spelled std::experimental::any (soon to be spelled std::any).

The way this ought to work is that L provides a node_base class that takes an any that represents your actual data. When you receive the message, thread queue work item, or whatever you're doing, you then cast that any to its appropriate type, which both the sender and the receiver know.

So instead of deriving orange_node from node_data, you simply stick an orange inside of node_data's any member field. The end-user extracts it and uses any_cast to convert it to orange. If the cast fails, then it wasn't orange.

Now, if you're at all familiar with the implementation of any, you'll likely say, "hey wait a minute: any internally uses RTTI to make any_cast work." To which I answer, "... yes".

That's the point of an abstraction. Deep down in the details, someone is using RTTI. But at the level you ought to be operating at, direct RTTI is not something you should be doing.

You should be using types that provide you the functionality you want. After all, you don't really want RTTI. What you want is a data structure that can store a value of a given type, hide it from everyone except the desired destination, then be converted back into that type, with verification that the stored value actually is of that type.

That's called any. It uses RTTI, but using any is far superior to using RTTI directly, since it fits the desired semantics more correctly.
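
As a small sketch of that arrangement, assuming a C++17 compiler where the type is spelled std::any: node_data here is a stand-in for whatever L would really provide, and orange, apple and ripeness_or are invented for the example.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <utility>

// User-defined payload types that the library knows nothing about.
struct orange { int ripeness = 0; };
struct apple  { std::string variety; };

// Stand-in for the library's node: it stores the payload and manages its
// lifetime, but never looks inside it.
class node_data {
    std::any payload_;

public:
    explicit node_data(std::any payload) : payload_(std::move(payload)) {
        assert(payload_.has_value());  // this sketch treats an empty payload as a caller bug
    }
    std::any const& payload() const { return payload_; }
};

// User code on the receiving end. The pointer form of any_cast returns
// nullptr when the stored value is not an orange, so the type is verified
// without the user touching typeid or dynamic_cast.
int ripeness_or(node_data const& n, int fallback) {
    if (auto const* o = std::any_cast<orange>(&n.payload())) return o->ripeness;
    return fallback;
}
```

The reference form, std::any_cast<orange>(n.payload()), throws std::bad_any_cast on a mismatch instead of returning a null pointer.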

Gulf answered 4/3, 2016 at 3:13 Comment(0)
I
10

If you call a function, as a rule you don't really care what precise steps it will take, only that some higher-level goal will be achieved within certain constraints (and how the function makes that happen is really its own problem).

When you use RTTI to make a preselection of special objects that can do a certain job, while others in the same set cannot, you are breaking that comfortable view of the world. All of a sudden the caller is supposed to know who can do what, instead of simply telling his minions to get on with it. Some people are bothered by this, and I suspect this is a large part of the reason why RTTI is considered a little dirty.

Is there a performance issue? Maybe, but I've never experienced it, and it might be wisdom from twenty years ago, or from people who honestly believe that using three assembly instructions instead of two is unacceptable bloat.

So how to deal with it... Depending on your situation it might make sense to have any node-specific properties bundled into separate objects (i.e. the entire 'orange' API could be a separate object). The root object could then have a virtual function to return the 'orange' API, returning nullptr by default for non-orange objects.

While this might be overkill depending on your situation, it would allow you to query on root level whether a specific node supports a specific API, and if it does, execute functions specific to that API.
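
Something along these lines, as a sketch only. The names orange_api, node, as_orange, red_node, orange_node and poke_oranges are made up for the example:

```cpp
#include <cassert>
#include <vector>

// The orange-specific API, bundled into its own interface.
struct orange_api {
    virtual ~orange_api() = default;
    virtual void poke() = 0;
};

// Root object: by default a node does not support the orange API.
struct node {
    virtual ~node() = default;
    virtual orange_api* as_orange() { return nullptr; }
};

struct red_node : node {};

// An orange node implements the API itself and hands out a pointer to it.
struct orange_node : node, orange_api {
    int pokes = 0;
    void poke() override { ++pokes; }
    orange_api* as_orange() override { return this; }
};

// Queries each node at root level and pokes only those that expose the
// orange API. Returns how many nodes were poked.
int poke_oranges(std::vector<node*> const& nodes) {
    int count = 0;
    for (node* n : nodes) {
        assert(n != nullptr);
        if (orange_api* api = n->as_orange()) {
            api->poke();
            ++count;
        }
    }
    return count;
}
```

The cost is one extra virtual function on the root per optional API, which is why this only pays off when the set of such APIs stays small.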

Infringe answered 3/3, 2016 at 7:19 Comment(7)
Re: the performance cost - I measured dynamic_cast<> as costing about 2µs in our app on a 3GHz processor, which is about 1000x slower than checking an enum. (Our app has an 11.1ms main loop deadline, so we care about microseconds a lot.)Murine
Performance differs a lot between implementations. GCC uses a typeinfo pointer comparison which is fast. MSVC uses string comparisons which are not fast. However, MSVC's method will work with code linked to different versions of libraries, static or DLL, where GCC's pointer method believes a class in a static library is different from a class in a shared library.Rockett
@Murine Just to have a complete record here: which compiler (and which version) was that?Infringe
@Murine seconding the request for info on which compiler produced your observed results; thanks.Customable
@Customable : MSVC.Murine
@Murine Ta. Funnily enough, I think I just found your post about this - https://mcmap.net/q/142714/-how-expensive-is-rtti - indicating that GCC was roughly the same in those tests, it seems? Would be interested to know the versions, if you remember them.Customable
@Customable It would be whatever current version of the Cell processor edition of GCC was in 2010. I think it was based on 4.1.1.Murine
S
9

C++ is built on the idea of static type checking.

[1]RTTI, that is, dynamic_cast and typeid, is dynamic type checking.

So, essentially you're asking why static type checking is preferable to dynamic type checking. And the simple answer is, whether static type checking is preferable to dynamic type checking, depends. On a lot. But C++ is one of the programming languages that are designed around the idea of static type checking. And this means that e.g. the development process, in particular testing, is typically adapted to static type checking, and then fits that best.


Re

I wouldn't know a clean way of doing this with templates or other methods

you can process the heterogeneous nodes of a graph with static type checking and no casting whatsoever via the visitor pattern, e.g. like this:

#include <iostream>
#include <set>
#include <initializer_list>

namespace graph {
    using std::set;

    class Red_thing;
    class Yellow_thing;
    class Orange_thing;

    struct Callback
    {
        virtual void handle( Red_thing& ) {}
        virtual void handle( Yellow_thing& ) {}
        virtual void handle( Orange_thing& ) {}
    };

    class Node
    {
    private:
        set<Node*> connected_;

    public:
        virtual void call( Callback& cb ) = 0;

        void connect_to( Node* p_other )
        {
            connected_.insert( p_other );
        }

        void call_on_connected( Callback& cb )
        {
            for( auto const p : connected_ ) { p->call( cb ); }
        }

        virtual ~Node(){}
    };

    class Red_thing
        : public virtual Node
    {
    public:
        void call( Callback& cb ) override { cb.handle( *this ); }

        auto redness() -> int { return 255; }
    };

    class Yellow_thing
        : public virtual Node
    {
    public:
        void call( Callback& cb ) override { cb.handle( *this ); }
    };

    class Orange_thing
        : public Red_thing
        , public Yellow_thing
    {
    public:
        void call( Callback& cb ) override { cb.handle( *this ); }

        void poke() { std::cout << "Poked!\n"; }

        void poke_connected_orange_things()
        {
            struct Poker: Callback
            {
                void handle( Orange_thing& obj ) override
                {
                    obj.poke();
                }
            } poker;

            call_on_connected( poker );
        }
    };
}  // namespace graph

auto main() -> int
{
    using namespace graph;

    Red_thing   r;
    Yellow_thing    y1, y2;
    Orange_thing    o1, o2, o3;

    for( Node* p : std::initializer_list<Node*>{ &y1, &y2, &r, &o2, &o3 } )
    {
        o1.connect_to( p );
    }
    o1.poke_connected_orange_things();
}

This assumes that the set of node types is known.

When it isn't, the visitor pattern (there are many variations of it) can be expressed with a few centralized casts, or, just a single one.
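
One possible shape for the single-cast variant is sketched below. It is only an illustration, and every name in it (visitor_base, visitor_of, dispatch, poker, poke_all) is hypothetical. The one dynamic_cast lives in dispatch, and it cross-casts the visitor rather than the node, so new node types can be added without touching a central Callback class:

```cpp
#include <cassert>
#include <vector>

// Empty root for all visitors; it only exists so that dynamic_cast works.
struct visitor_base {
    virtual ~visitor_base() = default;
};

// A visitor opts in to a node type T by also deriving from visitor_of<T>.
template <class T>
struct visitor_of {
    virtual ~visitor_of() = default;
    virtual void handle(T&) = 0;
};

struct node {
    virtual ~node() = default;
    virtual void call(visitor_base& v) = 0;
};

// The single, centralized cast: does this visitor handle T?
// If it does not, the node is silently skipped.
template <class T>
void dispatch(T& self, visitor_base& v) {
    if (auto* handler = dynamic_cast<visitor_of<T>*>(&v)) handler->handle(self);
}

struct red_thing : node {
    void call(visitor_base& v) override { dispatch(*this, v); }
};

struct orange_thing : node {
    int pokes = 0;
    void call(visitor_base& v) override { dispatch(*this, v); }
};

// This visitor only cares about orange things.
struct poker : visitor_base, visitor_of<orange_thing> {
    void handle(orange_thing& o) override { ++o.pokes; }
};

void poke_all(std::vector<node*> const& nodes) {
    poker p;
    for (node* n : nodes) {
        assert(n != nullptr);
        n->call(p);
    }
}
```

Note that this does use RTTI, just in one place instead of scattered through user code.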


For a template-based approach see the Boost Graph library. Sad to say I am not familiar with it, I haven't used it. So I'm not sure exactly what it does and how, and to what degree it uses static type checking instead of RTTI, but since Boost is generally template-based with static type checking as the central idea, I think you'll find that its Graph sub-library is also based on static type checking.


[1] Run Time Type Information.

Slosberg answered 3/3, 2016 at 11:40 Comment(1)
One "funny thing" to note is that one can reduce the amount of code (changes when adding types) necessary for the visitor pattern by using RTTI to "climb" a hierarchy. I know this as the "acyclic visitor pattern".Collocation
I
3

Of course there is a scenario where polymorphism can't help: names. typeid lets you access the name of the type, although the way this name is encoded is implementation-defined. But usually this is not a problem since you can compare two typeid-s:

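if ( std::strcmp( typeid(5).name(), "int" ) == 0 )   // needs <cstring> and <typeinfo>
    // may be false: the spelling of name() is implementation-defined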

if ( typeid(5) == typeid(int) )
   // always true

The same holds for hashes.
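
For instance, std::type_index wraps a type_info so that it can be used as a key in a hash container. The sketch below assumes nothing beyond the standard library, and type_labels is a made-up helper. The hash values themselves are implementation-defined, but they are equal for the same type within one program run:

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Hypothetical registry that attaches a portable, user-chosen label to a
// type, instead of relying on the implementation-defined name().
class type_labels {
    std::unordered_map<std::type_index, std::string> labels_;

public:
    template <class T>
    void set(std::string label) {
        assert(!label.empty());  // this sketch requires a non-empty label
        labels_[std::type_index(typeid(T))] = std::move(label);
    }

    // Looks the label up by type identity, not by the spelling of its name.
    std::string label_of(std::type_info const& info) const {
        auto it = labels_.find(std::type_index(info));
        return it == labels_.end() ? std::string("unknown") : it->second;
    }
};
```

With labels.set<int>("integer"), a later labels.label_of(typeid(5)) finds the entry on any conforming implementation, whatever typeid(5).name() happens to return there.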

[...] RTTI is "considered harmful"

harmful is definitely overstating: RTTI has some drawbacks, but it does have advantages too.

You don't truly have to use RTTI. RTTI is a tool to solve OOP problems: should you use another paradigm, these would likely disappear. C doesn't have RTTI, but still works. C++ instead fully supports OOP and gives you multiple tools to overcome issues that may require runtime information: one of them is indeed RTTI, which, however, comes at a price. If you can't afford it, a claim you'd better make only after a careful performance analysis, there is still the old-school void*: it's free. Costless. But you get no type safety. So it's all about trade-offs.


  • Some compilers don't use / RTTI is not always enabled
    I really don't buy this argument. It's like saying I shouldn't use C++14 features, because there are compilers out there that don't support it. And yet, no one would discourage me from using C++14 features.

If you write (possibly strictly) conforming C++ code, you can expect the same behavior regardless of the implementation. Standard-compliant implementations shall support standard C++ features.

But do consider that in some environments C++ defines («freestanding» ones), RTTI need not be provided and neither do exceptions, virtual and so on. RTTI needs an underlying layer to work correctly that deals with low-level details such as the ABI and the actual type information.


I agree with Yakk regarding RTTI in this case. Yes, it could be used; but is it logically correct? The fact that the language allows you to bypass this check does not mean it should be done.

Ingather answered 3/3, 2016 at 14:40 Comment(0)
M
0

If you can enumerate the set of types participating in the graph at compile time, you can replace the use of dynamic_cast with std::variant. This should be more efficient. It also creates the possibility of making unhandled node types into a compile-time error. Of course, the set of node types may only be known at run time, and then you will almost certainly need some kind of RTTI. https://godbolt.org/z/TPjxa1G6M

#include <unordered_map>
#include <variant>
#include <type_traits>
#include <cstdio>

template <class... Ts>
struct graph {
    using key = std::variant<Ts const*...>;
    using value = std::variant<Ts*...>;
    std::unordered_multimap<key, value> edge;

    template <class callable>
    void visit_adjacent(key n, callable f) const {
        auto [i, e] = edge.equal_range(n);

        for (; i != e; ++i) {
            std::visit(f, i->second);
        }
    }
};



struct red {};
struct yellow {};
struct orange : red, yellow {
    void poke() { std::printf("poke %p\n", (void*)this); }

    template <class graph>
    void poke_adjacent_oranges(graph const& g) const {
        g.visit_adjacent(this, []<class T>(T* other) {
            if constexpr (std::is_base_of_v<orange, T>) {
                other->poke();
            }
        });

    }
};

// ....
struct blue {};

int
main(){
    graph<red, yellow, orange, blue> g;

    orange o1;
    orange o2;
    red r;
    blue b;

    g.edge.emplace(&o1, &o2);
    g.edge.emplace(&o1, &r);
    g.edge.emplace(&o1, &b);

    std::printf("o2: %p\n", (void*)&o2);
    o1.poke_adjacent_oranges(g);


    return 0;
}
Megavolt answered 31/5, 2023 at 7:29 Comment(0)
