Most efficient way to get an integer type id in a family of common base types

The problem:

I have a family of objects with a common base, and I need to be able to identify the specific concrete type via an integer value.

There are two obvious approaches, but both come with unacceptable overhead in terms of memory or CPU time. Since the project deals with billions of objects, even the tiniest overhead ends up heavily pronounced, and I have tested this; it is not a case of premature optimization. The operations involved in processing the objects are all trivial, so the overhead of virtual calls diminishes performance tremendously.

  • a pure virtual int type() function, overridden for every concrete type; unfortunately that comes with the overhead of a virtual call for something as trivial as returning a constant integer value

  • an int type member for every instance, set in the constructor, which introduces a 4-byte overhead for each of those billions of objects, wasting memory, polluting the cache and whatnot (both approaches are sketched below)
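
For reference, a minimal sketch of the two approaches as I read them (all class names here are made up):

// Approach 1: type id via virtual function -- one virtual call per query.
struct Base1 {
    virtual ~Base1() = default;
    virtual int type() const = 0;
};
struct Foo1 : Base1 {
    int type() const override { return 42; } // trivial body, still dispatched
};

// Approach 2: type id as a member -- no dispatch, but 4 bytes per instance.
struct Base2 {
    explicit Base2(int t) : type_(t) {}
    int type() const { return type_; }
    int type_;
};
struct Foo2 : Base2 {
    Foo2() : Base2(42) {}
};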

I remember some time ago someone asking about "static virtual member variables", and naturally the answers boiled down to "no, that makes no sense". However, being able to put a user variable in the vtable and set its value for each specific type seems like a very efficient solution to my problem.

This way both of the above-mentioned overheads are avoided: no virtual calls are necessary and there is no per-instance memory overhead either. The only overhead is the indirection to get the vtable, but considering the frequency of access of that data, it will most likely be kept in the CPU cache most of the time.

My current obvious option is to do "manual OOP": build vtables by hand so they can incorporate the necessary "meta" data as well, initialize the vtable pointer for every type, and use awkward syntax to invoke pseudo "member" functions. Or even omit the vtable pointer altogether, store the id instead, and use it as an index into a table of vtables, which will be even more efficient, as it avoids the indirection and shrinks the size down, since I only need 2^14 distinct types.
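
Roughly what I have in mind, with made-up names and only a couple of table entries shown:

#include <cstdint>

// Hand-rolled "vtable": per-type function pointers plus any per-type "meta"
// data that a real vtable can't hold for me.
struct ManualVTable {
    void (*do_something)(void* self);
};

struct Object {
    std::uint16_t type_id; // 2 bytes per object instead of an 8-byte vptr
    // ... payload bits ...
};

void foo_do_something(void* self) { (void)self; /* ... */ }
void bar_do_something(void* self) { (void)self; /* ... */ }

// One flat table of "vtables", indexed directly by the type id.
const ManualVTable g_vtables[] = {
    { &foo_do_something }, // type id 0
    { &bar_do_something }, // type id 1
    // ... up to 2^14 entries
};

inline int  type_of(const Object& o) { return o.type_id; } // no indirection
inline void call_do_something(Object& o) { g_vtables[o.type_id].do_something(&o); }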

It would be nice if I could avoid reinventing the wheel. I am not picky about the solution as long as it can give me the efficiency guarantees.

Maybe there is a way to have my type id integer in the vtable, or maybe there is another way altogether, which is highly possible since I don't keep up with the trends, and C++ got a lot of new features in the last few years.

Naturally, those ids would need to be uniform and consistent, rather than some arbitrary values of whatever the compiler cooks up internally. If that wasn't a requirement, I'd just use the vtable pointer values for an even more efficient solution that avoids indirection.

Any ideas?

Kus answered 12/2, 2018 at 16:14 Comment(16)
Yes, the actual operations that the objects are involved in are all trivial, few instructions each, and virtual calls absolutely massacre the performance. Like 90% cpu time overhead....Kus
Isn't the overhead of dynamically allocating that many objects also a problem? And you seem to have already answered your own question - you either have bad memory footprint or poor performance - although I'm a bit surprised that an optimized build will have that much overhead. In measurements I've done in the past, it's pretty tiny even when doing very simple operations.Glynis
Also, do your objects already have a vtable? If not, that typically adds 8 bytes [if you literally have billions of objects, since you're not going to have that on a 32-bit system; even 1 billion objects is really, really hard to achieve there]Glynis
And finally, explain what you plan to use these IDs for. Debug only? Home-brewed dyn_cast? Something else?Glynis
@MatsPetersson I plan to use memory pools. Why would it be hard to achieve, vtable pointers for a billion objects will be only about 7.5 GB, or merely 1.8 GB if I go for 16-bit indices.Kus
If each family is allocated into separate contiguous memory pools you can derive the type from the memory address. No wasted memory, and the address is most often in the cache for some reason.Divinity
@Divinity if the scale requirements were lower that would be a very good solution. Alas, at this scale and the related memory usage it is not an option.Kus
I'm afraid you have to write vtable manually. A talk by Andrei Alexandrescu youtube.com/watch?v=vrfYLlR8X8k also uses some kind of manual vtable for better performance.Frustrated
If order doesn't matter (or little enough), you can batch your objects into homogeneous vectors at construction time and iterate over one vector at a time. See boost::poly_collection for an implementation.Alyosha
The objects form an arbitrary tree, access is arbitrary too. Also, there is very little data per object, most of it is single bit values. Which is why I am very tempted to toss out the vtable pointer altogether. For most of those 16k types, the usable data is less than 8 bytes.Kus
@Kus How does 'scale requirements' disqualify such a solution?Divinity
@Divinity because those pools would have to be huge, and their utilization is practically unknown and arbitrary; there would be a lot of memory "wasted" and unusable, and I don't have terabytes of it. Which is why I will be doing "micro pools": unlike with processing the data, I can afford some extra CPU cycles wasted on allocating the tree efficiently.Kus
@Kus If the 'micro pool' address spaces are in a search tree you can do a binary search to derive which pool any object is part of (poolstart <= ptr <= poollast) with logarithmic worst case complexity.Divinity
Let me remind you that the goal is to have the most efficient way to determine the type, not the least efficient ;) Wandering about thousands of pools millions of times per second just doesn't sound efficient to me.Kus
@Kus Yeah right. Prove it.Divinity
"I don't have terabytes of it." You don't need terabytes of memory. You only need terabytes of virtual address space. You surely have those.Original

If you have way more instances than you have types, then the most straightforward solution is to abstract at the level of a homogeneous container rather than a single instance.

Instead of:

{PolymorphicContainer}: Foo*, Bar*, Baz*, Foo*, Bar*, Bar*, Baz*, ...

... and having to store some type information (vtable, type field, etc) to distinguish each element while accessing memory in the most sporadic ways, you can have:

{FooContainer}: Foo, Foo, Foo, Foo, Foo, ...
{BarContainer}: Bar, Bar, Bar, Bar, Bar, ...
{BazContainer}: Baz, Baz, Baz, Baz, Baz, ...
{PolymorphicContainer}: FooContainer*, BarContainer*, BazContainer*

And you store the type information (vtable or what not) inside the containers. That does mean you need access patterns that tend to be more homogeneous, but such an arrangement can be made in most problems I've encountered.

Gamedevs used to do things like sort polymorphic base pointers by subtype while using a custom allocator for each to store them contiguously. That combination of sorting by base pointer address and allocating each type from separate pools gives you the analogical equivalent of:

Foo*, Foo*, Foo*, Foo*, ..., Bar*, Bar*, Bar*, Bar*, ..., Baz*, Baz*, Baz*, ...

With most of them stored contiguously, because they each use a custom allocator which puts all the Foos into contiguous blocks separate from all the Bars, for example. Then, on top of spatial locality, you also get temporal locality on the vtables if you access things in a sequential pattern.
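
As a rough sketch of that older trick (assuming the per-type pool allocators are already in place), the sort is just by address:

#include <algorithm>
#include <vector>

struct Creature { virtual ~Creature() = default; virtual void do_something() = 0; };

// If every concrete type is allocated from its own pool, sorting the base
// pointers by address clusters same-type objects together, so iteration hits
// each vtable in long runs instead of ping-ponging between them.
void sort_by_pool(std::vector<Creature*>& creatures) {
    std::sort(creatures.begin(), creatures.end());
}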

But that's more painful to me than abstracting at the level of the container, and doing it that way still requires the overhead of two pointers (128 bits on 64-bit machines) per object (a vptr and a base pointer to the object itself). Instead of processing orcs, goblins, humans, etc., individually through a Creature* base pointer, it makes sense to me to store them in homogeneous containers, abstract that, and process Creatures* pointers which point to entire homogeneous collections. Instead of:

class Orc: public Creature {...};

... we do:

// vptr only stored once for all orcs in the entire game.
class Orcs: public Creatures
{
public:
    // public interface consists predominantly of functions
    // which process entire ranges of orcs at once (virtual
    // dispatch only paid once possibly for a million orcs
    // rather than a million times over per orc).
    ...

private:
    struct OrcData {...};
    std::vector<OrcData> orcs;
};

Instead of:

for each creature:
     creature.do_something();

We do:

for each creatures:
     creatures.do_something();
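
In C++ terms, under the assumptions of the Orcs sketch above, the outer loop might look like this:

#include <memory>
#include <vector>

struct Creatures {
    virtual ~Creatures() = default;
    virtual void do_something() = 0; // processes its whole batch internally
};

// One entry per creature type, not per creature.
std::vector<std::unique_ptr<Creatures>> world;

void update_world() {
    for (auto& batch : world)
        batch->do_something(); // one virtual call per batch, then a tight loop inside
}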

Using this strategy, if we need a million orcs in our video game, we'd cut the costs associated with virtual dispatch, vptrs, and base pointers to 1/1,000,000th of the original cost, not to mention you get excellent locality of reference free of charge as well.

If in some cases we need to do something to a specific creature, you might be able to store a two-part index (it might fit in 32 bits, or maybe 48) holding the creature type index and then the relative creature index within that container, though this strategy is most beneficial when you don't have to call functions just to process one creature in your critical paths. Generally you can fit this into 32-bit indices, or possibly 48 bits if you set a limit for each homogeneous container of 2^16 elements before it is considered "full" and you create another one for the same type. We don't have to store all the creatures of one type in one container if we want to cram our indices.
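
For illustration only (the exact split depends on your limits), such a handle could be packed like this:

#include <cstdint>

// Hypothetical packed 32-bit handle: 16 bits pick the homogeneous container
// (which creature type, and which "page" of it if a type spans several), and
// 16 bits pick the element inside it, honoring the 2^16-per-container limit.
struct CreatureHandle {
    std::uint16_t container; // index into the list of homogeneous containers
    std::uint16_t element;   // index within that container
};

// Resolving it (names assumed): containers[h.container].do_one(h.element);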

I can't say for sure if this is applicable to your case because it depends on access patterns, but it is generally the first solution I consider when you have performance issues associated with polymorphism. The way I look at it is that you're paying costs like virtual dispatch, loss of contiguous access patterns, loss of temporal locality on vtables, the memory overhead of the vptr, etc. at too granular a level. Make the design coarser (bigger objects, like objects representing a whole collection of things, not an individual object per thing) and the costs become negligible again.

Whatever the case may be, instead of thinking about this in terms of vtables and what not, think of it in terms of how you arrange data, just bits and bytes, so that you don't have to store a pointer or integer with every single little object. Draw things out just thinking about bits and bytes, not classes and vtables and virtual functions and nice public interfaces and so forth. Think about that later after you settle on a memory representation/layout, and start off just thinking about bits and bytes, like so:

[image: a sketch of the layout drawn as plain bits and bytes]

I find this so much easier to think about for data-oriented designs with performance-critical needs well-anticipated upfront than trying to think about language mechanisms and nice interface designs and all that. Instead I think in a C-like way first of just bits and bytes and communicate and sketch my ideas as structs and figure out where the bits and bytes should go. Then once you figure that out, you can figure out how to put a nice interface on top.

Anyway, for avoiding the overhead of type information per teeny object, that means grouping them together somehow in memory and storing that analogical type field once per group instead of once per element in the group. Allocating elements of a particular type in a uniform way might also give you that information based on their pointer address or index, e.g. There are many ways to go about this, but just think about it in terms of data stored in memory as a general strategy.

The answer is somewhat embedded in your question topic:

Most efficient way to get an integer type id in a family of common base types [...]

You store the integer ID once per family or at least once per multiple objects in that family instead of once per object. That's the only way, however you approach it, to avoid storing it once per object unless the info is already available. The alternative is to deduce it from some other information available, like you might be able to deduce it from the object's index or pointer address, at which point storing the ID would just be redundant information.
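
For instance, a minimal sketch of the "deduce it from the pointer address" route, assuming every object of a given type lives inside one registered pool range:

#include <map>

// Hypothetical registry mapping each pool's start address to the type id of
// everything allocated inside that pool.
std::map<const void*, int> type_by_pool_start;

int type_of(const void* obj) {
    // Assumes obj really does belong to some registered pool, so the
    // decrement never walks off the front of the map.
    auto it = type_by_pool_start.upper_bound(obj);
    --it; // the pool whose start is at or before obj
    return it->second;
}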

Kamal answered 14/2, 2018 at 9:11 Comment(3)
And also type ids are not necessarily unique; in certain situations, afaik, they can be automatically determined and aliased.Queen
@Queen Ah, for RTTI? I didn't know they could be shared like that -- that's cool!Kamal
See, every day one can learn something new; on a side note, I wasn't born with that knowledge... epistemology aside... I had to learn it too, and one is glad to share information whence one sees another about to load the gun which will shoot the foot... In short, the compiler may decide that types a and b are just type 1, for alignment reasons or otherwise... then you can also have the opposite, and also aliasing of similar types. This is how most JavaScript interpreters work with the older type systems to provide inheritance / OOP.Queen
