If you have way more instances than you have types, then the most straightforward solution is to abstract at the level of a homogeneous container rather than a single instance.
Instead of:
{PolymorphicContainer}: Foo*, Bar*, Baz*, Foo*, Bar*, Bar*, Baz*, ...
... and having to store some type information (vtable, type
field, etc) to distinguish each element while accessing memory in the most sporadic ways, you can have:
{FooContainer}: Foo, Foo, Foo, Foo, Foo, ...
{BarContainer}: Bar, Bar, Bar, Bar, Bar, ...
{BazContainer}: Baz, Baz, Baz, Baz, Baz, ...
{PolymorphicContainer}: FooContainer*, BarContainer*, BazContainer*
And you store the type information (vtable or what not) inside the containers. That does mean you need access patterns of a kind that tend to be more homogeneous, but often such an arrangement can be made in most problems I've encountered.
Gamedevs used to do things like sort polymorphic base pointers by subtype while using a custom allocator for each to store them contiguously. That combination of sorting by base pointer address and allocating each type from separate pools makes it so you then get the analogical equivalent of:
Foo*, Foo*, Foo*, Foo*, ..., Bar*, Bar*, Bar*, Bar*, ..., Baz*, Baz*, Baz*, ...
With most of them stored contiguously because they each use a custom allocator which puts all the Foos into contiguous blocks separate from all the Bars, e.g. Then on top of spatial locality you also get temporal locality on the vtables if you access things in a sequential pattern.
But that's more painful to me than abstracting at the level of the container, and doing it that way still requires the overhead of two pointers (128-bits on 64-bit machines) per object (a vptr and a base pointer to the object itself). Instead of processing orcs, goblins, humans, etc, individually through a Creature*
base pointer, it makes sense to me to store them in homogeneous containers, abstract that, and process Creatures*
pointers which point to entire homogeneous collections. Instead of:
class Orc: public Creature {...};
... we do:
// vptr only stored once for all orcs in the entire game.
class Orcs: public Creatures
{
public:
// public interface consists predominantly of functions
// which process entire ranges of orcs at once (virtual
// dispatch only paid once possibly for a million orcs
// rather than a million times over per orc).
...
private:
struct OrcData {...};
std::vector<OrcData> orcs;
};
Instead of:
for each creature:
creature.do_something();
We do:
for each creatures:
creatures.do_something();
Using this strategy, if we need a million orcs in our video game, we'd cut the costs associated with virtual dispatch, vptrs, and base pointers to 1/1,000,000th of the original cost, not to mention you get very optimal locality of reference as well free of charge.
If in some cases we need to do something to a specific creature, you might be able to store a two-part index (might be able to fit it in 32-bits or maybe 48) storing creature type index and then relative creature index in that container, though this strategy is most beneficial when you don't have to call functions just to process one creature in your critical paths. Generally you can fit this into 32-bit indices or possibly 48-bits if you then set a limit for each homogeneous container of 2^16 before it is considered "full" and you create another one for the same type, e.g. We don't have to store all the creatures of one type in one container if we want to cram our indices.
I can't say for sure if this is applicable to your case because it depends on access patterns, but it is generally the first solution I consider when you have performance issues associated with polymorphism. The first way I look at it is that you're paying the costs like virtual dispatch, loss of contiguous access patterns, loss of temporal locality on vtables, memory overhead of vptr, etc. at too granular of a level. Make the design coarser (bigger objects, like objects representing a whole collection of things, not an individual object per thing) and the costs become negligible again.
Whatever the case may be, instead of thinking about this in terms of vtables and what not, think of it in terms of how you arrange data, just bits and bytes, so that you don't have to store a pointer or integer with every single little object. Draw things out just thinking about bits and bytes, not classes and vtables and virtual functions and nice public interfaces and so forth. Think about that later after you settle on a memory representation/layout, and start off just thinking about bits and bytes, like so:
I find this so much easier to think about for data-oriented designs with performance-critical needs well-anticipated upfront than trying to think about language mechanisms and nice interface designs and all that. Instead I think in a C-like way first of just bits and bytes and communicate and sketch my ideas as structs
and figure out where the bits and bytes should go. Then once you figure that out, you can figure out how to put a nice interface on top.
Anyway, for avoiding overhead of type information per teeny object, that means grouping them together somehow in memory and storing that analogical type field once per group instead of once per element in group. Allocating elements of a particular type in a uniform way might also give you that information based on their pointer address or index, e.g. There are many ways to about this but just think about it in terms of data stored in memory as a general strategy.
The answer is somewhat embedded in your question topic:
Most efficient way to get an integer type id in a family of common
base types [...]
You store the integer ID once per family or at least once per multiple objects in that family instead of once per object. That's the only way, however you approach it, to avoid storing it once per object unless the info is already available. The alternative is to deduce it from some other information available, like you might be able to deduce it from the object's index or pointer address, at which point storing the ID would just be redundant information.
boost::poly_collection
for an implementation. – Alyosha