Most efficient way to get an integer type id in a family of common base types

The problem:

I have a family of objects with a common base, and I need to be able to identify the specific concrete type via an integer value.

There are two obvious approaches, but both come with unacceptable overhead in terms of memory or CPU time. Since the project deals with billions of objects, even the tiniest overhead ends up heavily pronounced, and I have tested this; it is not a case of premature optimization. The operations involved in processing the objects are all trivial, so the overhead of virtual calls diminishes performance tremendously.

  • a pure virtual int type() function, overridden for every concrete type; unfortunately that comes with the overhead of a virtual call for something as trivial as returning a constant integer value

  • an int type member for every instance, set in the constructor, which introduces a 4-byte overhead for each of those billions of objects, wasting memory, polluting the cache and whatnot (both approaches are sketched below)
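
For reference, a minimal sketch of the two approaches as I read them (all class names here are made up):

// Approach 1: type id via virtual function -- one virtual call per query.
struct Base1 {
    virtual ~Base1() = default;
    virtual int type() const = 0;
};
struct Foo1 : Base1 {
    int type() const override { return 42; } // trivial body, still dispatched
};

// Approach 2: type id as a member -- no dispatch, but 4 bytes per instance.
struct Base2 {
    explicit Base2(int t) : type_(t) {}
    int type() const { return type_; }
    int type_;
};
struct Foo2 : Base2 {
    Foo2() : Base2(42) {}
};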

I remember some time ago someone asking about "static virtual member variables", and naturally the answers boiled down to "no, that makes no sense". However, being able to put a user variable in the vtable and set its value for each specific type seems like a very efficient solution to my problem.

This way both of the above-mentioned overheads are avoided: no virtual calls are necessary and there is no per-instance memory overhead either. The only overhead is the indirection to get the vtable, but considering the frequency of access of that data, it will most likely be kept in the CPU cache most of the time.

My current obvious option is to do "manual OOP": build vtables by hand so they can incorporate the necessary "meta" data as well, initialize the vtable pointer for every type, and use awkward syntax to invoke pseudo "member" functions. Or even omit the vtable pointer altogether, store the id instead, and use it as an index into a table of vtables, which will be even more efficient, as it avoids the indirection and shrinks the size down, since I only need 2^14 distinct types.
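
Roughly what I have in mind, with made-up names and only a couple of table entries shown:

#include <cstdint>

// Hand-rolled "vtable": per-type function pointers plus any per-type "meta"
// data that a real vtable can't hold for me.
struct ManualVTable {
    void (*do_something)(void* self);
};

struct Object {
    std::uint16_t type_id; // 2 bytes per object instead of an 8-byte vptr
    // ... payload bits ...
};

void foo_do_something(void* self) { (void)self; /* ... */ }
void bar_do_something(void* self) { (void)self; /* ... */ }

// One flat table of "vtables", indexed directly by the type id.
const ManualVTable g_vtables[] = {
    { &foo_do_something }, // type id 0
    { &bar_do_something }, // type id 1
    // ... up to 2^14 entries
};

inline int  type_of(const Object& o) { return o.type_id; } // no indirection
inline void call_do_something(Object& o) { g_vtables[o.type_id].do_something(&o); }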

It would be nice if I could avoid reinventing the wheel. I am not picky about the solution as long as it can give me the efficiency guarantees.

Maybe there is a way to have my type id integer in the vtable, or maybe there is another way altogether, which is highly possible since I don't keep up with the trends, and C++ got a lot of new features in the last few years.

Naturally, those ids would need to be uniform and consistent, rather than some arbitrary values of whatever the compiler cooks up internally. If that wasn't a requirement, I'd just use the vtable pointer values for an even more efficient solution that avoids indirection.

Any ideas?

Kus answered 12/2, 2018 at 16:14 Comment(16)
Yes, the actual operations that the objects are involved in are all trivial, few instructions each, and virtual calls absolutely massacre the performance. Like 90% cpu time overhead....Kus
Isn't the overhead of dynamically allocating that many objects also a problem? And you seem to have already answered your own question - you either have bad memory footprint or poor performance - although I'm a bit surprised that an optimized build will have that much overhead. In measurements I've done in the past, it's pretty tiny even when doing very simple operations.Glynis
Also, do your objects already have a vtable? If not, that typically adds 8 bytes [if you literally have billions of objects, since you're not going to have that on a 32-bit system; even 1 billion objects is really, really hard to achieve there]Glynis
And finally, explain what you plan to use these IDs for. Debug only? Home-brewed dyn_cast? Something else?Glynis
@MatsPetersson I plan to use memory pools. Why would it be hard to achieve, vtable pointers for a billion objects will be only about 7.5 GB, or merely 1.8 GB if I go for 16-bit indices.Kus
If each family is allocated into separate contiguous memory pools you can derive the type from the memory address. No wasted memory, and the address is most often in the cache for some reason.Divinity
@Divinity if the scale requirements were lower that would be a very good solution. Alas, at this scale and the related memory usage it is not an option.Kus
I'm afraid you have to write vtable manually. A talk by Andrei Alexandrescu youtube.com/watch?v=vrfYLlR8X8k also uses some kind of manual vtable for better performance.Frustrated
If order doesn't matter (or little enough), you can batch your objects into homogeneous vectors at construction time and iterate over one vector at a time. See boost::poly_collection for an implementation.Alyosha
The objects form an arbitrary tree, access is arbitrary too. Also, there is very little data per object, most of it is single bit values. Which is why I am very tempted to toss out the vtable pointer altogether. For most of those 16k types, the usable data is less than 8 bytes.Kus
@Kus How does 'scale requirements' disqualify such a solution?Divinity
@Divinity because those pools would have to be huge, and their utilization is practically unknown and arbitrary; there would be a lot of memory "wasted" and unusable, and I don't have terabytes of it. Which is why I will be doing "micro pools": unlike with processing the data, I can afford some extra CPU cycles wasted on allocating the tree efficiently.Kus
@Kus If the 'micro pool' address spaces are in a search tree you can do a binary search to derive which pool any object is part of (poolstart <= ptr <= poollast) with logarithmic worst case complexity.Divinity
Let me remind you that the goal is to have the most efficient way to determine the type, not the least efficient ;) Wandering about thousands of pools millions of times per second just doesn't sound efficient to me.Kus
@Kus Yeah right. Prove it.Divinity
"I don't have terabytes of it." You don't need terabytes of memory. You only need terabytes of virtual address space. You surely have those.Original

If you have way more instances than you have types, then the most straightforward solution is to abstract at the level of a homogeneous container rather than a single instance.

Instead of:

{PolymorphicContainer}: Foo*, Bar*, Baz*, Foo*, Bar*, Bar*, Baz*, ...

... and having to store some type information (vtable, type field, etc) to distinguish each element while accessing memory in the most sporadic ways, you can have:

{FooContainer}: Foo, Foo, Foo, Foo, Foo, ...
{BarContainer}: Bar, Bar, Bar, Bar, Bar, ...
{BazContainer}: Baz, Baz, Baz, Baz, Baz, ...
{PolymorphicContainer}: FooContainer*, BarContainer*, BazContainer*

And you store the type information (vtable or what not) inside the containers. That does mean you need access patterns that tend to be more homogeneous, but such an arrangement can be made in most problems I've encountered.

Gamedevs used to do things like sort polymorphic base pointers by subtype while using a custom allocator for each to store them contiguously. That combination of sorting by base pointer address and allocating each type from separate pools gives you the analogical equivalent of:

Foo*, Foo*, Foo*, Foo*, ..., Bar*, Bar*, Bar*, Bar*, ..., Baz*, Baz*, Baz*, ...

With most of them stored contiguously, because they each use a custom allocator which puts all the Foos into contiguous blocks separate from all the Bars, for example. Then, on top of spatial locality, you also get temporal locality on the vtables if you access things in a sequential pattern.
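
As a rough sketch of that older trick (assuming the per-type pool allocators are already in place), the sort is just by address:

#include <algorithm>
#include <vector>

struct Creature { virtual ~Creature() = default; virtual void do_something() = 0; };

// If every concrete type is allocated from its own pool, sorting the base
// pointers by address clusters same-type objects together, so iteration hits
// each vtable in long runs instead of ping-ponging between them.
void sort_by_pool(std::vector<Creature*>& creatures) {
    std::sort(creatures.begin(), creatures.end());
}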

But that's more painful to me than abstracting at the level of the container, and doing it that way still requires the overhead of two pointers (128 bits on 64-bit machines) per object (a vptr and a base pointer to the object itself). Instead of processing orcs, goblins, humans, etc., individually through a Creature* base pointer, it makes sense to me to store them in homogeneous containers, abstract that, and process Creatures* pointers which point to entire homogeneous collections. Instead of:

class Orc: public Creature {...};

... we do:

// vptr only stored once for all orcs in the entire game.
class Orcs: public Creatures
{
public:
    // public interface consists predominantly of functions
    // which process entire ranges of orcs at once (virtual
    // dispatch only paid once possibly for a million orcs
    // rather than a million times over per orc).
    ...

private:
    struct OrcData {...};
    std::vector<OrcData> orcs;
};

Instead of:

for each creature:
     creature.do_something();

We do:

for each creatures:
     creatures.do_something();
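
In C++ terms, under the assumptions of the Orcs sketch above, the outer loop might look like this:

#include <memory>
#include <vector>

struct Creatures {
    virtual ~Creatures() = default;
    virtual void do_something() = 0; // processes its whole batch internally
};

// One entry per creature type, not per creature.
std::vector<std::unique_ptr<Creatures>> world;

void update_world() {
    for (auto& batch : world)
        batch->do_something(); // one virtual call per batch, then a tight loop inside
}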

Using this strategy, if we need a million orcs in our video game, we'd cut the costs associated with virtual dispatch, vptrs, and base pointers to 1/1,000,000th of the original cost, not to mention you get excellent locality of reference free of charge as well.

If in some cases we need to do something to a specific creature, you might be able to store a two-part index (it might fit in 32 bits, or maybe 48) holding the creature type index and then the relative creature index within that container, though this strategy is most beneficial when you don't have to call functions just to process one creature in your critical paths. Generally you can fit this into 32-bit indices, or possibly 48 bits if you set a limit for each homogeneous container of 2^16 elements before it is considered "full" and you create another one for the same type. We don't have to store all the creatures of one type in one container if we want to cram our indices.
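
For illustration only (the exact split depends on your limits), such a handle could be packed like this:

#include <cstdint>

// Hypothetical packed 32-bit handle: 16 bits pick the homogeneous container
// (which creature type, and which "page" of it if a type spans several), and
// 16 bits pick the element inside it, honoring the 2^16-per-container limit.
struct CreatureHandle {
    std::uint16_t container; // index into the list of homogeneous containers
    std::uint16_t element;   // index within that container
};

// Resolving it (names assumed): containers[h.container].do_one(h.element);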

I can't say for sure if this is applicable to your case because it depends on access patterns, but it is generally the first solution I consider when you have performance issues associated with polymorphism. The way I look at it is that you're paying costs like virtual dispatch, loss of contiguous access patterns, loss of temporal locality on vtables, the memory overhead of the vptr, etc. at too granular a level. Make the design coarser (bigger objects, like objects representing a whole collection of things, not an individual object per thing) and the costs become negligible again.

Whatever the case may be, instead of thinking about this in terms of vtables and what not, think of it in terms of how you arrange data, just bits and bytes, so that you don't have to store a pointer or integer with every single little object. Draw things out just thinking about bits and bytes, not classes and vtables and virtual functions and nice public interfaces and so forth. Think about that later after you settle on a memory representation/layout, and start off just thinking about bits and bytes, like so:

[image: a sketch of the layout drawn as plain bits and bytes]

I find this so much easier to think about for data-oriented designs with performance-critical needs well-anticipated upfront than trying to think about language mechanisms and nice interface designs and all that. Instead I think in a C-like way first of just bits and bytes and communicate and sketch my ideas as structs and figure out where the bits and bytes should go. Then once you figure that out, you can figure out how to put a nice interface on top.

Anyway, for avoiding the overhead of type information per teeny object, that means grouping them together somehow in memory and storing that analogical type field once per group instead of once per element in the group. Allocating elements of a particular type in a uniform way might also give you that information based on their pointer address or index, e.g. There are many ways to go about this, but just think about it in terms of data stored in memory as a general strategy.

The answer is somewhat embedded in your question topic:

Most efficient way to get an integer type id in a family of common base types [...]

You store the integer ID once per family or at least once per multiple objects in that family instead of once per object. That's the only way, however you approach it, to avoid storing it once per object unless the info is already available. The alternative is to deduce it from some other information available, like you might be able to deduce it from the object's index or pointer address, at which point storing the ID would just be redundant information.
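
For instance, a minimal sketch of the "deduce it from the pointer address" route, assuming every object of a given type lives inside one registered pool range:

#include <map>

// Hypothetical registry mapping each pool's start address to the type id of
// everything allocated inside that pool.
std::map<const void*, int> type_by_pool_start;

int type_of(const void* obj) {
    // Assumes obj really does belong to some registered pool, so the
    // decrement never walks off the front of the map.
    auto it = type_by_pool_start.upper_bound(obj);
    --it; // the pool whose start is at or before obj
    return it->second;
}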

Kamal answered 14/2, 2018 at 9:11 Comment(3)
And also type ids are not necessarily unique; in certain situations, afaik, they can be automatically determined and aliased.Queen
@Queen Ah, for RTTI? I didn't know they could be shared like that -- that's cool!Kamal
See, every day one can learn something new; on a side note, I wasn't born with that knowledge... epistemology aside... I had to learn it too, and one is glad to share information whence one sees another about to load the gun which will shoot the foot... In short, the compiler may decide that types a and b are just type 1, for alignment reasons or otherwise... then you can also have the opposite, and also aliasing of similar types. This is how most JavaScript interpreters work with the older type systems to provide inheritance / OOP.Queen
