I'm interested to know if there is any viable way to contiguously store an array of polymorphic objects, such that virtual
methods on a common base can be legally called (and would dispatch to the correct overridden method in a subclass).
For example, considering the following classes:
struct B {
int common;
int getCommon() { return common; }
virtual int getVirtual() const = 0;
}
struct D1 : B {
virtual int getVirtual final const { return 5 };
}
struct D2 : B {
int d2int;
virtual int getVirtual final const { return d2int };
}
I would like to allocate a contiguous array of D1 and D2 objects, and treat them as B
objects, including calling getVirtual()
which will delegate to the appropriate method depending on the object type. Conceptually this seems possible: each object knows its type, typically via an embedded vtable pointer, so you could imagine, storing n objects in an array of n * max(sizeof(D1), sizeof(D2))
unsigned char
, and using placement new
and delete
to initialize the objects, and casting the unsigned char
pointer to B*
. I'm pretty sure a cast is not legal, however.
One could also imagine creating a union like:
union Both {
D1 d1;
D2 d2;
}
and then creating an array of Both
, and using placement new to create the objects of the appropriate type. This again doesn't seem to offer a way to actually call B::getVirtual()
safely, however. You don't know the last stored type for the elements, so how are you going to get your B*
? You need to use either &u.d1
or &u.d2
but you don't know which! There are actually special rules about "initial common subsequences" which let you do some things on unions where the elements share some common traits, but this only applies to standard layout types. Classes with virtual methods are not standard layout types.
Is there any way to proceed? Ideally, a solution would look something like a non-slicing std::vector<B>
that can actually contain polymorphic subclasses of B
. Yes, if required one might stipulate that all possible subclasses are known up front, but a better solution would only need to know the maximum likely size of any subclass (and fail at compile time if someone tries to add a "too big" object).
If it isn't possible to do with the built-in virtual
mechanism, other alternatives that offer similar functionality would also be interesting.
Background
No doubt someone will ask "why", so here's a bit of motivation:
It seems generally well-known that using virtual
functions to implement runtime polymorphism comes at a moderate overhead when actually calling virtual methods.
Not as often discussed, however, is the fact that using classes with virtual methods to implement polymorphism usually implies a totally different way of managing the memory for the underlying objects. You cannot just add objects of different types (but a common base) to a standard container: if you have subclasses D1
and D2
, both derived from base B
, a std::vector<B>
would slice any D1
or D2
objects added. Similarly for arrays of such objects.
The usual solution is to instead use containers or arrays of pointers to the base class, like std::vector<B*>
or perhaps std::vector<unique_ptr<B>>
or std::vector<shared_ptr<B>>
. At a minimum, this adds an extra indirection when accessing each element1, and in the case of the smart pointers, it breaks common container optimizations. If you are actually allocating each object via new
and delete
(including indirectly), then the time and memory cost of storing your objects just increased by a large amount.
Conceptually it seems like various subclasses of a common base can be stored consecutively (each object would consume the same amount of space: that of the largest supported object), and that a pointer to an object could be treated as a base-class pointer. In some cases, this could greatly simply and speed-up use of such polymorphic objects. Of course, in general, it's probably a terrible idea, but for the purposes of this question let's assume it has some niche application.
1 Among other things, this indirection pretty much prevents any vectorization of the same operation applied to all elements and harms locality of reference with implications both for caching and prefetching.
B
andC
objects, so you can loop over all the Bs and then all the Cs, with the Update() method inlined and auto-vectorized thanks to static typing. (And smaller objects: no vtable pointer). See @Bee's answer there and discussion on mine for some of the inspiration for this question. – Shipmatestd::visit
over a container ofstd::variant<D1,D2>
. Use that instead of a vtable, though, (with non-virtual overrides) because variant uses its own tag, not a vtable ptr. – Shipmateunsigned char
and allocate each object with placement new, and then basically assert that casting theunsigned char
pointer toB *
yields the same value as casting the pointer of the actual most-derived type (e.g.,D1
) toB *
. Most likely such a check simply generates no actual code, but then at least you can be sure that just casting the "cell" for the object toB *
works (i.e., catches multiple inheritance cases where this breaks). – Outrankstd::variant
approach. It's not terrible. It cuts out at least one level of indirection versus the "vector of pointers" approach. It basically uses the variant index value to index into a jump table - so very much like a vtable except you don't actually need to look up the vtable to use. There is some extra code to check that the index isn't-1
indicating an "empty" variant, and the functions jumped to appear to do a check that the variant has the "expected" type which seems unnecessary (you just dispatched there...). @PeterCordes – OutrankD1s *tmp = std::get_if<D1s>(&b);
and so on to explicitly try the two alternatives instead of using std::visit dispatch, with no throwing. godbolt.org/g/TjuSCn has an attempt that isn't working. (I think it always return nullptr).holds_alternative
is another possibility, but then you still have to access the object. – Shipmatereturn 0
instead ofreturn total
in the original code which optimized everything away, oops. Not too bad, but still has an "extra" check for the type in theelse
case (I guess it needs it, to support the empty case). At least everything is inlined. Probably just building this yourself with a discriminated union would be better. – Outrankstd::variant
for other use-cases. The version you linked checked forD1s
in both branches, so it got optimized away because NULL-dereference is UB. So there was no loop, just5 * size()
or something. Fixed version: godbolt.org/g/GrYnsA. clang5.0 is very branchy; clang4.0 uses a cmov. gcc includes a UD2 for the null-pointer deref possibility. Interesting idea to just use theget_if<D2s>
result unconditionally so there's UB if it isn't D1s or D2s, like __builtin_unreachable(). – Shipmateget<D2s>
and the code was about the same, with an exception path instead ofud2
. Too bad gcc doesn't use the UB to go back in time and just avoid the check entirely (i.e., it can determine thatget_if
must not return null, henceget_if
must return exactly the pointer to the containedD2
object... – Outrank-fno-isolate-erroneous-paths-dereference
(godbolt.org/g/KUsmLs) to treat a null-deref as__builtin_unreachable()
instead. (documented under the positive form, enabled at-O2
and higher – Shipmate.L4
: it's still checking the type of the object, and now it's justcmov
ing either 0 (rdi
) or the object address prior to dereferencing it. So it has explicitly preserved the null access and has to do a lot of work to get there. All that code should be eliminated and the subsequentadd
should simply beadd eax, DWORD PTR [rdx+4]
. Doing it manually seems to work though, code is almost optimal. – Outrankud2
go away and hoped all those extra instructions were doing something good. :P So it doesn't assume that null-deref doesn't happen; i.e. it's not the same as__builtin_unreachable()
. You have to use__builtin_unreachable()
manually to tell gcc when something can't happen, and then you do get nice-ish branchy code with no extra checks: godbolt.org/g/JpHE4q. (But beware ofif(x) __builtin_unreachable()
with ICC: it will actually check and jcc to just after theret
in the current function, last I looked. So it slows you down and sucks.) – Shipmatevariant
:P – Shipmate