Hopefully this isn't too specialized of a question for StackOverflow: if it is and could be migrated elsewhere let me know...
Many moons ago, I wrote an undergraduate thesis proposing various devirtualization techniques for C++ and related languages, generally based on the idea of precompiled specialization of code paths (somewhat like templates), but with runtime checks to choose the correct specialization in cases where it cannot be selected at compile-time (as it must be with templates).
The (very) basic idea is something like the following. Suppose you have a class C like the following:
class C : public SomeInterface
{
public:
    C(Foo * f) : _f(f) { }
    virtual void quack()
    {
        _f->bark();
    }
    virtual void moo()
    {
        quack(); // a virtual call on this because quack() might be overridden
    }
    // lots more virtual functions that call virtual functions on *_f or this
private:
    Foo * const _f; // technically doesn't have to be const explicitly
                    // as long as it can be proven not to be modified
};
And suppose you knew that there exist concrete subclasses of Foo like FooA, FooB, etc., with known complete types (without necessarily having an exhaustive list). Then you could precompile specialized versions of C for some selected subclasses of Foo, like, for example (note the constructor is purposely not included here, since it won't be called):
class C_FooA final : public SomeInterface
{
public:
    virtual void quack() final
    {
        _f->FooA::bark(); // non-polymorphic, statically bound
    }
    virtual void moo() final
    {
        C_FooA::quack(); // also static, because C_FooA is final
        // _f->FooA::bark(); // or you could even do this instead
    }
    // more virtual functions all specialized for FooA (*_f) and C_FooA (this)
private:
    FooA * const _f;
};
And replace the constructor of C with something like the following:
C::C(Foo * f) : _f(f)
{
    if(f->vptr == vtable_of_FooA) // obviously not Standard C++
        this->vptr = vtable_of_C_FooA;
    else if(f->vptr == vtable_of_FooB)
        this->vptr = vtable_of_C_FooB;
    // otherwise leave vptr unchanged for all other values of f->vptr
}
So basically, the dynamic type of the object being constructed is changed based on the dynamic type of the arguments to its constructor. (Note, you can't do this with templates because you can only create a C<Foo> if you know the type of f at compile-time). From now on, any call to FooA::bark() through C::quack() only involves one virtual call: either the call to C::quack() is statically bound to the non-specialized version which dynamically calls FooA::bark(), or the call to C::quack() is dynamically forwarded to C_FooA::quack() which statically calls FooA::bark(). Furthermore, dynamic dispatch might be eliminated completely in some cases if the flow analyzer has enough information to make a static call to C_FooA::quack(), which could be very useful in a tight loop if it allows inlining. (Although technically at that point you'd probably be OK even without this optimization...)
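To make the idea concrete without touching the real vptr, here is a minimal sketch that emulates it in Standard C++. It assumes an explicit table of function pointers standing in for the vtable and a typeid comparison standing in for the vptr comparison. The Ops table, the specialized() helper, the string return values, and FooC (a subclass with no specialization) are all made up for illustration; a compiler doing this for real would swap the hidden vptr instead and would not need the extra pointer.

```cpp
#include <string>
#include <typeinfo>

struct Foo {
    virtual ~Foo() = default;
    virtual std::string bark() { return "Foo"; }
};
struct FooA : Foo { std::string bark() override { return "FooA"; } };
struct FooB : Foo { std::string bark() override { return "FooB"; } };
struct FooC : Foo { std::string bark() override { return "FooC"; } };  // no specialization

class C {
public:
    explicit C(Foo* f) : _f(f), _ops(&generic_ops) {
        // One type check at construction (stand-in for comparing f->vptr).
        if (typeid(*f) == typeid(FooA)) _ops = &fooA_ops;
        else if (typeid(*f) == typeid(FooB)) _ops = &fooB_ops;
        // otherwise keep the generic table
    }
    std::string quack() { return _ops->quack(*this); }  // the one indirect call
    std::string moo() { return _ops->moo(*this); }
    bool specialized() const { return _ops != &generic_ops; }  // hypothetical helper

private:
    // Hand-rolled "vtable" (hypothetical; stands in for the real one).
    struct Ops {
        std::string (*quack)(C&);
        std::string (*moo)(C&);
    };

    // Generic versions: every nested call is still dispatched dynamically.
    static std::string quack_generic(C& c) { return c._f->bark(); }
    static std::string moo_generic(C& c) { return c._ops->quack(c); }

    // Specialized versions: the exact type F is known, so calls bind statically.
    template <class F>
    static std::string quack_for(C& c) { return static_cast<F*>(c._f)->F::bark(); }
    template <class F>
    static std::string moo_for(C& c) { return quack_for<F>(c); }

    static const Ops generic_ops;
    static const Ops fooA_ops;
    static const Ops fooB_ops;

    Foo* const _f;
    const Ops* _ops;
};

const C::Ops C::generic_ops = {&C::quack_generic, &C::moo_generic};
const C::Ops C::fooA_ops = {&C::quack_for<FooA>, &C::moo_for<FooA>};
const C::Ops C::fooB_ops = {&C::quack_for<FooB>, &C::moo_for<FooB>};
```

Constructing a C from a FooA* or FooB* selects a specialized table, so moo() costs one indirect call followed by direct calls; constructing it from a FooC* falls back to the generic table and behaves exactly like the unspecialized class.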
(Note that this transformation is safe, although less useful, even if _f is non-const and protected instead of private and C is derived from in a different translation unit...the translation unit creating the vtable for the derived class won't know anything at all about the specializations, and the constructor of the derived class will just set this->vptr to its own vtable, which will not reference any specialized functions because it won't know anything about them.)
This might seem like a lot of effort to eliminate one level of indirection, but the point is that you can do it to any arbitrary nesting level (any depth of virtual calls following this pattern could be reduced to one) based only on local information within a translation unit, and do it in a way that's resilient even if new types are defined in other translation units that you don't know about...you just might add a lot of code bloat that you wouldn't have otherwise if you did it naively.
Anyway, independent of whether this kind of optimization would really have enough bang-for-the-buck to be worth the effort of implementation and the space overhead in the resulting executable, my question is: is there anything in Standard C++ which would prevent a compiler from performing such a transformation?
My feeling is no, since the standard doesn't specify at all how virtual dispatch is done or how pointers-to-member-functions are represented. I'm pretty sure there's nothing about the RTTI mechanism preventing C and C_FooA from masquerading as the same type for all purposes, even if they have different virtual tables. The only other thing I could think of that could possibly matter is some close reading of the ODR, but probably not.
Am I overlooking something? Barring ABI/linking issues, would transformations like this be possible without breaking conforming C++ programs? (Furthermore, if yes, could this be done currently with the Itanium and/or MSVC ABIs? I'm fairly sure the answer there is yes, as well, but hopefully someone can confirm.)
EDIT: Does anyone know if anything like this is implemented in any mainstream compiler/JIT for C++, Java, or C#? (See discussion and linked chat in the comments below...) I'm aware JITs do speculative static-binding/inlining of virtuals directly at call sites, but I don't know if they do anything like this (with entirely new vtables being generated and chosen based on a single type check done at the constructor, rather than at each call site).
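For contrast, the per-call-site approach mentioned above looks roughly like the sketch below: a guarded direct call with a virtual fallback. This is a hand-written illustration of the general shape, not what any particular JIT or compiler actually emits; bark_guarded and the string return values are made-up names for the example.

```cpp
#include <string>
#include <typeinfo>

struct Foo {
    virtual ~Foo() = default;
    virtual std::string bark() { return "Foo"; }
};
struct FooA : Foo { std::string bark() override { return "FooA"; } };
struct FooB : Foo { std::string bark() override { return "FooB"; } };

// Guarded devirtualization at a single call site (hypothetical helper).
// The type check is repeated on every call, unlike the constructor-time check.
inline std::string bark_guarded(Foo* f) {
    if (typeid(*f) == typeid(FooA)) {
        // Exact type is FooA, so a qualified (non-virtual) call is safe
        // and can be inlined.
        return static_cast<FooA*>(f)->FooA::bark();
    }
    return f->bark();  // fallback: ordinary virtual dispatch
}
```

The difference from what I'm describing is where the check lives: here every call site pays for its own guard, whereas in the scheme above a single check in the constructor selects a whole table of already-specialized functions.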
C_FooX, but you then statically call the right FooX. Or you statically call C and it dynamically gets you the right FooX. Two virtual calls instead of one, and you can do this to an arbitrary nesting level as long as the type information is in the current TU, and it's resilient against additional types being added in other TUs (so you don't need whole program analysis). – Countable
foo_->foo(); will not be reliably statically dispatched...it might here, with flow analysis, since everything is in one TU, but you can create a unique_ptr<Foo> that holds a subtype of Foo. Just because it's a unique_ptr<Foo> doesn't provide any guarantee that it's non-polymorphic because there's no syntax in C++ for specifying a non-polymorphic pointer to a polymorphic type. – Countable
unique_ptr<foo_base> case there will be two virtual calls instead of one. – Countable