C++: Class specialization a valid transformation for a conforming compiler?

Asked 1/3, 2013 at 0:38 Answered 1/3, 2013 at 0:38

Solved c++compiler-construction compiler-optimization virtual-functions vtable

Hopefully this isn't too specialized of a question for StackOverflow: if it is and could be migrated elsewhere let me know...

Many moons ago, I wrote a undergraduate thesis proposing various devirtualization techniques for C++ and related languages, generally based on the idea of precompiled specialization of code paths (somewhat like templates) but with checks to choose the correct specializations are chosen at runtime in cases where they cannot be selected at compile-time (as templates must be).

The (very) basic idea is something like the following...suppose you have a class C like the following:

class C : public SomeInterface
{
public:
    C(Foo * f) : _f(f) { }

    virtual void quack()
    {
        _f->bark();
    }

    virtual void moo()
    {
        quack(); // a virtual call on this because quack() might be overloaded
    }

    // lots more virtual functions that call virtual functions on *_f or this

private:
    Foo * const _f; // technically doesn't have to be const explicitly
                    // as long as it can be proven not be modified
};

And you knew that there exist concrete subclasses of Foo like FooA, FooB, etc, with known complete types (without necessarily having an exhaustive list), then you could precompile specialized versions of C for some selected subclasses of Foo, like, for example (note the constructor is not included here, purposely, since it won't be called):

class C_FooA final : public SomeInterface
{
public:
    virtual void quack() final
    {
        _f->FooA::bark(); // non-polymorphic, statically bound
    }

    virtual void moo() final
    {
        C_FooA::quack(); // also static, because C_FooA is final
        // _f->FooA::bark(); // or you could even do this instead
    }

    // more virtual functions all specialized for FooA (*_f) and C_FooA (this)

private:
    FooA * const _f;
};

And replace the constructor of C with something like the following:

C::C(Foo * f) : _f(f)
{
    if(f->vptr == vtable_of_FooA) // obviously not Standard C++
        this->vptr = vtable_of_C_FooA; 
    else if(f->vptr == vtable_of_FooB)
        this->vptr = vtable_of_C_FooB;
    // otherwise leave vptr unchanged for all other values of f->vptr
}

So basically, the dynamic type of the object being constructed is changed based on the dynamic type of the arguments to its constructor. (Note, you can't do this with templates because you can only create a C<Foo> if you know the type of f at compile-time). From now on, any call to FooA::bark() through C::quack() only involves one virtual call: either the call to C::quack() is statically bound to the non-specialized version which dynamically calls FooA::bark(), or the call to C::quack() is dynamically forwarded to C_FooA::quack() which statically calls FooA::bark(). Furthermore, dynamic dispatch might be eliminated completely in some cases if the flow analyzer has enough information to make a static call to C_FooA::quack(), which could be very useful in a tight loop if it allows inlining. (Although technically at that point you'd probably be OK even without this optimization...)

(Note that this transformation is safe, although less useful, even if _f is non-const and protected instead of private and C is inherited from a different translation unit...the translation unit creating the vtable for the inherited class won't know anything at all about the specializations and the constructor of the inherited class will just set the this->vptr to its own vtable, which will not reference any specialized functions because it won't know anything about them.)

This might seem like a lot of effort to eliminate one level of indirection, but the point is that you can do it to any arbitrary nesting level (any depth of virtual calls following this pattern could be reduced to one) based only on local information within a translation unit, and do it in a way that's resilient even if new types are defined in other translation units that you don't know about...you just might add a lot of code bloat that you wouldn't have otherwise if you did it naively.

Anyway, independent of whether this kind of optimization would really have enough bang-for-the-buck be worth the effort of implementation and also worth the space overhead in the resulting executable, my question is, is there anything in Standard C++ which would prevent a compiler from performing such a transformation?

My feeling is no, since the standard doesn't specify at all how virtual dispatch is done or how pointers-to-member-functions are represented. I'm pretty sure there's nothing about the RTTI mechanism preventing C and C_FooA from masquerading as the same type for all purposes, even if they have different virtual tables. The only other thing I could think of that could possibly matter is some close reading of the ODR, but probably not.

Am I overlooking something? Barring ABI/linking issues, would transformations like this be possible without breaking conforming C++ programs? (Furthermore, if yes, could this be done currently with the Itanium and/or MSVC ABIs? I'm fairly sure the answer there is yes, as well, but hopefully someone can confirm.)

EDIT: Does anyone know if anything like this is implemented in any mainstream compiler/JIT for C++, Java, or C#? (See discussion and linked chat in the comments below...) I'm aware JITs do speculative static-binding/inlining of virtuals directly at call sites, but I don't know if they do anything like this (with entirely new vtables being generated and chosen based on a single type check done at the constructor, rather than at each call site).

Countable answered 1/3, 2013 at 0:38 Comment(15)

Isn't this pretty much the same thing as making C a template <typename Foo>? – Geulincx 1/3, 2013 at 0:49

Yes, but you can't choose a specialization at runtime unless you do it manually...you can only choose a specific template at compile-time, meaning you can't wrap compile-time polymorphism with run-time polymorphism without doing the type checks yourself manually, which is error prone and fragile. This method is basically having the compiler do it for you, based on runtime information. – Countable 1/3, 2013 at 0:49

Well the right foo has to be chosen manually anyway, so I don't really see the difference. Well, one can now put multiple Cs with different foos in one container - but this will probably never happen since C has virtual functions itself and one probably wants container<unique_ptr<SomeInterface>>. – Geulincx 1/3, 2013 at 0:52

What do you mean "foo has to be chosen manually"? It's done at runtime, and you don't need an exhaustive type tree to do it...you just specialize for the types you know. – Countable 1/3, 2013 at 0:54

Yes, exactly, you call through SomeInterface, and you dynamically get the right C_FooX, but you then statically call the right FooX. Or you statically call C and it dynamically gets you the right FooX. Two virtual calls instead of one, and you can do this to however arbitrary nesting level as long as the type information is in the current TU, and it's resilient against additional types being added in other TUs (so you don't need whole program analysis). – Countable 1/3, 2013 at 0:56

(The big problem is that, unlike templates, you don't know which of these specializations you actually need until runtime, so if you do it blindly you'll duplicate huge amounts of code...so you need heuristics or programmer hints) – Countable 1/3, 2013 at 1:12

This ideone.com/71VWTG should produce perfect instructions. There's only one virtual call if we know the type. I don't really see the problem. :) – Geulincx 1/3, 2013 at 1:31

but you instantiated a template over foo! that's the whole point...what if foo was passed to you from elsewhere? Also, the call foo_->foo(); will not be reliably statically dispatched...it might here, with flow analysis, since everything is in one TU, but you can create a unique_ptr<Foo> that holds a subtype of Foo. Just because it's a unique_ptr<Foo> doesn't provide any guarantee that it's non-polymorphic because there's no syntax in C++ for specifying a non-polymorphic pointer to a polymorphic type. – Countable 1/3, 2013 at 1:35

It is not only statically dispatched, the function disappears completely (gets inlined), because we already know the exact type. If we do not know the type I just pass an unique_ptr<foo_base>, as demonstrated. Edit: Note that the type of the template parameter does not have to be polymorphic! It can be anything you want! – Geulincx 1/3, 2013 at 1:37

That relies on flow analysis that might not be reliable. What if you were passed a unique_ptr<Foo> from another TU? There's no way to know it actually holds a Foo rather than a subtype of Foo. And the point is, in the unique_ptr<foo_base> case there will be two virtual calls instead one one. – Countable 1/3, 2013 at 1:39

Look, I program with templates all the time :) I'm not saying they don't work when you have compile time type information. In fact, they work so well, I think the compiler should generate pseudo-templates them automatically so they can be used when you don't have compile-time type information. That's the whole point. – Countable 1/3, 2013 at 1:41

Yes, of course, because the unique_ptr<foo_base> class demonstrates the unknown case. And it is definitely reliable that the other call gets inlined because it is not virtual. That's the whole point. – Geulincx 1/3, 2013 at 1:42

let us continue this discussion in chat – Countable 1/3, 2013 at 1:42

Sorry, my mistake on the flow analysis part, didn't look at the implementation carefully enough (but the rest of my comments about getting an unknown foo are valid) – Countable 1/3, 2013 at 1:55

(Nice discussion in the chat, if anyone wants to jump in) – Countable 1/3, 2013 at 2:7

Is there anything in Standard C++ which would prevent a compiler from performing such a transformation?

Not if you're sure the observable behavior is unchanged - that's the "as-if rule" which is Standard section 1.9.

But this might make proving that your transformation is correct pretty difficult: 12.7/4:

When a virtual function is called directly or indirectly from a constructor (including the mem-initializer or brace-or-equal-initializer for a non-static data member) or from a destructor, and the object to which the call applies is the object under construction or destruction, the function called is the one defined in the constructor or destructor's own class or in one of its bases, but not a function overriding it in a class derived from the constructor or destructor's own class, or overriding it in one of the other base classes of the most derived object.

So if the destructor Foo::~Foo() happens to directly or indirectly call C::quack() on an object c, where c._f points at the object being destroyed, you need to call Foo::bark(), even if _f was a FooA when you constructed object c.

Merlinmerlina answered 1/3, 2013 at 0:38 Comment(6)

Yes, it seems you need the source of all non-pure virtual destructors of all base classes of FooA available in the TU to do this, then, and only do it if you can prove that there are no calls from there to any virtual function of C. Inconvenient, but possible to work with. Anyway, thanks, that was helpful...any other holes you can think of? – Countable 1/3, 2013 at 17:59

Darn, you also need to protect against the case that C's constructor is being called during the constructor of a subclass of FooA...I think you can work around this but it requires that you control FooA and provide some mechanism that allows C to verify that FooA is fully constructed. – Countable 1/3, 2013 at 18:22

"non-pure" is irrelevant here. A pure virtual destructor has a definition and is called just like any other, and can cause exactly the same scenario. – Merlinmerlina 1/3, 2013 at 18:23

Right, not sure what I was thinking with that. – Countable 1/3, 2013 at 18:28

Actually, the second scenario is fine too, you just need to be able to statically analyze FooA's constructor--what the derived types do is irrelevant if there's no way FooA's constructor calls C's constructor...I think I worked this out a long time ago and just forgot the details. – Countable 1/3, 2013 at 19:11

I'm going to accept this just on the basis that you've helped me uncover and plug a hole in this method, unless someone else has a more authoritative answer. – Countable 3/3, 2013 at 3:54

On first reading, this sounds like a c++-focused variation of polymorphic inline caching. I think that V8 and Oracle's JVM both use it, and I know that .NET does.

To answer your original question: I don't think there's anything in the standard that forbids these kinds of implementations. C++ takes the "as-is" rule quite seriously; so long as you faithfully implement the right semantics, you can do the implementation in any crazy way you like. c++ virtual calls aren't very complicated, so I doubt you'd trip over any edge cases there either (unlike if, say, you were trying to do something clever with static binding).

Cohe answered 1/3, 2013 at 0:38 Comment(4)

AFAIK polymorphic inline caching only is done at the call site directly, though, right, so guard checks still need to be done per-call? (This technique only does a single guard check at the constructor, if the class fulfills are certain pattern, and doesn't incur any overhead after that). – Countable 1/3, 2013 at 16:41

Yes, but in practice if you have a PIC you also have a JIT, so if you can statically prove that the target has a particular type, then you can compile out the branch too. But I'm not sure you'd bother; branches are cheap so long as they're easy to predict, so as long as the type is always same, you win. – Cohe 1/3, 2013 at 18:53

It's not always the same type though, that's the problem. It's the same type for a single given object, but the same code might be called alternatively between many different objects yielding different targets, trashing your branch prediction. This specializes the entire class for each type, so each class gets to predict the branch independently. – Countable 1/3, 2013 at 19:10

(It basically allows near-zero-overhead Pimpl, within restrictions.) – Countable 1/3, 2013 at 19:21

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags