Polymorphism and Dynamic Dispatch (hyper-abridged version)
Note: Multiple inheritance with virtual bases is not covered here; nothing about it is simple, and the details would clutter the exposition (further). This answer demonstrates the mechanisms used to implement dynamic dispatch assuming only single inheritance.
Interpreting abstract types and their behaviors visible across module boundaries requires a common Application Binary Interface (ABI). The C++ standard, of course, does not require the implementation of any particular ABI.
An ABI would describe:
- The layout of virtual method dispatch tables (vtables)
- The metadata required for runtime type checks and cast operations
- Name decoration (a.k.a. mangling), calling conventions, and many other things.
Both modules in the following example, external.so and main.o, are assumed to have been linked to the same runtime. Static and dynamic binding give preference to symbols located within the calling module.
An external library
external.h (distributed to users):
class Base
{
__vfptr_t __vfptr; // For exposition
public:
__attribute__((dllimport)) virtual int Helpful();
__attribute__((dllimport)) virtual ~Base();
};
class Derived : public Base
{
public:
__attribute__((dllimport)) virtual int Helpful() override;
~Derived()
{
// Visible destructor logic here.
// Note: This is in the header!
// __vft@Base gets treated like any other imported symbol:
// The address is resolved at load time.
//
this->__vfptr = &__vft@Base;
static_cast<Base *>(this)->~Base();
}
};
__attribute__((dllimport)) Derived *ReticulateSplines();
external.cpp:
#include "external.h" // the version in which the attributes are dllexport
__attribute__((dllexport)) int Base::Helpful()
{
return 47;
}
__attribute__((dllexport)) Base::~Base()
{
}
__attribute__((dllexport)) int Derived::Helpful()
{
return 4449;
}
__attribute__((dllexport)) Derived *ReticulateSplines()
{
return new Derived(); // __vfptr = &__vft@Derived in external.so
}
external.so (not a real binary layout):
__vft@Base:
[offset to __type_info@Base] <-- in external.so
[offset to Base::~Base] <------- in external.so
[offset to Base::Helpful] <----- in external.so
__vft@Derived:
[offset to __type_info@Derived] <-- in external.so
[offset to Derived::~Derived] <---- in external.so
[offset to Derived::Helpful] <----- in external.so
Etc...
__type_info@Base:
[null base offset field]
[offset to mangled name]
__type_info@Derived:
[offset to __type_info@Base]
[offset to mangled name]
Etc...
An application using the external library
special.hpp:
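#include <iostream>
#include <stdexcept> // std::runtime_error
#include "external.h"
class Special : public Base
{
public:
int Helpful() override
{
return 55;
}
virtual void NotHelpful()
{
// std::exception has no standard constructor taking a message.
throw std::runtime_error{"derp"};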
}
};
class MoreDerived : public Derived
{
public:
int Helpful() override
{
return 21;
}
~MoreDerived()
{
// Visible destructor logic here
this->__vfptr = &__vft@Derived; // <- the version in main.o
static_cast<Derived *>(this)->~Derived();
}
};
class Related : public Base
{
public:
virtual void AlsoHelpful() = 0;
};
class RelatedImpl : public Related
{
public:
void AlsoHelpful() override
{
using namespace std;
cout << "The time for action... Is now!" << endl;
}
};
main.cpp:
#include "special.hpp"
int main(int argc, char **argv)
{
Base *ptr = new Base(); // ptr->__vfptr = &__vft@Base (in external.so)
auto r = ptr->Helpful(); // calls "Base::Helpful" in external.so
// r = 47
delete ptr; // calls "Base::~Base" in external.so
ptr = new Derived(); // ptr->__vfptr = &__vft@Derived (in main.o)
r = ptr->Helpful(); // calls "Derived::Helpful" in external.so
// r = 4449
delete ptr; // calls "Derived::~Derived" in main.o
ptr = ReticulateSplines(); // ptr->__vfptr = &__vft@Derived (in external.so)
r = ptr->Helpful(); // calls "Derived::Helpful" in external.so
// r = 4449
delete ptr; // calls "Derived::~Derived" in external.so
ptr = new Special(); // ptr->__vfptr = &__vft@Special (in main.o)
r = ptr->Helpful(); // calls "Special::Helpful" in main.o
// r = 55
delete ptr; // calls "Base::~Base" in external.so
ptr = new MoreDerived(); // ptr->__vfptr = &__vft@MoreDerived (in main.o)
r = ptr->Helpful(); // calls "MoreDerived::Helpful" in main.o
// r = 21
delete ptr; // calls "MoreDerived::~MoreDerived" in main.o
return 0;
}
main.o:
__vft@Derived:
[offset to __type_info@Derived] <-- in main.o
[offset to Derived::~Derived] <--- in main.o
[offset to Derived::Helpful] <---- stub that jumps to import table
__vft@Special:
[offset to __type_info@Special] <-- in main.o
[offset to Base::~Base] <---------- stub that jumps to import table
[offset to Special::Helpful] <----- in main.o
[offset to Special::NotHelpful] <-- in main.o
__vft@MoreDerived:
[offset to __type_info@MoreDerived] <---- in main.o
[offset to MoreDerived::~MoreDerived] <-- in main.o
[offset to MoreDerived::Helpful] <------- in main.o
__vft@Related:
[offset to __type_info@Related] <------ in main.o
[offset to Base::~Base] <-------------- stub that jumps to import table
[offset to Base::Helpful] <------------ stub that jumps to import table
[offset to Related::AlsoHelpful] <----- stub that throws PV exception
__vft@RelatedImpl:
[offset to __type_info@RelatedImpl] <--- in main.o
[offset to Base::~Base] <--------------- stub that jumps to import table
[offset to Base::Helpful] <------------- stub that jumps to import table
[offset to RelatedImpl::AlsoHelpful] <-- in main.o
Etc...
__type_info@Base:
[null base offset field]
[offset to mangled name]
__type_info@Derived:
[offset to __type_info@Base]
[offset to mangled name]
__type_info@Special:
[offset to __type_info@Base]
[offset to mangled name]
__type_info@MoreDerived:
[offset to __type_info@Derived]
[offset to mangled name]
__type_info@Related:
[offset to __type_info@Base]
[offset to mangled name]
__type_info@RelatedImpl:
[offset to __type_info@Related]
[offset to mangled name]
Etc...
Invocation is (or might not be) Magic!
Depending on the method and what can be proven at the binding site, a virtual method call may be bound statically or dynamically.
A dynamic virtual method call will read the target function's address from the vtable pointed to by a __vfptr member.
The ABI describes how functions are ordered in vtables. For example: They might be ordered by class, then lexicographically by mangled name (which includes information about const-ness, parameters, etc...). For single inheritance, this approach guarantees that a function's virtual dispatch index will always be the same, regardless of how many distinct implementations there are.
In the examples given here, destructors are placed at the beginning of each vtable, if applicable. If the destructor is trivial and non-virtual (not defined or does nothing), the compiler may elide it entirely, and not allocate a vtable entry for it.
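The fixed-index idea can be sketched with a hand-rolled table of function pointers. This is only an illustration under stated assumptions: Object, Slot, kHelpful, the two tables and CallHelpful are hypothetical names invented here, and a real compiler's vtable carries more than this (type info, destructors, and so on).

```cpp
#include <cassert>

// Hypothetical hand-rolled stand-ins for compiler-generated structures.
struct Object;                     // stands in for any polymorphic object
using Slot = int (*)(Object *);    // every slot has the same shape here

struct Object {
    const Slot *vfptr;             // plays the role of __vfptr
};

// Slot indices are assigned once, for the whole hierarchy.
enum : int { kHelpful = 0, kSlotCount };

int BaseHelpful(Object *)    { return 47; }
int SpecialHelpful(Object *) { return 55; }

// One table per type. Overriding replaces the pointer but keeps the index.
const Slot kBaseTable[kSlotCount]    = { &BaseHelpful };
const Slot kSpecialTable[kSlotCount] = { &SpecialHelpful };

// The call site knows only the index, never the concrete type.
int CallHelpful(Object *obj) {
    assert(obj != nullptr && obj->vfptr != nullptr);
    return obj->vfptr[kHelpful](obj);
}
```

Constructing an Object with kBaseTable or kSpecialTable and passing it to CallHelpful selects BaseHelpful or SpecialHelpful through the same slot.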
Base *ptr = new Special{};
MoreDerived *md_ptr = new MoreDerived{};
// The cast below is checked statically, which would
// be a problem if "ptr" weren't pointing to a Special.
//
Special *sptr = static_cast<Special *>(ptr);
// In this case, it is possible to
// prove that "ptr" could point only to
// a Special, binding statically.
//
ptr->Helpful();
// Due to the cast above, a compiler might not
// care to prove that the pointed-to type
// cannot be anything but a Special.
//
// The call below might proceed as follows:
//
// reg = sptr->__vfptr[__index_of@Base::Helpful] = &Special::Helpful in main.o
//
// push sptr
// call reg
// pop
//
// This will indirectly call Special::Helpful.
//
sptr->Helpful();
// No cast required: LSP is satisfied.
ptr = md_ptr;
// Once again:
//
// reg = ptr->__vfptr[__index_of@Base::Helpful] = &MoreDerived::Helpful in main.o
//
// push ptr
// call reg
// pop
//
// This will indirectly call MoreDerived::Helpful
//
ptr->Helpful();
The logic above is the same for any invocation site that requires dynamic binding. In the example above, it doesn't matter exactly what type ptr or sptr points to; the code will just load a pointer at a known offset, then blindly call it.
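The difference between the two kinds of binding can also be seen at the language level. The sketch below uses reduced stand-ins for Base and Special (not the full classes from external.h), and CallDynamic and CallStatic are hypothetical helper names. A qualified call is one case where static binding is required by the language rather than left to the optimizer.

```cpp
#include <cassert>

// Reduced stand-ins for the classes above.
struct Base {
    virtual ~Base() = default;
    virtual int Helpful() { return 47; }
};

struct Special : Base {
    int Helpful() override { return 55; }
};

// Dynamic binding: the callee is selected through the object's vtable.
int CallDynamic(Base *ptr) {
    assert(ptr != nullptr);
    return ptr->Helpful();
}

// Static binding: a qualified call names Base::Helpful directly,
// so the virtual call mechanism is suppressed.
int CallStatic(Base *ptr) {
    assert(ptr != nullptr);
    return ptr->Base::Helpful();
}
```

Passing a Special to both functions gives different results: CallDynamic reaches Special::Helpful, while CallStatic always runs Base::Helpful.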
Type casting: Ups and Downs
All information about a type hierarchy must be available to the compiler when translating a cast or function call expression. Symbolically, casting is just a matter of traversing a directed graph.
Up-casting in this simple ABI can be performed entirely at compile time. The compiler needs only to examine the type hierarchy to determine if the source and target types are related (there is a path from the source to the target in the type graph). By the substitution principle, a pointer to a MoreDerived also points to a Base and can be interpreted as such. The __vfptr member is at the same offset for all types in this hierarchy, so RTTI logic doesn't need to handle any special cases (in certain implementations of VMI, it would need to grab another offset from a type thunk to grab another vptr and so on...).
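A small check of the compile-time nature of up-casts, using reduced stand-in classes. Unrelated and UpcastKeepsAddress are hypothetical names added for illustration. The address comparison reflects how common ABIs lay out single inheritance; it is not something the language guarantees.

```cpp
#include <cassert>
#include <type_traits>

struct Base        { virtual ~Base() = default; int base_data = 0; };
struct Derived     : Base    { int derived_data = 0; };
struct MoreDerived : Derived { int more_data = 0; };
struct Unrelated   { virtual ~Unrelated() = default; };

// Relatedness is decided from the type graph alone, at compile time.
static_assert(std::is_convertible<MoreDerived *, Base *>::value,
              "up-cast is implicit");
static_assert(!std::is_convertible<Unrelated *, Base *>::value,
              "no path in the type graph");

// Returns true when the up-cast leaves the address unchanged.
// Typical for single inheritance, but an ABI property only.
bool UpcastKeepsAddress(MoreDerived *md) {
    assert(md != nullptr);
    Base *as_base = md;  // implicit conversion, no runtime check
    return static_cast<void *>(as_base) == static_cast<void *>(md);
}
```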
Down-casting, however, is different. Since casting from a base type to a derived type involves determining if the pointed-to object has a compatible binary layout, it is necessary to perform an explicit type check (conceptually, this is "proving" that the extra information exists beyond the end of the structure assumed at compile time).
Note that there are multiple vtable instances for the Derived type: One in external.so and one in main.o. This is because a virtual method defined for Derived (its destructor) appears in every translation unit that includes external.h.
Even though the logic is identical in both cases, both images in this example need to have their own copy. This is why type checking cannot be performed using addresses alone.
A down-cast is then performed by walking a type graph (copied in both images) starting from the source type decoded at runtime, comparing mangled names until the compile-time target is matched.
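The walk can be sketched with a simplified, hypothetical type record. TypeNode, IsA and the two namespaces standing in for the two images are all invented for this illustration, and plain class names stand in for mangled names. The point is that each image has its own copy of a record, so the comparison has to go by name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified type record: one base link and a name.
struct TypeNode {
    const TypeNode *base;  // nullptr for a root type
    const char *name;      // stands in for the mangled name
};

// Each "image" carries its own copy of the records it needs.
namespace external_so {
const TypeNode base{nullptr, "Base"};
const TypeNode derived{&base, "Derived"};
}
namespace main_o {
const TypeNode base{nullptr, "Base"};
const TypeNode derived{&base, "Derived"};
const TypeNode more_derived{&derived, "MoreDerived"};
const TypeNode related{&base, "Related"};
}

// Walks from the dynamic type toward the root, comparing names rather
// than addresses so that a duplicate record in another image still matches.
bool IsA(const TypeNode *dynamic_type, const TypeNode &target) {
    assert(target.name != nullptr);
    for (const TypeNode *t = dynamic_type; t != nullptr; t = t->base) {
        if (std::strcmp(t->name, target.name) == 0) return true;
    }
    return false;
}
```

Here external_so::derived and main_o::derived are different objects, yet a MoreDerived record from main_o still matches either of them by name.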
For example:
Base *ptr = new MoreDerived();
// ptr->__vfptr = &__vft@MoreDerived in main.o
//
// This provides the code below with a starting point
// for dynamic cast graph traversals.
// All searches start with the type graph in the current image,
// then all other linked images, and so on...
// This example is not exhaustive!
// Starts by grabbing &__type_info@MoreDerived
// using the offset within __vft@MoreDerived resolved
// at load time.
//
// This is similar to a virtual method call: Just grab
// a pointer from a known offset within the table.
//
// Search path:
// __type_info@MoreDerived (match!)
//
auto *md_ptr = dynamic_cast<MoreDerived *>(ptr);
// Search path:
// __type_info@MoreDerived ->
// __type_info@Derived (match!)
//
auto *d_ptr = dynamic_cast<Derived *>(ptr);
// Search path:
// __type_info@MoreDerived ->
// __type_info@Derived ->
// __type_info@Base (no match)
//
// Did not find a path connecting RelatedImpl to MoreDerived.
//
// rptr will be nullptr
//
auto *rptr = dynamic_cast<RelatedImpl *>(ptr);
At no point in the code above did ptr->__vfptr need to change. The static nature of type deduction in C++ requires the implementation to satisfy the substitution principle at compile time, meaning that the actual type of an object cannot change at runtime.
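A compilable version of the same three casts, using empty stand-in classes in a single translation unit. CastResults and TryCasts are hypothetical helpers added here. The typeid comparison is an observable proxy for "the dynamic type did not change", since __vfptr itself is not accessible from portable code.

```cpp
#include <cassert>
#include <typeinfo>

struct Base        { virtual ~Base() = default; };
struct Derived     : Base {};
struct MoreDerived : Derived {};
struct Related     : Base {};

// Bundles the results of the three casts so they can be inspected together.
struct CastResults {
    MoreDerived *as_more_derived;
    Derived *as_derived;
    Related *as_related;
    bool dynamic_type_unchanged;
};

CastResults TryCasts(Base *ptr) {
    assert(ptr != nullptr);
    const std::type_info &before = typeid(*ptr);
    CastResults r{};
    r.as_more_derived = dynamic_cast<MoreDerived *>(ptr);
    r.as_derived = dynamic_cast<Derived *>(ptr);
    r.as_related = dynamic_cast<Related *>(ptr);  // sibling branch
    r.dynamic_type_unchanged = (before == typeid(*ptr));
    return r;
}
```

Called with a pointer to a MoreDerived, the first two casts succeed, the cast to Related yields nullptr, and the dynamic type reported by typeid is the same before and after.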
Summary
I've understood this question as one about the mechanisms behind dynamic dispatch.
To me, "Which entry in vtable refers to the function of "particular" derived classes which is supposed to be called at run time?", is asking how a vtable works.
This answer is intended to demonstrate that type casting affects only the view of an object's data, and that the implementation of dynamic dispatch in these examples operates independently of it. However, type casting does affect dynamic dispatch in the case of multiple inheritance, where determining which vtable to use may require multiple steps (an instance of a type with multiple bases may have multiple vptrs).
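The multiple-inheritance caveat can be observed without looking at vtables directly. The sketch below uses hypothetical types Left, Right and Both, with non-virtual bases only. It checks that the two base subobjects sit at different addresses, which is why a cast may need to adjust the pointer before a vtable is consulted.

```cpp
#include <cassert>

struct Left {
    virtual ~Left() = default;
    virtual int Id() { return 1; }
    int left_data = 0;
};

struct Right {
    virtual ~Right() = default;
    virtual int Id() { return 2; }
    int right_data = 0;
};

struct Both : Left, Right {
    int Id() override { return 3; }  // final overrider for both bases
};

// Distinct non-empty base subobjects cannot share an address,
// so at least one of the two up-casts adjusts the pointer.
bool BasesAreDistinct(Both *obj) {
    assert(obj != nullptr);
    Left *l = obj;
    Right *r = obj;
    return static_cast<void *>(l) != static_cast<void *>(r);
}

// dynamic_cast<void *> recovers the address of the most derived object.
void *MostDerived(Right *r) {
    assert(r != nullptr);
    return dynamic_cast<void *>(r);
}
```

A call to Id() through either base pointer still reaches Both::Id, even though the two pointers hold different addresses.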