First, a polymorphic class has at least one virtual function, so it has a vptr:
struct A {
virtual void foo();
};
is compiled to:
struct A__vtable { // vtable for objects of declared type A
void (*foo__ptr) (A *__this); // pointer to foo() virtual function
};
void A__foo (A *__this); // A::foo ()
// vtable for objects of real (dynamic) type A
const A__vtable A__real = { // vtable is never modified
/*foo__ptr =*/ A__foo
};
struct A {
A__vtable const *__vptr; // ptr to const not const ptr
// vptr is modified at runtime
};
// default constructor for class A (implicitly declared)
void A__ctor (A *__that) {
__that->__vptr = &A__real;
}
Remark: C++ can be compiled to another high level language like C (as cfront did) or even to a C++ subset (here C++ without virtual
). I put __
in compiler generated names.
Note that this is a simplistic model where RTTI is not supported; real compilers will add data in the vtable to support typeid
.
Now, a simple derived class:
struct Der : A {
override void foo();
virtual void bar();
};
Non-virtual (*) base class subobjects are subobjects like member subobjects, but while member subobjects are complete objects, ie. their real (dynamic) type is their declared type, base class subobjects are not complete, and their real type change during construction.
(*) virtual bases are very different, like virtual member functions are different from non virtual members
struct Der__vtable { // vtable for objects of declared type Der
A__vtable __primary_base; // first position
void (*bar__ptr) (Der *__this);
};
// overriding of a virtual function in A:
void Der__foo (A *__this); // Der::foo ()
// new virtual function in Der:
void Der__bar (Der *__this); // Der::bar ()
// vtable for objects of real (dynamic) type Der
const Der__vtable Der__real = {
{ /*foo__ptr =*/ Der__foo },
/*foo__ptr =*/ Der__bar
};
struct Der { // no additional vptr
A __primary_base; // first position
};
Here "first position" means that the member must be first (other members could be re-ordered): they are located at offset zero so we can reinterpret_cast
pointers, the types are compatibles; at non-zero offset, we would have to do pointer adjustments with arithmetic on char*
.
The lack of adjustment may not seem like a big deal in term of generated code (just some add immediate asm instructions), but it means much more than that, it means that such pointers can be viewed as having different types: an object of type A__vtable*
can contain a pointer to Der__vtable
and be treated as either a Der__vtable*
or a A__vtable*
. The same pointer object serves as a pointer to a A__vtable
in functions dealing with objects of type A
and as a pointer to a Der__vtable
in functions dealing with objects of type Der
.
// default constructor for class Der (implicitly declared)
void Der__ctor (Der *__this) {
A__ctor (reinterpret_cast<A*> (__this));
__this->__vptr = reinterpret_cast<A__vtable const*> (&Der__real);
}
You see that the dynamic type, as defined by the vptr, changes during construction as we assign a new value to the vptr (in this particular case the call to the base class constructor does nothing useful and can be optimised away, but it isn't the case with non trivial constructors).
With multiple inheritance:
struct C : A, B {};
A C
instance will contain a A
and a B
, like that:
struct C {
A base__A; // primary base
B base__B;
};
Note that only one of these base class subobjects can have the privilege of sitting at offset zero; this is important in many ways:
conversion of pointers to other base classes (upcasts) will need an
adjustment; conversely, upcasts need the opposite adjustments;
this implies that when doing a virtual call with a base class
pointer, the this
has the correct value for entry in the derived
class overrider.
So the following code:
void B::printaddr() {
printf ("%p", this);
}
void C::printaddr () { // overrides B::printaddr()
printf ("%p", this);
}
can be compiled to
void B__printaddr (B *__this) {
printf ("%p", __this);
}
// proper C::printaddr taking a this of type C* (new vtable entry in C)
void C__printaddr (C *__this) {
printf ("%p", __this);
}
// C::printaddr overrider for B::printaddr
// needed for compatibility in vtable
void C__B__printaddr (B *__this) {
C__printaddr (reinterpret_cast<C*>(reinterpret_cast<char*> (__this) - offset__C__B));
}
We see the C__B__printaddr
declared type and semantics are compatibles with B__printaddr
, so we can use &C__B__printaddr
in the vtable of B
; C__printaddr
is not compatible but can be used for calls involving a C
objects, or classes derived from C
.
A non virtual member function is like a free function that has access to internal stuff. A virtual member function is "flexibility point" which can be customised by overriding. virtual member function declaration play a special role in the definition of a class: like other members they are part of the contract with the external world, but at the same time they are part of a contract with derived class.
A non virtual base class is like a member object where we can refine behaviour via overriding (also we can access protected members). For the external world, the inheritance for A
in Der
implies that implicit derived-to-base conversions will exist for pointers, that a A&
can be bound to a Der
lvalue, etc. For further derived classes (derived from Der
), it also means that virtual functions of A
are inherited in the Der
: virtual functions in A
can be overridden in further derived classes.
When a class is further derived, say Der2
is derived from Der
, implicit conversions a pointers of type Der2*
to A*
is semantically performed in step: first, a conversion to Der*
is validated (the access control to the inheritance relation of Der2
from Der
is checked with the usual public/protected/private/friend rules), then the access control of Der
to A
. A non virtual inheritance relation cannot be refined or overridden in derived classes.
Non virtual members functions can be directly called and virtual members must be called indirectly via the vtable (unless the real object type happens to be known by the compiler), so the virtual
keyword adds an indirection to members functions access. Just like for function members, the virtual
keyword adds an indirection to base object access; just like for functions, virtual base classes add a flexibility point in inheritance.
When doing non-virtual, repeated, multiple inheritance :
struct Top { int i; };
struct Left : Top { };
struct Right : Top { };
struct Bottom : Left, Right { };
There are just two Top::i
subobjects in Bottom
(Left::i
and Right::i
), as with members objects:
struct Top { int i; };
struct mLeft { Top t; };
struct mRight { mTop t; };
struct mBottom { mLeft l; mRight r; }
Nobody is surprised that there are two int
sub-members (l.t.i
and r.t.i
).
With virtual functions:
struct Top { virtual void foo(); };
struct Left : Top { }; // could override foo
struct Right : Top { }; // could override foo
struct Bottom : Left, Right { }; // could override foo (both)
it means there are two different (unrelated) virtual functions called foo
, with distinct vtable entries (both as they have the same signature, they can have a common overrider).
The semantic of non virtual base classes follows from the fact that basic, non virtual, inheritance is an exclusive relation: the inheritance relation established between Left and Top cannot be modified by a further derivation, so the fact that a similar relation exists between Right
and Top
cannot affect this relation. In particular, it means that Left::Top::foo()
can be overridden in Left
and in Bottom
, but Right
, which has no inheritance relation with Left::Top
, cannot set this customisation point.
Virtual base classes are different: a virtual inheritance is a shared relation that can be customised in derived classes:
struct Top { int i; virtual void foo(); };
struct vLeft : virtual Top { };
struct vRight : virtual Top { };
struct vBottom : vLeft, vRight { };
Here, this is only one base class subobject Top
, only one int
member.
Implementation:
Room for non virtual base classes are allocated based on a static layout with fixed offsets in the derived class. Note that the layout of a derived class is the included in the layout of more derived class, so the exact position of subobjects does not depend on the real (dynamic) type of object (just like the address of a non virtual function is a constant). OTOH, the position of subobjects in a class with virtual inheritance is determined by the dynamic type (just like the address of the implementation of a virtual function is known only when the dynamic type is known).
The location of subobject will be determined at runtime with the vptr and the vtable (reuse of the existing vptr implies less space overhead), or a direct internal pointer to the subobject (more overhead, less indirections needed).
Because the offset of a virtual base class is determined only for a complete object, and cannot be known for a given declared type, a virtual base cannot be allocated at offset zero and is never a primary base. A derived class will never reuse the vptr of a virtual base as its own vptr.
In term of possible translation:
struct vLeft__vtable {
int Top__offset; // relative vLeft-Top offset
void (*foo__ptr) (vLeft *__this);
// additional virtual member function go here
};
// this is what a subobject of type vLeft looks like
struct vLeft__subobject {
vLeft__vtable const *__vptr;
// data members go here
};
void vLeft__subobject__ctor (vLeft__subobject *__this) {
// initialise data members
}
// this is a complete object of type vLeft
struct vLeft__complete {
vLeft__subobject __sub;
Top Top__base;
};
// non virtual calls to vLeft::foo
void vLeft__real__foo (vLeft__complete *__this);
// virtual function implementation: call via base class
// layout is vLeft__complete
void Top__in__vLeft__foo (Top *__this) {
// inverse .Top__base member access
char *cp = reinterpret_cast<char*> (__this);
cp -= offsetof (vLeft__complete,Top__base);
vLeft__complete *__real = reinterpret_cast<vLeft__complete*> (cp);
vLeft__real__foo (__real);
}
void vLeft__foo (vLeft *__this) {
vLeft__real__foo (reinterpret_cast<vLeft__complete*> (__this));
}
// Top vtable for objects of real type vLeft
const Top__vtable Top__in__vLeft__real = {
/*foo__ptr =*/ Top__in__vLeft__foo
};
// vLeft vtable for objects of real type vLeft
const vLeft__vtable vLeft__real = {
/*Top__offset=*/ offsetof(vLeft__complete, Top__base),
/*foo__ptr =*/ vLeft__foo
};
void vLeft__complete__ctor (vLeft__complete *__this) {
// construct virtual bases first
Top__ctor (&__this->Top__base);
// construct non virtual bases:
// change dynamic type to vLeft
// adjust both virtual base class vptr and current vptr
__this->Top__base.__vptr = &Top__in__vLeft__real;
__this->__vptr = &vLeft__real;
vLeft__subobject__ctor (&__this->__sub);
}
For an object of known type, access to the base class is through vLeft__complete
:
struct a_vLeft {
vLeft m;
};
void f(a_vLeft &r) {
Top &t = r.m; // upcast
printf ("%p", &t);
}
is translated to:
struct a_vLeft {
vLeft__complete m;
};
void f(a_vLeft &r) {
Top &t = r.m.Top__base;
printf ("%p", &t);
}
Here the real (dynamic) type of r.m
is known and so is the relative position of the subobject is known at compile time. But here:
void f(vLeft &r) {
Top &t = r; // upcast
printf ("%p", &t);
}
the real (dynamic) type of r
is not known, so the access is through the vptr:
void f(vLeft &r) {
int off = r.__vptr->Top__offset;
char *p = reinterpret_cast<char*> (&r) + off;
printf ("%p", p);
}
This function can accept any derived class with a different layout:
// this is what a subobject of type vBottom looks like
struct vBottom__subobject {
vLeft__subobject vLeft__base; // primary base
vRight__subobject vRight__base;
// data members go here
};
// this is a complete object of type vBottom
struct vBottom__complete {
vBottom__subobject __sub;
// virtual base classes follow:
Top Top__base;
};
Note that the vLeft
base class is at a fixed location in a vBottom__subobject
, so vBottom__subobject.__ptr
is used as a vptr for the whole vBottom
.
Semantics:
The inheritance relation is shared by all derived classes; this means that the right to override is shared, so vRight
can override vLeft::foo
. This creates a sharing of responsibilities: vLeft
and vRight
must agree on how they customise Top
:
struct Top { virtual void foo(); };
struct vLeft : virtual Top {
override void foo(); // I want to customise Top
};
struct vRight : virtual Top {
override void foo(); // I want to customise Top
};
struct vBottom : vLeft, vRight { }; // error
Here we see a conflict: vLeft
and vRight
seek to define the behaviour of the only foo virtual function, and vBottom
definition is in error for lack of a common overrider.
struct vBottom : vLeft, vRight {
override void foo(); // reconcile vLeft and vRight
// with a common overrider
};
Implementation:
The construction of class with non virtual base classes with non virtual base classes involves calling base class constructors in the same order as done for member variables, changing the dynamic type each time we enter a ctor. During construction, the base class subobjects really act as if they were complete objects (this is even true with impossible complete abstract base class subobjects: they are objects with undefined (pure) virtual functions). Virtual functions and RTTI can be called during construction (except of course pure virtual functions).
The construction of a class with non virtual base classes with virtual bases is more complicated: during construction, the dynamic type is the base class type, but the layout of virtual base is still the layout of the most derived type that is not yet constructed, so we need more vtables to describe this state:
// vtable for construction of vLeft subobject of future type vBottom
const vLeft__vtable vLeft__ctor__vBottom = {
/*Top__offset=*/ offsetof(vBottom__complete, Top__base),
/*foo__ptr =*/ vLeft__foo
};
The virtual functions are those of vLeft
(during construction, the vBottom object lifetime has not begun), while the virtual base locations are those of a vBottom
(as defined in the vBottom__complete
translated objected).
Semantics:
During initialisation, it is obvious that we must be careful not to use an object before it is initialised. Because C++ gives us a name before an object is fully initialised, it is easy to do that:
int foo (int *p) { return *pi; }
int i = foo(&i);
or with the this pointer in the constructor:
struct silly {
int i;
std::string s;
static int foo (bad *p) {
p->s.empty(); // s is not even constructed!
return p->i; // i is not set!
}
silly () : i(foo(this)) { }
};
It is pretty obvious that any use of this
in the ctor-init-list must be carefully checked. After initialisation of all members, this
can be passed to other functions and registered in some set (until destruction begins).
What is less obvious is that when construction of a class involving shared virtual bases, subobjects stop being constructed: during the construction of a vBottom
:
first the virtual bases are constructed: when Top
is constructed, it is constructed like a normal subject (Top
doesn't even know it is virtual base)
then the base classes are constructed in left to right order: the vLeft
subobject is constructed and becomes functional as a normal vLeft
(but with a vBottom
layout), so the Top
base class subobject now has a vLeft
dynamic type;
the vRight
subobject construction begins, and the dynamic type of the base class changes to vRight; but vRight
isn't derived from vLeft
, doesn't know anything about vLeft
, so the vLeft
base is now broken;
when the body of the Bottom
constructor begins, the types of all subobjects have stabilised and vLeft
is functional again.