Mismatch of 'this' address when base class is not polymorphic but derived is

There is this code:

#include <iostream>

class Base
{
public:
    Base() {
        std::cout << "Base: " << this << std::endl;
    }
    int x;
    int y;
    int z;
};

class Derived : Base
{
public:
    Derived() {
        std::cout << "Derived: " << this << std::endl;
    }

    void fun(){}
};

int main() {
   Derived d;
   return 0;
}

The output:

Base: 0xbfdb81d4
Derived: 0xbfdb81d4

However, when the function 'fun' is changed to virtual in the Derived class:

virtual void fun(){} // changed in Derived

then the address of 'this' is no longer the same in the two constructors:

Base: 0xbf93d6a4
Derived: 0xbf93d6a0

On the other hand, if class Base is also made polymorphic, for example by adding some other virtual function to it:

virtual void funOther(){} // added to Base

then the two 'this' addresses match again:

Base: 0xbfcceda0
Derived: 0xbfcceda0

The question is: why is the address of 'this' different in the Base and Derived constructors when the Base class is not polymorphic and the Derived class is?

Bore answered 21/7, 2012 at 16:12 Comment(4)
My guess is that the first one is a case of empty base class optimization and the second one is due to the vptr present in the derived class. - Transpadane
The difference is only 4 bytes, the size of a pointer maybe. Adding the virtual function fun forces Derived to hold a pointer to it (well, not exactly to it, but that doesn't matter) that isn't visible in Base, hence the difference. So it might be related to the presence of the vtable in Derived (en.wikipedia.org/wiki/Virtual_method_table). - Oiler
@Mr.Anubis: But where is the "empty base class" in this case? The only base class is Base and it is never empty. - Beer
@AndreyT I realized the same some time after posting the comment (I should have just called it an optimization, I guess). So idiotic of me :D - Transpadane

When you have a polymorphic single-inheritance hierarchy of classes, the typical convention followed by most (if not all) compilers is that each object in that hierarchy begins with a VMT pointer (a pointer to the Virtual Method Table). In that case the VMT pointer is introduced into the object's memory layout early, by the root class of the polymorphic hierarchy, while all classes below it simply inherit it and set it to point to their own VMT. As a result, all nested subobjects within any derived object have the same this value. That way, by reading the memory location at *this, the compiler has immediate access to the VMT pointer regardless of the actual subobject type. This is exactly what happens in your last experiment: when you make the root class polymorphic, all this values match.

However, when the base class in the hierarchy is not polymorphic, it does not introduce a VMT pointer. The VMT pointer is introduced by the first polymorphic class somewhere lower in the hierarchy. In that case a popular implementation approach is to insert the VMT pointer before the data introduced by the non-polymorphic (upper) part of the hierarchy. This is what you see in your second experiment. The memory layout for Derived looks as follows:

+------------------------------------+ <---- `this` value for `Derived` and below
| VMT pointer introduced by Derived  |
+------------------------------------+ <---- `this` value for `Base` and above
| Base data                          |
+------------------------------------+
| Derived data                       |
+------------------------------------+

Meanwhile, all classes in the non-polymorphic (upper) part of the hierarchy should know nothing about any VMT pointers: objects of type Base must begin with the data field Base::x. At the same time, all classes in the polymorphic (lower) part of the hierarchy must begin with the VMT pointer. To satisfy both requirements, the compiler has to adjust the object pointer value as it is converted up and down the hierarchy from one nested base subobject to another. This means that pointer conversion across the polymorphic/non-polymorphic boundary is no longer purely conceptual: the compiler has to add or subtract some offset.

The subobjects from the non-polymorphic part of the hierarchy will share one this value, while the subobjects from the polymorphic part of the hierarchy will share their own, different this value.
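
The layout described above can be probed directly. The following is a minimal sketch, not a definitive test: the type names (PlainBase, PolyDerived, and so on) and the helper base_offset are hypothetical, chosen to mirror the three experiments in the question. The measured offsets are implementation-specific; only the non-polymorphic case is guaranteed to be zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types mirroring the three experiments in the question.
struct PlainBase { int x, y, z; };                          // non-polymorphic
struct PlainDerived : PlainBase { void fun() {} };          // still non-polymorphic
struct PolyDerived : PlainBase { virtual void fun() {} };   // first polymorphic class

struct PolyBase { virtual void funOther() {} int x, y, z; }; // polymorphic root
struct PolyBothDerived : PolyBase { virtual void fun() {} };

// Byte distance from the start of a D object to its B base subobject.
template <class D, class B>
std::ptrdiff_t base_offset() {
    D d;
    B* b = &d;  // implicit derived-to-base conversion; may adjust the address
    // Converting back must always recover the original pointer.
    assert(static_cast<D*>(b) == &d);
    return reinterpret_cast<const char*>(b) - reinterpret_cast<const char*>(&d);
}
```

On implementations that place the VMT pointer first, base_offset<PolyDerived, PlainBase>() is expected to be nonzero (typically the size of a pointer), while the other two combinations report 0.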

Having to add or subtract some offset when converting pointer values along the hierarchy is not unusual: the compiler has to do it all the time when dealing with multiple-inheritance hierarchies. However, your example shows how it can happen in a single-inheritance hierarchy as well.

The addition/subtraction effect also shows up in a pointer conversion:

Derived *pd = new Derived;
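Base *pb = pd; // needs `class Derived : public Base`; with the question's private inheritance this does not compile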
// Numerical values of `pb` and `pd` are different if `Base` is non-polymorphic
// and `Derived` is polymorphic

Derived *pd2 = static_cast<Derived *>(pb);
// Numerical values of `pd` and `pd2` are the same
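
A self-contained variant of the snippet above, as a sketch under the assumption of public inheritance. The RoundTrip struct and the round_trip function are hypothetical helpers that capture the numeric pointer values so they can be printed or compared:

```cpp
#include <cassert>
#include <cstdint>

struct Base { int x, y, z; };                             // non-polymorphic
struct Derived : public Base { virtual void fun() {} };   // polymorphic

// Numeric pointer values observed during an upcast and the matching downcast.
struct RoundTrip {
    std::uintptr_t derived_addr;   // original Derived*
    std::uintptr_t base_addr;      // after implicit conversion to Base*
    std::uintptr_t restored_addr;  // after static_cast back to Derived*
};

RoundTrip round_trip() {
    Derived d;
    Derived* pd = &d;
    Base* pb = pd;                             // upcast: may add an offset
    Derived* pd2 = static_cast<Derived*>(pb);  // downcast: subtracts it again
    return { reinterpret_cast<std::uintptr_t>(pd),
             reinterpret_cast<std::uintptr_t>(pb),
             reinterpret_cast<std::uintptr_t>(pd2) };
}
```

The restored value always equals the original one. Whether base_addr differs from derived_addr depends on the implementation's layout; with the VMT pointer placed first, it does.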
Beer answered 21/7, 2012 at 16:25 Comment(5)
In the case of comparing pointers (to the same object) of a non-polymorphic base and a polymorphic derived class object, the pointers compare equal as expected. How does this work? Example: In the second case described above, I am comparing a Base* to a Derived*. - Maxilliped
@Agnel Kurian: I'm not sure I understand what you are talking about. In the OP's implementation the pointers in the Base *pb = pd; case should differ numerically if Base is not polymorphic and Derived is polymorphic. (One has to make sure the inheritance is public. Otherwise it simply won't compile.) - Beer
In the second case (non-virtual Base and virtual Derived classes), if I create an object d of Derived and store its address as Base *pb = &d; and Derived *pd = &d and then compare the pointers as (pb == pd), the comparison returns true. How does this work when the this pointers are different? - Maxilliped
@Agnel Kurian: When you do the pb == pd comparison, the compiler sees that the pointer types are different. The language says that Base * has to be used as the "common" type for the comparison, i.e. that pd in this case has to be converted to Base *. In other words, your pb == pd is interpreted as pb == (Base *) pd. The conversion is what adjusts the right-hand side pointer before the comparison. You are not really comparing the "numerical" values of the pointers with pb == pd. - Beer
@Agnel Kurian: Try this in your example: uintptr_t nb = (uintptr_t) pb, nd = (uintptr_t) pd; and then print the nb and nd values. You will see that they are different. Yet pb == pd will still return true. - Beer
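
The experiment suggested in the last comment can be written out as a small sketch. The Comparison struct and the compare_views function are hypothetical names, and public inheritance is assumed so that the conversions compile:

```cpp
#include <cassert>
#include <cstdint>

struct Base { int x, y, z; };                             // non-polymorphic
struct Derived : public Base { virtual void fun() {} };   // polymorphic

struct Comparison {
    bool pointers_equal;  // result of pb == pd (pd is converted to Base* first)
    bool numbers_equal;   // result of comparing the raw integer values
};

Comparison compare_views() {
    Derived d;
    Base* pb = &d;
    Derived* pd = &d;
    std::uintptr_t nb = reinterpret_cast<std::uintptr_t>(pb);
    std::uintptr_t nd = reinterpret_cast<std::uintptr_t>(pd);
    return { pb == pd, nb == nd };
}
```

pointers_equal is true on any conforming implementation, because the comparison happens after the conversion to Base*. numbers_equal is expected to be false on implementations that use the layout discussed in this answer.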

This looks like the behavior of a typical implementation of polymorphism with a v-table pointer in the object. The Base class doesn't require such a pointer since it doesn't have any virtual methods, which saves 4 bytes in the object size on a 32-bit machine. A typical layout is:

+------+------+------+
|   x  |   y  |   z  |
+------+------+------+

    ^
    | this

The Derived class, however, does require the v-table pointer, which is typically stored at offset 0 in the object layout:

+------+------+------+------+
| vptr |   x  |   y  |   z  |
+------+------+------+------+

    ^
    | this

So, to make the Base class methods see the layout they expect, the code generator adds 4 to the this pointer before calling a method of the Base class. The Base constructor sees:

+------+------+------+------+
| vptr |   x  |   y  |   z  |
+------+------+------+------+
           ^
           | this

This explains why you see 4 added to the this pointer value in the Base constructor.
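
The adjustment can also be observed without printing, by letting each constructor record its own this. This is only a sketch: the members base_this and derived_this and the function constructor_this_gap are hypothetical additions for illustration (they also change the object size compared with the question's classes), and the resulting gap is implementation-specific.

```cpp
#include <cassert>
#include <cstddef>

struct Base {
    Base() : base_this(this) {}
    const void* base_this;     // hypothetical member: `this` as seen by Base()
    int x, y, z;
};

struct Derived : Base {
    Derived() : derived_this(this) {}
    virtual void fun() {}
    const void* derived_this;  // hypothetical member: `this` as seen by Derived()
};

// Distance in bytes between the two `this` values recorded for one object.
std::ptrdiff_t constructor_this_gap() {
    Derived d;
    return static_cast<const char*>(d.base_this) -
           static_cast<const char*>(d.derived_this);
}
```

On implementations that store the v-table pointer at offset 0, the gap is typically the size of one pointer: 4 on the 32-bit machine from the question, 8 on common 64-bit targets.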

Ronrona answered 21/7, 2012 at 16:35 Comment(1)
This is very interesting. So let's say we are using placement new on a polymorphic hierarchy that also includes some non-polymorphic bases. Our calculation for the given address is just the maximum alignment requirement, but we don't bother to save the value returned by placement new. Can we safely reinterpret_cast our memory address to any T* of the parent hierarchy? - Immingle

Technically speaking, this is exactly what happens.

However, it must be noted that, according to the language specification, the implementation of polymorphism does not necessarily involve vtables: this is what the specification treats as an "implementation detail", which is outside its scope.

All we can say is that this has a type and points to what is accessible through that type. How the dereferencing into members happens is, again, an implementation detail.

The fact that a pointer to something, when converted into a pointer to something else by an implicit, static, or dynamic conversion, has to be changed to accommodate what is around it must be considered the rule, not the exception.

Given the way C++ is defined, the question is meaningless, as are the answers, since they implicitly assume that the implementation is based on the supposed layouts.

The fact that, under given circumstances, two subobjects share the same origin is just a (very common) particular case.

The exception is "reinterpreting": when you "blind" the type system and just say "look at this bunch of bytes as if they were an instance of this type". That is the only case where you should expect no address change (and no responsibility from the compiler for the meaningfulness of such a conversion).
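
The contrast between a type-aware conversion and "reinterpreting" can be sketched as follows. The CastResult struct and the compare_casts function are hypothetical names; the reinterpret_cast result is only compared numerically and never dereferenced, since using it as a Base would not be meaningful when the layouts differ.

```cpp
#include <cassert>

struct Base { int x, y, z; };                             // non-polymorphic
struct Derived : public Base { virtual void fun() {} };   // polymorphic

struct CastResult {
    bool static_cast_moved;       // did static_cast change the address?
    bool reinterpret_cast_moved;  // did reinterpret_cast change the address?
};

CastResult compare_casts() {
    Derived d;
    Derived* pd = &d;
    Base* via_static = static_cast<Base*>(pd);            // type-aware, may adjust
    Base* via_reinterpret = reinterpret_cast<Base*>(pd);  // same address, do not dereference
    const void* start = pd;
    return { static_cast<const void*>(via_static) != start,
             static_cast<const void*>(via_reinterpret) != start };
}
```

reinterpret_cast_moved is always false. static_cast_moved is true only on implementations where the Base subobject does not start at the beginning of Derived, as in the layouts assumed by the other answers.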

Hepatitis answered 21/7, 2012 at 17:24 Comment(0)
