Virtual tables and memory layout in multiple virtual inheritance

Consider the following hierarchy:

#include <iostream>
using std::cout;

struct A {
   int a; 
   A() { f(0); }
   A(int i) { f(i); }
   virtual void f(int i) { cout << i; }
};
struct B1 : virtual A {
   int b1;
   B1(int i) : A(i) { f(i); }
   virtual void f(int i) { cout << i+10; }
};
struct B2 : virtual A {
   int b2;
   B2(int i) : A(i) { f(i); }
   virtual void f(int i) { cout << i+20; }
};
struct C : B1, virtual B2 {
   int c;
   C() : B1(6),B2(3),A(1){}
   virtual void f(int i) { cout << i+30; }
};
  1. What's the exact memory layout of a C instance? How many vptrs does it contain, and where exactly is each of them placed? Which of the virtual tables are shared with the virtual table of C? What exactly does each virtual table contain?

    Here is how I understand the layout:

    ----------------------------------------------------------------
    |vptr1 | AptrOfB1 | b1 | B2ptr | c | vptr2 | AptrOfB2 | b2 | a |
    ----------------------------------------------------------------
    

    where AptrOfBx is the pointer to the A instance that Bx contains (since the inheritance is virtual).
    Is that correct? Which functions does vptr1 point to? Which functions does vptr2 point to?

  2. Given the following code:

    C* c = new C();
    dynamic_cast<B1*>(c)->f(3);
    static_cast<B2*>(c)->f(3);
    reinterpret_cast<B2*>(c)->f(3);
    

    why do all the calls to f print 33?

Metallist asked 22/7, 2012 at 19:07. Comments (3):
Is this homework, or curiosity? (Hippomenes)
Actually, this is for an exam. But I'm sure that if I finally understand how things work in this example, I can understand anything related to multiple and virtual inheritance. (Metallist)
You can easily find the start of each parent subobject like this: C foo; intptr_t offsetB1 = (intptr_t)(B1*)&foo - (intptr_t)&foo; and the starts of the other bases can be found analogously. Also, computing sizeof for all the classes should give you another good clue. (Billibilliard)

Virtual bases are very different from ordinary bases. Remember that "virtual" means "determined at runtime": the location of the entire base subobject must therefore be determined at runtime.

Imagine that you are given a reference B & x and are tasked with finding the member A::a. If the inheritance were non-virtual, then B has a base class A, and the B object you are viewing through x has an A subobject at a fixed, compile-time offset in which you can locate A::a. If the most-derived object of x has multiple bases of type A, you can only see the particular copy that is the subobject of B.

But if the inheritance is virtual, none of this makes sense. We don't know which A subobject we need: this information simply doesn't exist at compile time. We could be dealing with an actual B object, as in B y; B & x = y;, or with a C object, as in C z; B & x = z;, or with something entirely different that derives virtually from A many more times. The only way to know is to find the actual base A at runtime.

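This can be implemented with one more level of runtime indirection. (This is entirely parallel to how virtual functions are implemented with one extra level of runtime indirection compared to non-virtual functions.) Instead of finding the base subobject at a fixed offset, one solution is to store a pointer to a table that in turn tells you where the actual base subobject is: in effect, a pointer to a pointer. This indirection is sometimes loosely called a "thunk" or "trampoline", although strictly speaking those terms refer to small pieces of code that adjust the this pointer before jumping to a virtual function; in the Itanium C++ ABI used by GCC and Clang, the location of each virtual base is stored as a "virtual base offset" in the vtable.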

So an actual object C z; may look as follows. The ordering in memory is up to the compiler and unimportant here, and I've omitted the vtables.

+-+------++-+------++-----++-----+
|T|  B1  ||T|  B2  ||  C  ||  A  |
+-+------++-+------++-----++-----+
 |         |                 |
 V         V                 ^
 |         |       +-Thunk-+ |
 +--->>----+-->>---|     ->>-+
                   +-------+

Thus, no matter whether you have a B1& or a B2&, you first look up the thunk, which in turn tells you where to find the actual base subobject. This also explains why you cannot perform a static cast from an A& to any of the derived types: this information simply doesn't exist at compile time.

For a more in-depth explanation, take a look at this fine article. (In that description, the thunk is part of the vtable of C, and virtual inheritance always necessitates the maintenance of vtables, even if there are no virtual functions anywhere.)

Synchronism answered 22/7, 2012 at 21:28. Comments (6):
Thank you for the great answer. As I understand it, a thunk is part of the virtual table: if you don't need an offset to get to the object the function operates on, you don't need a thunk. If you do need an offset, the corresponding vtable entry points to a thunk, which contains the offset and the pointer to the actual function. So I'd like to know what the vtables in my example look like, i.e. which functions they point to directly and which functions are reached through thunks. (Metallist)
I'm also very surprised that all the casts (static, dynamic, reinterpret) lead me to one specific function, C::f. It's very strange. Could you please explain how each of them works in this example? Also, I've read many articles about this, and the article you linked was one of the first ones I read. It still doesn't help me understand what happens here. (Metallist)
@user1544364 "all the casts lead me to one specific function" No. These casts return an object pointer, not a function. (Intussusception)
@user1544364 "a thunk, which contains the offset and the pointer to the actual function" No. A thunk does not contain data; a thunk consists of executable code. A thunk is just an optimised function. (Intussusception)
The link to "the fine article" is broken, but I found a backup: cs.nyu.edu/courses/fall16/CSCI-UA.0470-001/slides/… (Lowbrow)
@Lowbrow That link is password-protected... (Fibro)

I have extended your code a bit so that it prints its own layout:

#include <stdio.h>
#include <stdint.h>

struct A {
   int a; 
   A() : a(32) { f(0); }
   A(int i) : a(32) { f(i); }
   virtual void f(int i) { printf("%d\n", i); }
};

struct B1 : virtual A {
   int b1;
   B1(int i) : A(i), b1(33) { f(i); }
   virtual void f(int i) { printf("%d\n", i+10); }
};

struct B2 : virtual A {
   int b2;
   B2(int i) : A(i), b2(34) { f(i); }
   virtual void f(int i) { printf("%d\n", i+20); }
};

struct C : B1, virtual B2 {
   int c;
   C() : B1(6),B2(3),A(1), c(35) {}
   virtual void f(int i) { printf("%d\n", i+30); }
};

int main() {
    C foo;
    intptr_t address = (intptr_t)&foo;
    // %zu matches size_t; the pointer differences are cast to long to match %ld
    printf("offset A = %ld, sizeof A = %zu\n", (long)((intptr_t)(A*)&foo - address), sizeof(A));
    printf("offset B1 = %ld, sizeof B1 = %zu\n", (long)((intptr_t)(B1*)&foo - address), sizeof(B1));
    printf("offset B2 = %ld, sizeof B2 = %zu\n", (long)((intptr_t)(B2*)&foo - address), sizeof(B2));
    printf("offset C = %ld, sizeof C = %zu\n", (long)((intptr_t)(C*)&foo - address), sizeof(C));
    unsigned char* data = (unsigned char*)address;
    for(size_t offset = 0; offset < sizeof(C); offset++) {
        if(!(offset & 7)) printf("| ");
        printf("%02x ", (int)data[offset]);
    }
    printf("\n");
}

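As you can see, this prints quite a bit of additional information that allows us to deduce the memory layout. The output on my machine (64-bit Linux, little-endian byte order) is shown below; the first three lines come from the calls to f in the constructors of A, B2 and B1, which run in that order.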

1
23
16
offset A = 16, sizeof A = 16
offset B1 = 0, sizeof B1 = 32
offset B2 = 32, sizeof B2 = 32
offset C = 0, sizeof C = 48
| 00 0d 40 00 00 00 00 00 | 21 00 00 00 23 00 00 00 | 20 0d 40 00 00 00 00 00 | 20 00 00 00 00 00 00 00 | 48 0d 40 00 00 00 00 00 | 22 00 00 00 00 00 00 00 

So, we can describe the layout as follows:

+--------+----+----+--------+----+----+--------+----+----+
|  vptr  | b1 | c  |  vptr  | a  | xx |  vptr  | b2 | xx |
+--------+----+----+--------+----+----+--------+----+----+

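Here, xx denotes padding. Note how the compiler has placed the variable c into the tail padding of its non-virtual base B1. Note also that all three vptrs are different; this allows the program to deduce the correct positions of all the virtual bases. In particular, the object contains no separate "pointer to A" members like AptrOfB1 in the question's diagram: the information needed to find the virtual bases is reached through the vptrs.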

Billibilliard answered 17/7, 2014 at 20:47.
