How does the compiler know which entry in vtable corresponds to a virtual function?

Let's say we have more than one virtual function in the parent class and in the derived class. A vtable will be created for these virtual functions for both the parent and the derived class.

How will the compiler know which entry in the vtable corresponds to which virtual function?

Example:

class Animal{
public:
 void fakeMethod1(){}
 virtual void getWeight(){}
 void fakeMethod2(){}
 virtual void getHeight(){}
 virtual void getType(){}
};

class Tiger:public Animal{
public:
 void fakeMethod3(){}
 virtual void getWeight(){}
 void fakeMethod4(){}
 virtual void getHeight(){}
 virtual void getType(){}
};
int main(){
Animal* a = new Tiger();
a->getHeight(); // a's vptr will now point to the base address of Tiger's vtable
//How will the compiler know which entry in the vtable corresponds to the function getHeight()?
}

I have not found an exact explanation in my research -

https://mcmap.net/q/166290/-how-are-virtual-functions-and-vtable-implemented =

"This table is used to resolve the function call as it contains the addresses of all the virtual functions of that class."

How exactly is the table used to resolve the function call?

https://mcmap.net/q/957661/-how-does-the-c-compiler-know-which-implementation-of-a-virtual-function-to-call =

"So at runtime, the code just uses the object's vptr to locate the vtbl, and from there the address of the actual overridden function."

I am not able to understand this. Vtable holds the address of the virtual function not the address of actual overridden function.

Prosecutor answered 8/10, 2015 at 2:57 Comment(6)
There's nothing that prescribes the vtbl layout. But a natural way is for the compiler to number the virtual functions in the class in consecutive order. These numbers serve as indices into the vtbl, which is effectively an array of function pointers. - Knowling
The compiler knows what's in the vtable because it created the vtable. Unclear what you're really asking here. - Adenovirus
@Knowling annoyingly, MSVC also groups overloads together even if they aren't declared in that order. (Oh, and naturally things get strange with virtual inheritance etc) - Meris
@EJP I am not able to understand how the compiler maps the virtual functions in the class to the entries in vtable. Hope this makes my question more clear. - Prosecutor
But it doesn't matter, does it? It puts entries into the vtable, and it generates code to read them out again. What mapping it uses is irrelevant. Stroustrup used the order of declaration, but it could be anything as long as it was consistent. - Adenovirus
NB "VTable holds the address of the virtual function not the address of actual overridden function" isn't correct. It holds the address of the override for the current class. Otherwise it would be pointless. - Adenovirus

I'll modify your example a little so it shows more interesting aspects of object orientation.

Suppose we have the following:

#include <iostream>

struct Animal
{
  int age;
  Animal(int a) : age {a} {}
  virtual int setAge(int);
  virtual void sayHello() const;
};

int
Animal::setAge(int a)
{
  int prev = this->age;
  this->age = a;
  return prev;
}

void
Animal::sayHello() const
{
  std::cout << "Hello, I'm an " << this->age << " year old animal.\n";
}

struct Tiger : Animal
{
  int stripes;
  Tiger(int a, int s) : Animal {a}, stripes {s} {}
  virtual void sayHello() const override;
  virtual void doTigerishThing();
};

void
Tiger::sayHello() const
{
  std::cout << "Hello, I'm a " << this->age << " year old tiger with "
            << this->stripes << " stripes.\n";
}

void
Tiger::doTigerishThing()
{
  this->stripes += 1;
}


int
main()
{
  Tiger * tp = new Tiger {7, 42};
  Animal * ap = tp;
  tp->sayHello();         // call overridden function via derived pointer
  tp->doTigerishThing();  // call child function via derived pointer
  tp->setAge(8);          // call parent function via derived pointer
  ap->sayHello();         // call overridden function via base pointer
}

I'm ignoring the good advice that classes with virtual function members should have a virtual destructor for the purpose of this example. I'm going to leak the object anyway.

Let's see how we can translate this example into good old C where there are no member functions, let alone virtual ones. All of the following code is C, not C++.

The struct animal is simple:

struct animal
{
  const void * vptr;
  int age;
};

In addition to the age member, we have added a vptr that will be the pointer to the vtable. I'm using a void pointer for this because we'll have to do ugly casts anyway and using void * reduces the ugliness a little.

Next, we can implement the member functions.

static int
animal_set_age(void * p, int a)
{
  struct animal * this = (struct animal *) p;
  int prev = this->age;
  this->age = a;
  return prev;
}

Note the additional 0-th argument: the this pointer that is passed implicitly in C++. Again, I'm using a void * pointer as it will simplify things later on. Note that inside any member function, we always know the type of the this pointer statically so the cast is no problem. (And at the machine level, it doesn't do anything at all anyways.)

The sayHello member is defined likewise except that the this pointer is const qualified this time.

static void
animal_say_hello(const void * p)
{
  const struct animal * this = (const struct animal *) p;
  printf("Hello, I'm an %d year old animal.\n", this->age);
}

Time for the animal vtable. First we have to give it a type, which is straight-forward.

struct animal_vtable_type
{
  int (*setAge)(void *, int);
  void (*sayHello)(const void *);
};

Then we create a single instance of the vtable and set it up with the correct member functions. If Animal had a pure virtual member, the corresponding entry would have a NULL value and had better not be dereferenced.

static const struct animal_vtable_type animal_vtable = {
  .setAge = animal_set_age,
  .sayHello = animal_say_hello,
};

Note that animal_set_age and animal_say_hello were declared static. That's okay because they will never be referred to by name but only via the vtable (and the vtable only via the vptr so it can be static too).

We can now implement the constructor for Animal

void
animal_ctor(void * p, int age)
{
  struct animal * this = (struct animal *) p;
  this->vptr = &animal_vtable;
  this->age = age;
}

…and the corresponding operator new:

void *
animal_new(int age)
{
  void * p = malloc(sizeof(struct animal));
  if (p != NULL)
    animal_ctor(p, age);
  return p;
}

About the only thing interesting is the line where the vptr is set in the constructor.

Let's move on to tigers.

Tiger inherits from Animal so it gets a struct animal sub-object. I'm doing this by placing a struct animal as the first member. It is essential that this is the first member because it means that the first member of that object – the vptr – has the same address as our object. We'll need this later when we do some tricky casting.

struct tiger
{
  struct animal base;
  int stripes;
};

We could also have simply copied the members of struct animal lexically at the beginning of the definition of struct tiger but that might be harder to maintain. A compiler doesn't care about such stylistic issues.
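The first-member layout described above can be checked mechanically. The following is only a sketch, written as C++ rather than C so that static_assert is available; the helper names same_address and demo are made up for illustration and are not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>

struct animal {
  const void* vptr;  // must stay the first member
  int age;
};

struct tiger {
  animal base;  // must stay the first member
  int stripes;
};

// Both structs are standard-layout, so offsetof is well defined here.
static_assert(offsetof(animal, vptr) == 0, "vptr is the first member of animal");
static_assert(offsetof(tiger, base) == 0, "animal sub-object is the first member of tiger");

// Hypothetical helper: true when the tiger, its animal sub-object and the
// vptr inside that sub-object all start at the same address.
bool same_address(const tiger* t) {
  assert(t != nullptr);
  const void* as_tiger = t;
  const void* as_animal = &t->base;
  const void* as_vptr = &t->base.vptr;
  return as_tiger == as_animal && as_animal == as_vptr;
}

void demo() {
  tiger t{{nullptr, 7}, 42};
  assert(same_address(&t));
}
```

If either member were moved away from the front, the static_asserts would fail at compile time, which is exactly the situation in which the pointer casts used later would stop being valid.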

We already know how to implement the member functions for tigers.

void
tiger_say_hello(const void * p)
{
  const struct tiger * this = (const struct tiger *) p;
  printf("Hello, I'm a %d year old tiger with %d stripes.\n",
         this->base.age, this->stripes);
}

void
tiger_do_tigerish_thing(void * p)
{
  struct tiger * this = (struct tiger *) p;
  this->stripes += 1;
}

Note that we are casting the this pointer to struct tiger this time. If a tiger function is called, the this pointer had better point to a tiger, even if we are called through a base pointer.

Next to the vtable:

struct tiger_vtable_type
{
  int (*setAge)(void *, int);
  void (*sayHello)(const void *);
  void (*doTigerishThing)(void *);
};

Note that the first two members are exactly the same as for animal_vtable_type. This is essential and basically the direct answer to your question. It would have been more explicit, perhaps, if I had placed a struct animal_vtable_type as the first member. I want to emphasize that the object layout would have been exactly the same except that we couldn't play our nasty casting tricks in this case. Again, these are aspects of the C language, not present at machine level so a compiler is not bothered by this.
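Another way to picture the shared prefix is as an array of function pointers indexed by slot number, as suggested in the comments on the question. The sketch below is C++ and purely illustrative: the slot numbering, the single Object struct standing in for both classes, and all names in it are assumptions. Real compilers are free to lay their tables out differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical slot numbers, assigned in declaration order. Tiger keeps
// Animal's numbering and appends its own function after it.
enum Slot : int { kSetAge = 0, kSayHello = 1, kDoTigerishThing = 2 };

using GenericFn = void (*)();  // storage type only; cast back before calling

struct Object {
  const GenericFn* vptr;  // first member: address of the class's slot array
  int age;
  int stripes;
};

int set_age(Object* self, int a) { int prev = self->age; self->age = a; return prev; }
std::string animal_hello(const Object*) { return "animal"; }
std::string tiger_hello(const Object*) { return "tiger"; }
void tigerish_thing(Object* self) { self->stripes += 1; }

const GenericFn animal_slots[] = {
  reinterpret_cast<GenericFn>(&set_age),       // slot 0
  reinterpret_cast<GenericFn>(&animal_hello),  // slot 1
};
const GenericFn tiger_slots[] = {
  reinterpret_cast<GenericFn>(&set_age),         // slot 0: inherited
  reinterpret_cast<GenericFn>(&tiger_hello),     // slot 1: overridden
  reinterpret_cast<GenericFn>(&tigerish_thing),  // slot 2: new in Tiger
};

// What a call like ap->sayHello() boils down to in this model: load the
// vptr, index it with the compile-time constant kSayHello, and call
// through the pointer found there.
std::string call_say_hello(const Object* obj) {
  auto fn = reinterpret_cast<std::string (*)(const Object*)>(obj->vptr[kSayHello]);
  return fn(obj);
}

void demo() {
  Object a{animal_slots, 3, 0};
  Object t{tiger_slots, 7, 42};
  assert(call_say_hello(&a) == "animal");
  assert(call_say_hello(&t) == "tiger");
}
```

The calling code never needs to know which table it is looking at. It only needs the index, and the index is fixed for the whole class hierarchy once the base class has been compiled.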

Create a vtable instance:

static const struct tiger_vtable_type tiger_vtable = {
  .setAge = animal_set_age,
  .sayHello = tiger_say_hello,
  .doTigerishThing = tiger_do_tigerish_thing,
};

And implement the constructor:

void
tiger_ctor(void * p, int age, int stripes)
{
  struct tiger * this = (struct tiger *) p;
  animal_ctor(this, age);
  this->base.vptr = &tiger_vtable;
  this->stripes = stripes;
}

The first thing the tiger constructor does is to call the animal constructor. Remember how the animal constructor sets the vptr to &animal_vtable? This is the reason why calling virtual member functions from a base class constructor often surprises people. Only after the base class constructor has run do we re-assign the vptr to the derived type and then do our own initialization.
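The same effect can be observed in real C++. In this small sketch, Base, Derived, name and seen_in_ctor are made-up names for illustration. The base constructor makes a virtual call and records the result; while the base constructor runs, the call resolves to the base version.

```cpp
#include <cassert>
#include <string>

struct Base {
  std::string seen_in_ctor;
  // Virtual call while only the Base part of the object has been constructed.
  Base() { seen_in_ctor = name(); }
  virtual ~Base() = default;
  virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
  std::string name() const override { return "Derived"; }
};

void demo() {
  Derived d;
  // During Base's constructor the call went to Base::name ...
  assert(d.seen_in_ctor == "Base");
  // ... but once construction has finished, dispatch reaches the override.
  const Base& b = d;
  assert(b.name() == "Derived");
}
```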

operator new is just boilerplate.

void *
tiger_new(int age, int stripes)
{
  void * p = malloc(sizeof(struct tiger));
  if (p != NULL)
    tiger_ctor(p, age, stripes);
  return p;
}

We're done. But how do we call a virtual member function? For this, I'll define a helper macro.

#define INVOKE_VIRTUAL_ARGS(STYPE, THIS, FUNC, ...)                     \
  (*((const struct STYPE ## _vtable_type * *) (THIS)))->FUNC( THIS, __VA_ARGS__ )

Now, this is ugly. It takes the static type STYPE, a this pointer THIS, the name of the member function FUNC and any additional arguments to pass to the function.

Then, it constructs the type name of the vtable from the static type. (The ## is the preprocessor's token pasting operator. For example, if STYPE is animal, then STYPE ## _vtable_type will expand to animal_vtable_type.)

Next, the THIS pointer is cast to a pointer to a pointer to the vtable type just constructed. This works because we've made sure to put the vptr as the first member in every object so it has the same address. This is essential.

Once this is done, we can dereference the pointer (to get the actual vptr) and then ask for its FUNC member and finally call it. (__VA_ARGS__ expands to the additional variadic macro arguments.) Note that we also pass the THIS pointer as the 0-th argument to the member function.

Now, the actual truth is that I had to define an almost identical macro again for functions that take no arguments because the preprocessor does not allow a variadic macro argument pack to be empty. So shall it be.

#define INVOKE_VIRTUAL(STYPE, THIS, FUNC)                               \
  (*((const struct STYPE ## _vtable_type * *) (THIS)))->FUNC( THIS )
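For comparison only, here is how the same dispatch steps might look as a C++ function template instead of a macro. This is not part of the C translation, just a sketch: invoke_virtual and the zero-argument getAge slot are hypothetical names introduced here. A template parameter pack may be empty, so one definition covers both macro variants.

```cpp
#include <cassert>
#include <utility>

struct animal_vtable_type {
  int (*setAge)(void*, int);
  int (*getAge)(const void*);  // hypothetical zero-argument slot
};

struct animal {
  const animal_vtable_type* vptr;  // first member, as in the C version
  int age;
};

static int animal_set_age(void* p, int a) {
  animal* self = static_cast<animal*>(p);
  int prev = self->age;
  self->age = a;
  return prev;
}

static int animal_get_age(const void* p) {
  return static_cast<const animal*>(p)->age;
}

static const animal_vtable_type animal_vtable = {animal_set_age, animal_get_age};

// Reads the vptr stored at the start of the object, picks the slot named by
// the pointer-to-member `slot`, and calls it with `self` as the 0-th argument.
// Requires C++14 for the deduced return type.
template <class VTable, class Fn, class Self, class... Args>
auto invoke_virtual(Fn VTable::*slot, Self* self, Args&&... args) {
  const VTable* vtbl = *reinterpret_cast<const VTable* const*>(self);
  return (vtbl->*slot)(self, std::forward<Args>(args)...);
}

void demo() {
  animal a{&animal_vtable, 7};
  assert(invoke_virtual(&animal_vtable_type::setAge, &a, 8) == 7);  // returns previous age
  assert(invoke_virtual(&animal_vtable_type::getAge, &a) == 8);     // empty argument pack
}
```

Like the macro, this relies on the vptr being the first member of the object, and the caller selects the vtable type through the static type it names.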

And it works:

#include <stdio.h>
#include <stdlib.h>

/* Insert all the code from above here... */

int
main()
{
  struct tiger * tp = tiger_new(7, 42);
  struct animal * ap = (struct animal *) tp;
  INVOKE_VIRTUAL(tiger, tp, sayHello);
  INVOKE_VIRTUAL(tiger, tp, doTigerishThing);
  INVOKE_VIRTUAL_ARGS(tiger, tp, setAge, 8);
  INVOKE_VIRTUAL(animal, ap, sayHello);
  return 0;
}

You might be wondering what happens in the

INVOKE_VIRTUAL_ARGS(tiger, tp, setAge, 8);

call. What we are doing is invoking the non-overridden setAge member of Animal on a Tiger object referred to via a struct tiger pointer. This pointer is first implicitly converted to a void pointer and as such passed as the this pointer to animal_set_age. That function then casts it to a struct animal pointer. Is this correct? It is, because we were careful to put the struct animal as the very first member in struct tiger so the address of the struct tiger object is the same as the address for the struct animal sub-object. It's the same trick (only one level less) we were playing with the vptr.

Reasoned answered 8/10, 2015 at 5:37 Comment(4)
Thanks for the detailed explanation. This helped. Also found another example. Here they talk about an example with two base classes and the derived classes overriding a virtual function from both the base classes. link - Prosecutor
For your next class, please cover virtual inheritance and how to emulate it in C. :) +1 As an aside, an advantage of this system is you can divorce the vtable from the data (don't store it contiguously), which can permit some tricks. These techniques are useful in C++ when you need to do that splitting (say, if you want to store your data in an internal buffer somewhere, but act on it polymorphically: or, tiny lightweight objects packed in one array, with run-length-encoded vtables elsewhere, which you might use in text processing) - Meris
Thank you very much for the answer. A question: How does the compiler know in the first place that it has to place a virtual function call? If I had: X* z = new Z(); Such that X has no virtual functions, Y inherits from X and has a virtual function, and Z inherits from Y. All function calls from "z" above should be through a vptr, but the type is X*, so how does the compiler know if calls to functions on "z" should pass through a vptr or not? - Mushy
@DeanLeitersdorf In your example, the static type of the variable will be X*. And since (you said that) X has no virtual functions, there is none you could possibly call. Where this becomes a problem is when (in C++) you use delete to free the object again. Indeed, the compiler has no way to know at this point that it should go through a vtable in order to find the destructor of the dynamic type. What will happen in C++ is that the destructor of the static type (X) will be called and your program will invoke undefined behavior. tl;dr Don't do it. - Reasoned

It can help to implement something similar yourself.

#include <iostream>

struct Bob;
struct Bob_vtable {
  void(*print)(Bob const*self) = 0;
  Bob_vtable(void(*p)(Bob const*)):print(p){}
};
template<class T>
Bob_vtable const* make_bob_vtable(void(*print)(Bob const*)) {
  static Bob_vtable const table(+print);
  return &table;
}
struct Bob {
  Bob_vtable const* vtable;
  void print() const {
    vtable->print(this);
  }
  Bob():vtable( make_bob_vtable<Bob>([](Bob const*self){
    std::cout << "Bob\n";
  })) {}
protected:
  Bob(Bob_vtable const* t):vtable(t){}
};
struct Alice:Bob {
  int x = 0;
  Alice():Bob( make_bob_vtable<Alice>([](Bob const*self){
    std::cout << "Alice " << static_cast<Alice const*>(self)->x << '\n';
  })) {}
};

live example.

Here we have an explicit vtable stored in Bob. It points to a table of functions. The non-virtual member function print uses it to dynamically dispatch to the correct method.

The constructor of Bob and derived class Alice set the vtable to a different value (created as a static local in this case) with different values in the table.

Which pointer to use is baked into the definition of what Bob::print means -- it knows the offset into the table.

If we add another virtual function in Alice, it just means that the vtable pointer will point to a struct Alice_vtable:Bob_vtable in actuality. Static/reinterpret casting will get us the "real" table out, and we can access the extra function pointers easily.
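A sketch of that extension, under some assumptions: the functions return values instead of printing so the result is easy to check, and get_x, the *_impl functions and the *_table helpers are names invented for this illustration. The Bob_vtable part of Alice_vtable comes first, so code that only knows about Bob keeps working unchanged.

```cpp
#include <cassert>
#include <string>

struct Bob;

struct Bob_vtable {
  std::string (*print)(Bob const*);
};

// Hypothetical extended table: Bob's slots first, Alice's extra slot after.
struct Alice_vtable : Bob_vtable {
  int (*get_x)(Bob const*);
};

struct Bob {
  Bob_vtable const* vtable;
  std::string print() const { return vtable->print(this); }
  Bob() : vtable(&bob_table()) {}
protected:
  explicit Bob(Bob_vtable const* t) : vtable(t) {}
private:
  static std::string print_impl(Bob const*) { return "Bob"; }
  static Bob_vtable const& bob_table() {
    static Bob_vtable const t{&print_impl};
    return t;
  }
};

struct Alice : Bob {
  int x = 5;
  Alice() : Bob(&alice_table()) {}
  int get_x() const {
    // Alice knows its table really is an Alice_vtable, so the downcast is valid.
    return static_cast<Alice_vtable const*>(vtable)->get_x(this);
  }
private:
  static std::string print_impl(Bob const* self) {
    return "Alice " + std::to_string(static_cast<Alice const*>(self)->x);
  }
  static int get_x_impl(Bob const* self) {
    return static_cast<Alice const*>(self)->x;
  }
  static Alice_vtable const& alice_table() {
    static Alice_vtable const t = [] {
      Alice_vtable v{};
      v.print = &print_impl;  // overrides Bob's slot
      v.get_x = &get_x_impl;  // fills the new slot
      return v;
    }();
    return t;
  }
};

void demo() {
  Bob b;
  Alice a;
  Bob const& r = a;
  assert(b.print() == "Bob");
  assert(r.print() == "Alice 5");  // dispatched through the shared first slot
  assert(a.get_x() == 5);          // reached through the extended table
}
```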

Things get stranger when we talk about virtual inheritance as well as virtual functions. I am not qualified to describe how that works.

Meris answered 8/10, 2015 at 3:27 Comment(0)
