`virtual` means, "This is NOT REALLY a C function, i.e. a series of pushes of arguments onto the stack, followed by a jump to a SINGLE unchanging address of the function body."
Instead, it's this other beast that looks in a table at runtime for the address of the function body to execute. Each class in the hierarchy has its own table, with one entry per virtual function. That table of function pointers is called a vtable. This is a RUNTIME mechanism for polymorphism that injects extra code to do this lookup and then dispatch to the appropriate specialized version of the function body.
Furthermore, to get this vtable dispatch, you access your object through a POINTER (or a reference) to the object, as opposed to using the object variable directly, i.e. `Foo* foo{makeFoo()}; foo->someMethod()` vs. `Loo loo{}; loo.someMethod()`. So another dereference right from the get-go is required to use this technique.
Here's the neat part: these pointers can point to objects of derived classes as well, so if you have a class `FooChild` that inherits from `FooParent`, you can use a `FooParent *` to point to a `FooParent` OR a `FooChild`.
When the call is made to the method, instead of just doing the normal C thing of preparing the arguments on the stack, then jumping to the body of `barMethod()`, it does some runtime work first to look up one of SEVERAL DIFFERENT implementations of `barMethod` that are individualized per class. That lookup goes through the vtable. Each class in the class hierarchy has its own vtable, and its entry for `barMethod` says where the function body REALLY is for that particular class, since they can have different ones, EVEN IF we are using `FooParent *` to point to instances of any of them.
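A minimal sketch of that dispatch, for illustration only. The `std::string` return type, the `const` qualifiers, and the three `call...` helpers are my own assumptions so that the result is observable; they are not part of the original `barMethod()`.

```cpp
#include <string>

struct FooParent {
    virtual ~FooParent() = default;
    // Root of the virtual function: derived classes may replace this body.
    virtual std::string barMethod() const { return "FooParent"; }
};

struct FooChild : FooParent {
    std::string barMethod() const override { return "FooChild"; }
};

// Hypothetical helper: the body that runs is picked at run time,
// based on what the pointer REALLY points to.
std::string callThroughPointer(const FooParent* p) { return p->barMethod(); }

// Hypothetical helper: a reference dispatches dynamically too.
std::string callThroughReference(const FooParent& r) { return r.barMethod(); }

// Hypothetical helper: passing by value copies only the FooParent part
// ("slicing"), so the FooParent body always runs here.
std::string callByValue(FooParent copy) { return copy.barMethod(); }
```

Handing a `FooChild` to the first two helpers runs the `FooChild` body, while `callByValue` runs the `FooParent` body, because the copy it receives is no longer a `FooChild` at all.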
But here's why we would want to do that in the first place: suppose `virtual` does not exist. And you, the programmer, want to handle a bunch of objects that come from a class hierarchy. Well, you'd end up pretty much coding by hand the same thing that the compiler injects for you! In order to pass instances of these various classes into some function that you write to do stuff with them, you need a single, fixed-size type for the function call code to work. So, use pointers, because pointers are always the same size on your machine (these days), no matter how differently sized the objects they point to are. Okay. So pointers it is. That's a sort of type erasure that is required to use `virtual`.
Then you need a `switch` statement or something to branch on the particular class it turns out to point to. But that'd be if you coded it by hand for each variation you wrote. That's silly. So quickly you'd realize you'd be better off with a table of pointers to your various versions of `barMethod()` to call. Then you could always just look up that same table from every variation, instead of rewriting hand-coded switch statements and such. So you'd do that. You'd implement a table in which you have pointers to different `barMethod()`s for each of the classes in the hierarchy deriving from `FooParent`. They'd all have the SAME SIGNATURE (parameter list, return value, etc.), but DIFFERENT BODIES, for each class.
You'd assign each class an integer i.d. or something like that and use that as the offset into the table. Maybe `FooChildA` and `FooChildB` are two different classes that both derive from `FooParent`, for example, so you'd assign A to 0 and B to 1, or something like that. Then use those as offsets to jump into the table and get your pointer. That's how lookup tables work in general. Once you've got your pointer, you'd push all the arguments onto the stack, and then jump to that pointer. So `virtual` is just a keyword that instructs the compiler to inject all this crazy high-level code into your code for you so you don't have to manually do it.
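Here's roughly what that hand-rolled version could look like. This is a sketch under my own assumptions: `ClassId`, `FooObject`, `BarMethodFn`, `kBarMethodTable`, and `callBarMethod` are made-up names, and the `std::string` return value is only there so you can see which body ran.

```cpp
#include <string>

// Hypothetical integer i.d. per class, used as the offset into the table.
enum ClassId { kFooChildA = 0, kFooChildB = 1, kClassCount };

// One fixed-size record type stands in for every class in the "hierarchy".
struct FooObject {
    ClassId id;  // which class this object REALLY is
    int value;   // some per-object data
};

// SAME SIGNATURE for every class...
using BarMethodFn = std::string (*)(const FooObject&);

// ...but DIFFERENT BODIES.
std::string barMethodA(const FooObject& self) { return "A:" + std::to_string(self.value); }
std::string barMethodB(const FooObject& self) { return "B:" + std::to_string(self.value); }

// The hand-written lookup table, indexed by ClassId.
const BarMethodFn kBarMethodTable[kClassCount] = { barMethodA, barMethodB };

// Look up the body at run time, then jump to it.
std::string callBarMethod(const FooObject* obj) {
    return kBarMethodTable[obj->id](*obj);
}
```

Real compilers typically don't store an integer i.d. like this. The usual implementation puts a hidden pointer to the class's own table inside each object, but the idea of "index a table of function pointers, then jump" is the same.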
The problem is, it's RUNTIME polymorphism, when often COMPILE time polymorphism can be used instead, via templates etc. It adds runtime overhead to every single virtual call in the hierarchy: an extra pointer load or two, an indirect jump, and usually no inlining. That's actually just fine outside of hot loops. But for things that run all the time in your system (like every few milliseconds or more often) that can be an unacceptable amount of bloat. In many cases, you could do the equivalent of all that table lookup stuff at compile time instead using metaprogramming so that runtime can be blazingly fast.
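As a sketch of the compile-time alternative, a plain function template is the simplest form. The class names are reused from above, but here they deliberately share no base class, and `callBarMethod` plus the `std::string` return type are my own assumptions for illustration.

```cpp
#include <string>

// No common base class and no virtual: just two types that both
// happen to provide barMethod().
struct FooChildA { std::string barMethod() const { return "A"; } };
struct FooChildB { std::string barMethod() const { return "B"; } };

// The compiler stamps out one copy of this function per type it is
// called with, so the call target is known at compile time and can
// be inlined. No table lookup happens at run time.
template <typename Foo>
std::string callBarMethod(const Foo& foo) {
    return foo.barMethod();
}
```

The trade-off is that `FooChildA` and `FooChildB` are now unrelated types, so you can't drop them into one container of `FooParent *` anymore. Whether that matters depends on whether you genuinely need to choose the type at run time.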
As for `override`, that confusing mess should have been in the language from the get-go and should be in the same textual position as the `virtual` keyword. Sadly, neither of those "shoulds" happened. So in the old days, you'd declare `barMethod()` in the most parent class of the hierarchy as `virtual`, and then also declare `barMethod()` in the derived classes as `virtual`. (The repeat was optional, since a redefinition with a matching signature is virtual automatically, but it was the only way to signal your intent.) At some point this got to be super annoying due to weird bugs. The feature honestly isn't intuitive and is hard to teach or even remember after YEARS of knowing about it.
So C++11 added `override` as well, as a hint to the compiler so we can catch bugs. It just means, "Not only is this function virtual, so do all that crazy vtable dispatching stuff, but in addition, this is a DERIVED redefinition of `barMethod()`." That lets the compiler check that you matched the parameters etc. perfectly with the version in the parent class. Without this check, if you accidentally failed to match the derived version's parameter list exactly with the parent's version, then instead of overriding the parent version, the compiler would just say, "Oh, another totally new virtual member function hierarchy is starting, with different parameters, and this is the root. Must be a new overload."
I realize that's a super confusing statement. But basically, if you have `barMethod()` and `barMethod(int)` and `barMethod(int, char*)` and so forth, these are all DIFFERENT functions with no real relationship to each other. It's as if each had a different name. You can think of it that way in your head. It's essentially how the compiler itself thinks of it, with name mangling. So if you then made them `virtual`, you might think that declaring them in various classes in the hierarchy would put them into a single member function virtual hierarchy as well. But it doesn't. If you mark them with the `override` keyword instead, the compiler would notice that `barMethod(int) override` and `barMethod(int, char*) override` have no relationship to anything in `FooParent`, which only has `barMethod()` with no parameters. But they are supposedly overriding something. ¡COMPILER ERROR! And that's good. You want that compiler error, or else your code goes out to customers and looks like it's working but absolutely isn't.
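To make the silent failure concrete, here is a small sketch. `FooChildTypo`, `FooChildFixed`, and `callBarMethod` are hypothetical names, and the `std::string` return values are an assumption so the outcome is visible.

```cpp
#include <string>

struct FooParent {
    virtual ~FooParent() = default;
    virtual std::string barMethod() { return "FooParent::barMethod()"; }
};

struct FooChildTypo : FooParent {
    // Meant to be an override, but the parameter list differs, so this
    // compiles as a brand-new, unrelated virtual function.
    // Writing it as `std::string barMethod(int) override` would be
    // rejected by the compiler instead.
    virtual std::string barMethod(int) { return "FooChildTypo::barMethod(int)"; }
};

struct FooChildFixed : FooParent {
    // Signature matches the parent, and override makes the compiler verify it.
    std::string barMethod() override { return "FooChildFixed::barMethod()"; }
};

// Hypothetical helper: calls through the parent type, as polymorphic code would.
std::string callBarMethod(FooParent& foo) { return foo.barMethod(); }
```

Calling `callBarMethod` on a `FooChildTypo` quietly runs the parent's body, which is exactly the "looks like it's working but absolutely isn't" case. Some compilers can warn about the hidden function, but a warning is easier to miss than the hard error `override` gives you.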
The point of `virtual` is to allow you to use a SINGLE POINTER TYPE to represent any instances of an entire hierarchy of classes, but do different things for each of them, potentially. That wouldn't happen if the programmer didn't make sure ALL of the derived redefinitions match the root's signature exactly, which is what makes them virtual overrides. And `override` makes sure they aren't accidentally creating new virtual function hierarchy roots.
In modern C++, it was decided that it was too annoying to write both `virtual` and `override`, and that doing so made it harder to visually grep which `barMethod()`s were the root version, and which ones were derived. And so the guidance became, "You can drop the `virtual` keyword for the derived redefinitions and JUST use `override`." This is considered the only proper way to speak nowadays.
struct FooParent
{
    // The root has virtual
    virtual void barMethod(){ /* body */ } // or `=0` for "pure virtual"
};
// Original way of doing it. Just use virtual again, but this isn't the root now. This is a derived class.
struct FooChild_OldSchool : FooParent
{
    virtual void barMethod(); // Total trashmouth. Bug prone.
};
struct FooChild_OverrideDays : FooParent
{
    virtual void barMethod() override; // Naughty mouth. Using both.
};
struct FooChild_NonTrashyWay2020 : FooParent
{
    void barMethod() override; // Prim and proper mouth. Using only override in the derived class.
};
Bizarrely though, `override` sits in a different location syntactically, AFTER the parameter list, instead of before it. As far as I can tell this is really illogical. I really wish that we would fix this and allow `override` to go in the same place `virtual` does, at the beginning of the declaration, or better yet, let `virtual` go where `override` does, after the parameter list. As it is now, it's annoyingly inconsistent and confusing, imo. And I say all that because I believe these things make it unteachable if we don't admit they are warts. Because when you are learning a new language, you really need a more fluent speaker to say, "Hey, this is weird and warty. Don't worry about it. It's not because you're dumb. It's just because our language is evolved and wonky."
I wish it was like this...
struct FooChild_HowIWishItWas : FooParent
{
    override void barMethod();
};
// OR EVEN BETTER! Allow us to change the location of virtual!
struct FooParent_HowIWishItWasEvenMore
{
    void barMethod() virtual;
};
But it isn't. That's maybe how you can think of it internally though, and then just remember to add this weird wonkiness syntactically when you're actually typing the code. Wonder whether a paper on this would survive 5 minutes. Hmm.
`using namespace std;` is a bad habit to get into and if you can stop now you might avoid a whole lot of headaches in the future. The `std::` prefix is there for a reason: It avoids conflict with your own classes, structures and variables. –
Seeing `Base *p = new Derived; p->printMe();` with and without the `virtual`. – Perice