Alternative virtual function calls implementations?
Asked Answered
U

11

50

C++ supports dynamic binding through virtual mechanism. But as I understand the virtual mechanism is an implementation detail of the compiler and the standard just specifies the behaviors of what should happen under specific scenarios. Most compilers implement the virtual mechanism through the virtual table and virtual pointer. This is not about implementation detail of virtual pointers and table. My questions are:

  1. Are there any compilers which implement dynamic dispatch of virtual functions in any other way other than the virtual pointer and virtual table mechanism? As far as I have seen most (read G++, Microsoft Visual Studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
  2. The sizeof of any class with just a virtual function will be size of an pointer (vptr inside this) on that compiler. So given that virtual pointer and TBL mechanism itself is compiler implementation, will this statement I made above be always true?
Umbilical answered 4/12, 2010 at 5:2 Comment(6)
@Als... I was wondering about the same questions for quite a long time. Never thought to start a discussion on it. Thanks for starting it, Als.Simonson
If you're aiming to find out if it's safe to use interfaces (virtual base classes) across modules made with different compilers, remember that the vtable layout must also match, even if both compilers use vtables. (And calling conventions, and some other stuff.) COM works on the assumption that all compilers on a given platform do (or can be made to) create compatible vtables (for that platform), although I think that assumption fails (or at least is difficult to satisfy) with some compilers.Catty
Excellent question. I believe there could be compilers that instead of vptr will store the entire function table in every object. In some very specific scenarios this may be valuable, since during a virtual function call such an approach saves an extra memory indirection. In driver development this may be critical: such an indirection may lead to an access to a paged memory.Frottage
@valdo: However that would waste a lot of RAM in the general case.Coeval
Don't forget that if you compile with RTTI enabled...Tuneless
@kotlinski: sure. That's why I say this may make sense in some very specific scenariosFrottage
D
22

It is not true that vtable pointers in objects are always the most efficient. My compiler for another language used to use in-object pointers for similar reasons but no longer does: instead it uses a separate data structure which maps the object address to the required meta-data: in my system this happens to be shape information for use by the garbage collector.

This implementation costs a bit more storage for a single simple object, is more efficient for complex objects with many bases, and it is vastly more efficient for arrays, since only a single entry is required in the mapping table for all objects in the array. My particular implementation can also find the meta-data given a pointer to any point interior to the object.

The actual lookup is extremely fast, and the storage requirements very modest, because I am using the best data structure on the planet: Judy arrays.

I also know of no C++ compiler using anything other than vtable pointers, but it is not the only way. In fact, the initialisation semantics for classes with bases make any implementation messy. This is because the complete type has to see-saw around as the object is constructed. As a consequence of these semantics, complex mixin objects lead to massive sets of vtables being generated, large objects, and slow object initialisation. This probably isn't a consequence of the vtable technique as much as needing to slavishly follow the requirement that the run-time type of a subobject be correct at all times. Actually there's no good reason for this during construction, since constructors are not methods and can't sensibly use virtual dispatch: this isn't so clear to me for destruction since destructors are real methods.

Diffident answered 7/12, 2010 at 20:57 Comment(9)
"vastly more efficient for arrays" Why would you want to make an array with virtual objects in the first place? It seems really useless, considered arrays are homogenous. (in C++ at least)Coeval
@kotlinski: the virtualness might be gratuitous: you make the array for this type for the same reason as any other. It's a good point though.Diffident
On the last paragraph: The same reason exists for keeping track of the concrete type both during construction and destruction. While neither constructors nor destructors are real member functions, they can call on other member functions, including virtual member functions. Different languages approach the problem in different ways, with C++ keeping track of the type of the object at each stage, and Java considering the object to be of the most derived type at all times. Neither approach is perfect, but it was a design decision that had to be taken.Volitant
BTW: the issue with Java can be demonstrated by creating a class hierarchy with the base constructor calling a method that is implemented in the most derived type. Then in the most derived type, have that method print a member value (even a final value initialized during construction) and you will see the effect. The problem in this case is that the system dispatched to a not-yet-constructed object.Volitant
@David Are you sure that dtors aren't real member functions? IIRC when using placement new you have to call the dtor manually before free-ing, deleting, unwinding, etc the memory you placed the object in. (If you want the dtor called at all)Denice
@KitsuneYMG: They are not real, common member functions. While you can both trigger a call to the constructor with placement new, and you can manually call the destructor as if it was a member function, they have very specific behaviors where they differ from member functions. In particular when a constructor or destructor is being executed, no call to a virtual function on the object will be polymorphically dispatched (more precisely, it will not be polymorphically dispatched below the type being constructed/destructed)Volitant
Another note: you can actually call a virtual method from a constructor through dynamic dispatch, you only need to do it through a level of indirection. You can call a function that takes a reference to your base type and calls on a virtual function, the indirection function must use virtual dispatch, since it does not know what the exact type of the object is.Volitant
"slow object initialisation." slow how?Caduceus
Some actual research on this: usenix.org/legacy/event/jvm02/full_papers/zendra/zendra_html/…Cota
F
8

To my knowledge, all C++ implementations use a vtable pointer, although it would be quite easy (and perhaps not so bad perf wise as you might think given caches) to keep a small type-index in the object (1-2 B) and subsequently obtain the vtable and type information with a small table lookup.

Another interesting approach might be BIBOP (http://foldoc.org/BIBOP) -- big bag of pages -- although it would have issues for C++. Idea: put objects of the same type on a page. Get a pointer to the type descriptor / vtable at the top of the page by simply and'ing off the less signficant bits of the object pointer. (Doesn't work well for objects on the stack, of course!)

Another other approach is to encode certain type tags/indices in the object pointers themselves. For example, if by construction all objects are 16-byte aligned, you can use the 4 LSBs to put a 4-bit type tag in there. (Not really enough.) Or (particularly for embedded systems) if you have guaranteed unused more-significant-bits in addresses, you can put more tag bits up there, and recover them with a shift and mask.

While both these schemes are interesting (and sometimes used) for other language implementations, they are problematic for C++. Certain C++ semantics, such as which base class virtual function overrides are called during (base class) object construction and destruction, drive you to a model where there is some state in the object that you modify as you enter base class ctors/dtors.

You may find my old tutorial on the Microsoft C++ object model implementation interesting. http://www.openrce.org/articles/files/jangrayhood.pdf

Happy hacking!

Fancie answered 12/12, 2010 at 4:42 Comment(1)
So it's called BIBOP, and it possibly even exists in actual languages! :) The stack-allocated-object problem can be hacked around by treating the entire stack as a single huge object with a troubling number of thunk "methods" that look up this in a table somewhere and forward accordingly...Middleaged
C
6
  1. I don't think there are any modern compilers with an approach other than vptr/vtable. Indeed, it would be hard to figure out something else that is not just plain inefficient.

    However, there is still a pretty large room for design tradeoffs within that approach. Maybe especially regarding how virtual inheritance is handled. So it makes sense to make this implementation-defined.

    If you are interested in this kind of stuff, I strongly suggest reading Inside the C++ Object Model.

  2. sizeof class depends on the compiler. If you want portable code, don't make any assumptions.

Coeval answered 4/12, 2010 at 10:36 Comment(4)
I think Matthieu pointed out in a comment a few weeks/months ago that there are other implementations. But I can't remember exactly whether this was theoretical or existing compilers, whether it was C++ at all, and can't find it either. Sorry.Strontium
No, I am not saying the size will be fixed(same) across all compilers, The Q is Will it be equal to sizeof a pointer on that particular compiler? This Q since, IMU vptr inside the this takes up the ptr size but since the vptr in itself may not be present on certain compilers as virtual mechanism is implementation dependent how would this fact affect the statement?Umbilical
@Als: This is what one would expect, but it is not guaranteed.Coeval
@Als: For example... if we have max 256 classes in an application, compiler+linker could pack a typeid in a char and find the vptr in a lookup table. Then we have sizeof(T) == 1.Coeval
A
5

Are there any compilers which implement Virtual Mechanism in any other way other than the virtual pointer and virtual table mechanism? As far as i have seen most(read g++,Microsoft visual studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?

All current compilers that I know of use the vtable mechanism.

This is an optimization that's possible because C++ is statically type checked.

In some more dynamic languages there is instead a dynamic search up the base class chain(s), searching for an implementation of a member function that's called virtually, starting in the most derived class of the object. For example, that's how it worked in original Smalltalk. And the C++ standard describes the effect of a virtual call as if such a search had been used.

In Borland/Turbo Pascal in the 1990's such dynamic search was employed for finding handlers of Windows API "window messages". And I think possibly the same in Borland C++. It was in addition to the normal vtable mechanism, used solely for message handlers.

If it was used in Borland/Turbo C++ – I can't remember – then it was in support of a language extensions that allowed you to associate message id's with message handler functions.

The sizeof of any class with just a virtual function will be size of an pointer(vptr inside the this) on that compiler, So given that virtual ptr and tbl mechanism itself is compiler implementation, will this statement I made above be always true?

Formally no (even with assumption of vtable mechanism), it depends on the compiler. Since the standard doesn't require the vtable mechanism it says nothing about placement of vtable pointer in each object. And other rules let the compiler freely add padding, unused bytes, at the end.

But in practice perhaps. ;-)

However it's not something that you should rely on, or that you need to rely on. But in the other direction you can require this, for example if you're defining an ABI. Then any compiler that doesn't, simply doesn't conform to your requirement.

Cheers & hth.,

Ava answered 7/12, 2010 at 11:31 Comment(4)
"This is an optimization that's possible because C++ is statically type checked" no, it is possible because the static type system of C++ allows it, by designCaduceus
@curiousguy: I don't understand your comment, sorry.Ava
C++ typing is based on class identity (class names), not class structure (content). Static typing can use type identity or type content, so that two classes with equivalent definitions could be considered type-compatible or even to have identical types.Caduceus
Still doesn't click, I'm sorry. I think you're trying to say that it's possible to design some kinds of static typing where vtables aren't practical. Sure.Ava
L
4

Are there any compilers which implement Virtual Mechanism in any other way other than the virtual pointer and virtual table mechanism? As far as i have seen most(read g++,Microsoft visual studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?

None that I'm aware of C++ compilers using, though you might find it interesting to read about Binary Tree Dispatch. If you're interested in exploiting the expectation of virtual dispatch tables in any way, you should be aware that compilers can - where the types are known at compile time - sometimes resolve virtual function calls at compile time, so may not consult the table.

The sizeof of any class with just a virtual function will be size of an pointer(vptr inside the this) on that compiler, So given that virtual ptr and tbl mechanism itself is compiler implementation, will this statement I made above be always true?

Assuming no base classes with their own virtual members, and no virtual base classes, it's overwhelmingly likely to be true. Alternatives can be envisaged - such as whole-program analysis revealing only one member in the class heirarchy, and a switch to compile-time dispatch. If run-time dispatch is required, it's hard to imagine why any compiler would introduce further indirection. Still, the Standard deliberately doesn't stipulate these things precisely so that implementations can vary, or be varied in future.

Lushy answered 7/12, 2010 at 7:44 Comment(1)
Various Inria projects (Smalleifell, Cecil) used BTDs. They even tested it in a JVM. More recently, Nim uses that.Cota
M
4

In trying to imagine an alternative scheme, I have come up with the following, along the lines of Yttril's answer. As far as I'm aware, no compiler uses it!

Given a sufficiently large virtual address space and flexible OS memory allocation routines, it would be possible for new to allocate objects of different types in fixed, non-overlapping address ranges. Then the type of an object could be inferred quickly from its address using a right-shift operation, and the result used to index a table of vtables, thus saving 1 vtable pointer per object.

At first glance this scheme might seem to run into problems with stack-allocated objects, but this can be handled cleanly:

  1. For each stack-allocated object, the compiler adds code that adds a record to a global array of (address range, type) pairs when the object is created and removes the record when it is destroyed.
  2. The address range comprising the stack would map to a single vtable containing a large number of thunks that read the this pointer, scan the array to find the corresponding type (vptr) for the object at that address, and call the corresponding method in the vtable pointed to. (I.e. the 42nd thunk will call the 42nd method in the vtable -- if the most virtual functions used in any class is n, then at least n thunks are required.)

This scheme obviously incurs non-trivial overhead (at least O(log n) for the lookup) for virtual method calls on stack-based objects. In the absence of arrays or composition (containment within another object) of stack-based objects, a simpler and faster approach can be used in which the vptr is placed on the stack immediately before the object (note that it is not considered part of the object and does not contribute to its size as measured by sizeof). In this case thunks simply subtract sizeof (vptr) from this to find the correct vptr to use, and forward as before.

Middleaged answered 12/12, 2010 at 0:21 Comment(3)
That method would allow the stack-ness of an object to also be determined by its address.Plotkin
@Ben Voigt: Not sure what you mean... Yes it would, but "stack-ness" is already "determined" by address (unless you're doing something extremely weird in a device driver).Middleaged
Oh now I see what you're doing... but I think having call sites test for stack addresses would be cheaper than all these thunks/shims.Plotkin
N
4

IIRC Eiffel uses a different approach and all overrides of a method end up merged and compiled in the same address with a prologue where the object type is checked (so every object must have a type ID, but it's not a pointer to a VMT). This for C++ would require of course that the final function is created at link time. I don't know any C++ compiler that uses this approach, however.

Nosegay answered 12/12, 2010 at 0:37 Comment(0)
C
3
  1. I've never heard of or seen any compiler that uses any alternative implementation. The reason that vtables are so popular is because that not only is it the most efficient implementation, but it's also the easiest design and most obvious implementation.

  2. On pretty much any compiler you care to use, it's almost certainly true. However, it's not guaranteed and not always true- you can't depend on it, even though it's pretty much always the case. Your favourite compiler could also alter it's alignment, increasing it's size, for funsies, without telling you. From memory, it can also insert whatever debug information and whatever it likes.

Culp answered 4/12, 2010 at 12:5 Comment(1)
The reasoning in pt 2. makes a lot of sense, but does it mean that there is no way to be estimate what size a compiler may allocate to lets say a class or struct, Is it solely at discretion of the compiler? On second thoughts...We can counter that saying that is the very reason for existence of sizeof()Umbilical
A
3

C++/CLI deviates from both assumptions. If you define a ref class, it doesn't get compiled into machine code at all; instead, the compiler compiles it into .NET managed code. In the intermediate language, classes are a built-in feature, and the set of virtual methods is defined in the metadata, rather than a method table.

The specific strategy to implement object layout and dispatch depends on the VM. In Mono, an object containing just one virtual method has not the size of one pointer, but needs two pointers in the MonoObject struct; the second one for the synchronization of the object. As this is implementation-defined and also not really useful to know, sizeof is not supported for ref classes in C++/CLI.

Atheism answered 9/12, 2010 at 19:47 Comment(2)
How is this relevant to the original question? He did ask about C++, and not a CLR-based language...Citizenry
It demonstrates conditions under which a compiler might need to deviate from the traditional implementation strategies. Also, whether "C++" includes both "standard C++" and "C++/CLI" or not is debatable.Representative
N
1

First, there were mentioned Borland's proprietary extension to C++, Dynamic Dispatch Virtual Tables (DDVT), and you can read something about it in a file named DDISPATC.ZIP. Borland Pascal had both virtual and dynamic methods, and Delphi introduced yet another "message" syntax, similar to dynamic, but for messages. At this point I'm not sure if Borland C++ had the same features. There was no multiple inheritance in either Pascal or Delphi, so Borland C++ DDVT might be different from either Pascal or Delphi.

Second, in 1990s and a bit earlier there was experimenting with different object models, and Borland was not the most advanced one. I personally think that shutting down IBM SOMobjects did a damage to the world that we all still suffering from. Before shutting down SOM there were experiments with Direct-to-SOM C++ compilers. So instead of C++'s way of invoking methods SOM is used. It is in many ways similar to C++ vtable, with several exceptions. First, to prevent fragile base class problem, programs do not use offsets inside of vtable, because they don't know this offset. It can change if base class introduces new methods. Instead, callers invoke a thunk created in runtime that has this knowledge in its assembly code. And there is one more difference. In C++, when multiple inheritance is used, an object can contain several VMTs IIRC. In contrast to C++, each SOM object has just one VMT, so dispatch code should be different from "call dword ptr [VMT+offset]".

There is a document related to SOM, Release-to-Release Binary Compatibility in SOM. You can find comparison of SOM with another projects I know little of, like Delta/C++ and Sun OBI. They solve a subset of problems that SOM solves, and by doing so they are also having somewhat tweaked invokation code.

I have recently found Visual Age C++ v3.5 for Windows compiler fragment enough to get things running and actually touch it. Most of users are not likely to get OS/2 VM just to play with DTS C++, but having Windows compiler is completely another matter. VAC v3.5 is the first and the last version to support Direct-to-SOM C++ feature. VAC v3.6.5 and v4.0 are not appropriate.

  1. Download VAC 3.5 fixpak 9 from IBM FTP. This fixpak contain many files, so you don't even need to full compiler (I have 3.5.7 distro, but fixpak 9 was big enough to do some tests).
  2. Unpack to e. g. C:\home\OCTAGRAM\DTS
  3. Start command line and run subsequent commands there
  4. Run: set SOMBASE=C:\home\OCTAGRAM\DTS\ibmcppw
  5. Run: C:\home\OCTAGRAM\DTS\ibmcppw\bin\SOMENV.BAT
  6. Run: cd C:\home\OCTAGRAM\DTS\ibmcppw\samples\compiler\dts
  7. Run: nmake clean
  8. Run: nmake
  9. hhmain.exe and its dll are in different directories, so we must make them find each other somehow; since I was doing several experiments, I executed "set PATH=%PATH%;C:\home\OCTAGRAM\DTS\ibmcppw\samples\compiler\dts\xhmain\dtsdll" once, but you can just copy dll near to hhmain.exe
  10. Run: hhmain.exe

I've got an output this way:

Local anInfo->x = 5
Local anInfo->_get_x() = 5
Local anInfo->y = A
Local anInfo->_get_y() = B
{An instance of class info at address 0092E318

}
Nidify answered 7/12, 2014 at 15:38 Comment(0)
G
0

Tony D's answer correctly points out that compilers are allowed to use whole-program analysis to replace a virtual function call with a static call to the unique possible function implementation; or to compile obj->method() into the equivalent of

if (auto frobj = dynamic_cast<FrequentlyOccurringType>(obj)) {
    frobj->FrequentlyOccurringType::method();  // static dispatch on hot path
} else {
    obj->method();  // vtable dispatch on cold path
}

Karel Driesen and Urs Hölzle wrote a really fascinating paper way back in 1996 in which they simulated the effect of perfect whole-program optimization on typical C++ applications: "The Direct Cost of Virtual Function Calls in C++". (The PDF is available for free if you Google for it.) Unfortunately, they only benchmarked vtable dispatch versus perfect static dispatch; they didn't compare it to binary tree dispatch.

They did point out that there are actually two kinds of vtables, when you're talking about languages (like C++) that support multiple inheritance. With multiple inheritance, when you call a virtual method that is inherited from the second base class, you need to "fix up" the object pointer so it points to an instance of the second base class. This fixup offset can be stored as data in the vtable, or it can be stored as code in a "thunk". (See the paper for more details.)

I believe all decent compilers these days use thunks, but it did take 10 or 20 years for that market penetration to reach 100%.

Goose answered 25/4, 2013 at 22:22 Comment(4)
Is FrequentlyOccurringType a "final" class? (a "final" class is a class that isn't significantly derived from, significant derivation is one that overrides a function and is instantiated such that the vtable is needed)Caduceus
@curiousguy: Yes. Whole-program analysis by definition sees the "whole program," so it knows whether FrequentlyOccurringType has any subclasses that would screw us up. In that case it could back off to if (typeid(obj) == typeid(FrequentlyOccurringType)), or slap a if (auto aobj = dynamic_cast<AwkwardChildType>(obj)) aobj->AwkwardChildType::method() in front. This is the same idea as "guards" in tracing JIT, just, done at compile (link)-time instead of interpret-time.Goose
I believe if (typeid(obj) == typeid(FrequentlyOccurringType) is more efficient than auto frobj = dynamic_cast<FrequentlyOccurringType>(obj) in almost all casesCaduceus
Well, chalk that up to the state of my knowledge 5 years ago. :) Since then, I've given youtube.com/watch?v=QzJL-8WbpuU and I agree with your assessment. (In my defense, I did say "the equivalent of...", which includes the possibility that the compiler might optimize it, since, again, we're assuming it knows the entire class hierarchy. Also, if you're going to ask "how does this work with DLLs?"... it doesn't. :))Goose

© 2022 - 2024 — McMap. All rights reserved.