Alternative schemes for implementing vptr?

This question is not about the C++ language itself (i.e. not about the Standard) but about how to get a compiler to implement alternative schemes for virtual functions.

The general scheme for implementing virtual functions is to store, in each object, a pointer to a table of function pointers.

class Base {
private:
    int m;
public:
    virtual void metha();
};

The equivalent layout in, say, C would be something like:

struct Base {
    void (**vtable)();   /* pointer to a table of function pointers */
    int m;
};

The first member is usually a pointer to a table of virtual functions (an area of memory the application has no direct control over). In most implementations this costs the size of one pointer per object before any members are counted, so about 4 bytes under a 32-bit addressing scheme. If you create 40k polymorphic objects in your application, that is roughly 40k x 4 bytes = 160 KB before any member variables. I also know this happens to be the fastest and most common implementation among C++ compilers.
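
A quick way to see the overhead (my own minimal sketch; the exact sizes are ABI-dependent and the struct names are just for illustration):

#include <cstdio>

struct Plain       { int m; };                          // no vptr
struct Polymorphic { virtual void metha() {} int m; };  // compiler adds a vptr

int main() {
    // Typically 4 vs 8 on a 32-bit target, and 4 vs 16 on a common 64-bit
    // target (8-byte vptr + 4-byte int + 4 bytes of padding).
    std::printf("sizeof(Plain)       = %zu\n", sizeof(Plain));
    std::printf("sizeof(Polymorphic) = %zu\n", sizeof(Polymorphic));
}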

I know this is complicated by multiple inheritance (especially with virtual base classes, i.e. diamond-shaped hierarchies).

An alternative way to achieve the same thing is to make the first member an index into a table of vtables (equivalently in C, as below):

struct Base {
    char classid;   // the classid here is an index into an array of vtables
    int  m;
};

If the total number of classes in an application is at most 256 (including all possible template instantiations, etc.), then a char is wide enough to hold the index, thereby reducing the size of every polymorphic class in the application (I am ignoring alignment issues for now).
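
Purely for illustration, here is a hand-rolled sketch of what such dispatch could look like; the names vtables, classid and call_metha are made up, and no compiler actually emits this:

struct Base;   // forward declaration

struct VTable {
    void (*metha)(Base*);        // one slot per virtual function
};

extern const VTable vtables[];   // one entry per polymorphic class in the program

struct Base {
    unsigned char classid;       // index into vtables[], instead of a full vptr
    int m;
};

// What a call site "obj->metha()" would have to compile to:
inline void call_metha(Base* obj) {
    vtables[obj->classid].metha(obj);   // one extra indexed lookup vs. vptr dispatch
}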

My question is: is there any switch in GNU C++ (GCC), LLVM/Clang, or any other compiler to do this, or otherwise to reduce the size of polymorphic objects?

Edit: I understand the alignment issues pointed out. A further point: on a 64-bit system (assuming a 64-bit vptr), if a polymorphic object's own members only amount to around 8 bytes, then the vptr accounts for 50% of its memory. This mostly concerns small polymorphic objects created in large numbers, so I am wondering whether this scheme is possible for at least specific classes, if not the whole application.

Clothespress answered 12/4, 2012 at 14:2 Comment(6)
The member m would probably still be aligned on a DWORD boundary, so you would gain nothing.Celisse
The issue with that scheme would probably be one of alignment. The struct Base in your second example wouldn't (generally) take up 5 bytes and save you 3 bytes per object: m would probably be aligned on a 4-byte boundary anyhow, so you'd have 3 bytes of wasted spacePiceous
I think that this method of virtual table implementation is almost universal. I've certainly never come across another method, or any options on GCC or Visual C++ that would control how it is implemented.Piceous
+1 for an interesting questionPiceous
I already know about the alignment issue and agree. But my question is whether this scheme exists and how to enable it, because if my base class has 3 extra chars, then I'd expect them to be packed into the same 8 bytes.Clothespress
I'm not aware of any compiler that does anything like this. If you're running on an embedded system where you have to squeeze out every byte you'll be better off writing the dispatch yourself so you have exact control. In any other situation just let the compiler do what it wants. On modern machines with multiple GB of memory an extra 160K is nothing.Knew
3

Your suggestion is interesting, but it won't work if the executable is made up of several modules that pass objects among them. Given that they are compiled separately (say, as DLLs), if one module creates an object and passes it to another, and the other invokes a virtual method, how would it know which table the classid refers to? You can't add a module id either, because the two modules might not know about each other when they are compiled. So unless you use pointers, I think it's a dead end...

Recognize answered 12/4, 2012 at 14:12 Comment(3)
Agreed, but I think this would partly be down to the toolchain, environment (dynamic linking, etc.) and OS.Clothespress
@ManiP, the toolchain might not know how other modules were built and what they contain (this could also be a problem with regular vptrs, which aren't always placed at the same location). Bear in mind that the pointer to the vptrs table must be known at compile time. To determine the right table at runtime you'll have to know the origin of the object (which module created it), and that's a serious overhead (you can't store that data in the object at compile time). So even if you somehow eliminate (part of) that vptr, you'll end up running more code.Recognize
Good point! It's just like how on Linux you cannot make a hard link across file systems; the same problem arises with inode numbers and partitions.Vie
3

A couple of observations:

  1. Yes, a smaller value could be used to represent the class, but some processors require data to be aligned, so the saving in space may be lost to the requirement to align data values to e.g. 4-byte boundaries. Further, the class-id must be in a well-defined place for all members of a polymorphic inheritance tree, so it is likely to sit ahead of other data, and alignment problems can't be avoided.

  2. The cost of storing the pointer has been moved into the code, where every use of a polymorphic function requires code to translate the class-id into either a vtable pointer or some equivalent data structure. So it isn't free. Clearly the cost trade-off depends on the volume of code vs. the number of objects.

  3. If objects are allocated from the heap, space is usually wasted in order to ensure objects are aligned to the worst-case boundary, so even if there is a small amount of code and a large number of polymorphic objects, the memory-management overhead might be significantly bigger than the difference between a pointer and a char.

  4. In order to allow programs to be independently compiled, the number of classes in the whole program, and hence the size of the class-id, must be known at compile time; otherwise code can't be compiled to access it. This would be a significant overhead. It is simpler to fix it for the worst case and simplify compilation and linking.

Please don't let me stop you trying, but there are quite a few more issues to resolve in any technique that uses a variable-size id to derive the function address.

I would strongly encourage you to look at Ian Piumarta's COLA (see also the Wikipedia article on COLA).

It actually takes a different approach, and uses the pointer in a much more flexible way, to build inheritance, prototype-based dispatch, or any other mechanism the developer requires.

Judicial answered 12/4, 2012 at 14:20 Comment(5)
Regarding point 2: The cost of a virtual call is similar to that of a call through a jump table (of function pointers). Regarding point 3: useful, but carefully packed objects avoid this issue. Regarding point 4: at worst, 32 bits should be sufficient. On 64-bit platforms that is a possible saving of 4 bytes per object. Thanks for the link to COLA :)Judi
About point 2: agreed, it's not free and there is an extra cost. It's usually something like 'classid x 4 + [base pointer to the vtables]'. Modern processors should be relatively fast at this, I think.Clothespress
@Clothespress - Yes. The cost is an extra instruction or two, potentially at every place in the code that calls a method (clearly some optimisation is possible).Judicial
@ManiP: An extra level of indirection (looking up the vtable pointer in a table of vtables) would cost an extra ~5 cycles of load-use latency (assuming L1d cache hit because this table will stay hot). Branch prediction can hide this, but if branch prediction doesn't work perfectly then mispredicts take longer to detect. What could work on 64-bit architectures is putting all the vtables near each other (in the same 4GiB), and storing a 32-bit offset instead of a 64-bit pointer as the first member.Ineffable
On x86, a call [vtable_section_base + rax] is 4 bytes longer than call [rax], but has no other throughput costs (see How do objects work in x86 at the assembly level? for normal C++ virtual dispatch on x86-64). It only works when the vtable_section_base is in the low 2GiB of virtual address space, though, so not in PIC code (libraries) or PIE executables (ASLR). Maybe you could work out a way for libraries to cooperate on allocating space in such a memory area.Ineffable
3

No, there is no such switch.

The LLVM/Clang codebase avoids virtual tables in classes that are allocated by the tens of thousands: this works well in a closed hierarchy, because a single enum can enumerate all possible classes, and each class is then linked to a value of the enum. The hierarchy is closed, obviously, because of the enum.

Then, virtual dispatch is implemented by a switch on the enum and an appropriate cast before calling the method. Once again, this is closed: the switch has to be modified for each new class.
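
Roughly like the following sketch of the pattern (my own minimal example, not actual LLVM code; LLVM's hierarchies additionally use their classof/cast machinery):

#include <cassert>

struct Shape {
    enum Kind { K_Circle, K_Square };   // closed: every class must appear here
    Kind kind;
    explicit Shape(Kind k) : kind(k) {}
    double area() const;                // "virtual" behaviour, but no vtable
};

struct Circle : Shape {
    double r;
    explicit Circle(double r) : Shape(K_Circle), r(r) {}
};

struct Square : Shape {
    double side;
    explicit Square(double s) : Shape(K_Square), side(s) {}
};

// Dispatch by hand: this switch must be extended for every new class.
double Shape::area() const {
    switch (kind) {
    case K_Circle: {
        double r = static_cast<const Circle*>(this)->r;
        return 3.141592653589793 * r * r;
    }
    case K_Square: {
        double s = static_cast<const Square*>(this)->side;
        return s * s;
    }
    }
    assert(false && "unknown kind");
    return 0.0;
}

Shape here carries only the enum value, so its size is that of the enum rather than a pointer, at the cost of the switch having to know every class.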


A first alternative: external vpointer.

If you find yourself in a situation where the vpointer tax is paid way too often, that is, most of the objects are of a statically known type, then you can externalize it:

class Interface {
public:
  virtual ~Interface() {}

  virtual Interface* clone() const = 0; // might be worth it

  virtual void updateCount(int) = 0;

protected:
  Interface(Interface const&) {}
  Interface& operator=(Interface const&) { return *this; }
};

template <typename T>
class InterfaceBridge: public Interface {
public:
  InterfaceBridge(T& t): t(t) {}

  virtual InterfaceBridge* clone() const { return new InterfaceBridge(*this); }

  virtual void updateCount(int i) { t.updateCount(i); }

private:
  T& t; // value or reference ? Choose...
};

template <typename T>
InterfaceBridge<T> interface(T& t) { return InterfaceBridge<T>(t); }

Then, imagining a simple class:

class Counter {
public:
  int getCount() const { return c; }
  void updateCount(int i) { c = i; }
private:
  int c;
};

You can store the objects in an array:

static Counter array[5];

assert(sizeof(array) == sizeof(int)*5); // no v-pointer

And still use them with polymorphic functions:

void five(Interface& i) { i.updateCount(5); }

InterfaceBridge<Counter> ib(array[3]); // create *one* v-pointer
five(ib);

assert(array[3].getCount() == 5);

The value vs. reference choice is actually a design tension. In general, if you need to clone then you need to store by value, and you need to clone when you store by base class (boost::ptr_vector, for example). It is possible to provide both interfaces (and bridges):

Interface <--- ClonableInterface
  |                 |
InterfaceB     ClonableInterfaceB

It's just extra typing.
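
A rough sketch of that split, using the longer names from the code above (and, for the sake of the example, an Interface without clone()):

class Interface {                            // no clone(): reference semantics
public:
    virtual ~Interface() {}
    virtual void updateCount(int) = 0;
};

class ClonableInterface : public Interface { // adds value semantics
public:
    virtual ClonableInterface* clone() const = 0;
};

template <typename T>
class InterfaceBridge : public Interface {
public:
    explicit InterfaceBridge(T& t): t(t) {}
    virtual void updateCount(int i) { t.updateCount(i); }
private:
    T& t;                                    // reference: caller keeps ownership
};

template <typename T>
class ClonableInterfaceBridge : public ClonableInterface {
public:
    explicit ClonableInterfaceBridge(T t): t(t) {}
    virtual ClonableInterfaceBridge* clone() const { return new ClonableInterfaceBridge(*this); }
    virtual void updateCount(int i) { t.updateCount(i); }
private:
    T t;                                     // value: each clone owns its copy
};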


Another solution, much more involved.

A switch can be implemented by a jump table, and such a table could perfectly well be created at runtime, in a std::vector for example:

#include <cassert>
#include <vector>

class Base {
public:
  ~Base() { VTables()[vpointer].dispose(*this); }

  void updateCount(int i) {
    VTables()[vpointer].updateCount(*this, i);
  }

protected:
  struct VTable {
    typedef void (*Dispose)(Base&);
    typedef void (*UpdateCount)(Base&, int);

    Dispose dispose;
    UpdateCount updateCount;
  };

  static void NoDispose(Base&) {}

  static unsigned RegisterTable(VTable t) {
    std::vector<VTable>& v = VTables();
    v.push_back(t);
    return v.size() - 1;
  }

  explicit Base(unsigned id): vpointer(id) {
    assert(id < VTables().size());
  }

private:
  // Implement in .cpp or pay the cost of weak symbols.
  static std::vector<VTable>& VTables() { static std::vector<VTable> VT; return VT; }

  unsigned vpointer;
};

And then, a Derived class:

class Derived: public Base {
public:
  Derived(): Base(GetID()) {}

private:
  static void UpdateCount(Base& b, int i) {
    static_cast<Derived&>(b).count = i;
  }

  static unsigned GetID() {
    static unsigned ID = RegisterTable(VTable({&NoDispose, &UpdateCount}));
    return ID;
  }

  unsigned count;
};
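
A short usage sketch (my own, assuming a typical implementation where Base is exactly one unsigned):

int main() {
    Derived d;             // registers Derived's table on first construction
    Base& b = d;
    b.updateCount(42);     // dispatched through the hand-rolled table

    // No compiler-generated vtable: Base holds just the unsigned vpointer,
    // so this typically holds.
    assert(sizeof(Base) == sizeof(unsigned));
}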

Well, now you'll realize how great it is that the compiler does it for you, even at the cost of some overhead.

Oh, and because of alignment, as soon as a Derived class introduces a pointer there is a risk that 4 bytes of padding are inserted between Base and the next attribute. You can reclaim them by carefully selecting the first few attributes in Derived to avoid padding...
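
For example, assuming a common 64-bit ABI (4-byte unsigned, 8-byte pointers); the derived classes here are hypothetical and shown for layout only:

class Wasteful : public Base {
    char* name;       // 8-byte alignment forces 4 bytes of padding after Base
    unsigned flags;
};  // typically 4 (Base) + 4 (padding) + 8 + 4 + 4 (tail padding) = 24 bytes

class Packed : public Base {
    unsigned flags;   // small member first: it slots into the would-be padding
    char* name;
};  // typically 4 (Base) + 4 + 8 = 16 bytes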

Judi answered 12/4, 2012 at 14:54 Comment(2)
Not sure what you mean by this statement: "The LLVM/Clang database avoids virtual tables in classes that are allocated by the tens of thousands". How does the compiler know which classes are allocated by the tens of thousands in the application? Or am I misinterpreting this?Clothespress
@ManiP: I meant codebase obviously. Anyway the compiler does not, the developers take care of not introducing virtual methods in those hierarchies and resort to manual switching instead.Judi
2

The short answer is that no, I don't know of any switch to do this with any common C++ compiler.

The longer answer is that to do this, you'd just about have to build most of the intelligence into the linker, so it could coordinate distributing the IDs across all the object files getting linked together.

I'd also point out that it wouldn't generally do a whole lot of good. At least in the typical case, you want each element of a struct/class at a "natural" boundary, meaning its starting address is a multiple of its size. Using your example of a class containing a single int, the compiler would allocate one byte for the vtable index, followed immediately by three bytes of padding so the next int would land at an address that is a multiple of four. The end result is that objects of the class would occupy precisely the same amount of storage as if we used a pointer.

I'd add that this is not a far-fetched exception either. For years, standard advice to minimize padding inserted into structs/classes has been to put the items expected to be largest at the beginning, and progress toward the smallest. That means in most code, you'd end up with those same three bytes of padding before the first explicitly defined member of the struct.

To get any good from this, you'd have to be aware of it, and have a struct with (for example) three bytes of data you could move where you wanted. Then you'd move those to be the first items explicitly defined in the struct. Unfortunately, that would also mean that if you turned this switch off so you have a vtable pointer, you'd end up with the compiler inserting padding that might otherwise be unnecessary.
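
For instance, a sketch with made-up names, assuming a 1-byte class id and a typical ABI where int needs 4-byte alignment:

struct Message {                 // hypothetical type with 3 bytes of small data
    unsigned char classid;       // the 1-byte id this scheme would insert first
    unsigned char priority;      // the three small members are deliberately
    unsigned char channel;       // placed right after it, filling what would
    unsigned char retries;       // otherwise be padding
    int payload;                 // still starts on its natural 4-byte boundary
};                               // 1 + 3 + 4 = 8 bytes, no padding wasted

With a full vptr instead of the 1-byte id, the same members would typically take 12 bytes in a 32-bit build (or 16 in a 64-bit build).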

To summarize: it's not implemented, and if it were, it wouldn't usually accomplish much.

Johns answered 12/4, 2012 at 14:51 Comment(3)
"The end result would be that objects of the class would occupy precisely the same amount of storage as if we used a pointer." Only for 32-bit builds; for 64-bit builds a pointer is 8 bytes, usually twice the size of an integer. But of course any other pointer in the class would lead to the same issue.Judi
I understand and agree about the alignment issue. Yes, most compilers will pad those extra 3 chars to fill the 32 bits; in the 64-bit case the overhead seems doubled. But my question is whether compilers/toolchains support it, because on embedded systems, where you have polymorphic objects that are small but many in number, space becomes a luxury.Clothespress
@ManiP: As I said, the short answer is that I've never seen (or heard of) a toolchain that did this. I agree it's possible, and under some circumstances might even be useful, but if it's been done, I (for one) am unaware of it.Johns
