C++/compilation: is it possible to set the size of the vptr (global vtable + 2-byte index)?
I recently posted a question about the memory overhead due to virtuality in C++. The answers allowed me to understand how vtables and vptrs work. My problem is the following: I work on supercomputers, I have billions of objects, and consequently I have to care about the memory overhead due to virtuality. After some measurements, I found that when I use classes with virtual functions, each derived object carries its own 8-byte vptr. This is not negligible at all.

I wonder if Intel icpc or g++ has some configuration/option/parameter to use "global" vtables and indexes with adjustable precision instead of vptrs, because such a thing would allow me to use a 2-byte index (unsigned short int) instead of an 8-byte vptr for billions of objects (a large reduction of the memory overhead). Is there any way to do that (or something like that) with compilation options?

Thank you very much.

Cystotomy answered 12/5, 2012 at 8:50 Comment(12)
Nice question. I'm almost sure the answer is "no". Perhaps you can use templates to get rid of runtime polymorphism.Hexahedron
Do you actually need a 64-bit address space, or can you compile 32-bit executables? That would cut the vtable pointer in half. Actually Sharptooth has the best answer. The alternative would be to move the data out of the objects, but that may not be practical.Torrefy
@Ben: you cannot really fit "billions of objects" in a 32-bit address space.Butyl
@n.m. :-) Of course not. Silly me.Torrefy
@Torrefy: the type of machine I use has at least 10 TBytes of memory...Cystotomy
I don't think there's such an option for either gcc or icc. If you cannot get rid of runtime polymorphism entirely, you may want to implement your own type discriminant for these objects, and not rely on the vptr/vtbl mechanism. Basically, use the old-fashioned type field in the objects, and switch according to its value.Butyl
Possible duplicate of this: https://mcmap.net/q/357911/-alternative-schemes-for-implementing-vptr/1009831Heathenism
Do you really need polymorphic classes? Maybe a redesign of the code would improve efficiency more than trying to modify the way virtual functions work.Pilliwinks
@Kerrek: no doubt it would -- if a compiler can in principle do this automatically, then the programmer could do it manually by replacing all virtual functions with entries in a table of function pointers, replacing all virtual calls with call indirected through such a table, and putting the index of the correct table at the start of each object. But if there was a gcc option to do that I'd certainly take a one-line modification to a makefile in preference to redesigning a large code base :-)Thomson
@SteveJessop: I didn't mean that. People often use virtuality far more than necessary. I was thinking about a more high-level redesign that just avoids an inheritance hierarchy. It's impossible to tell without more details, though.Pilliwinks
@Kerrek: The code I'm talking about is for computational astrophysics. So I use virtuality only where it is necessary and to strike a good balance between performance, readability, and reusability of the code. So if I can save a TByte of RAM using a compiler option it would be great. If there is no such solution, I will implement a "low memory" version by hand if I have to...Cystotomy
Keep in mind that composition is a valid "workaround" for not inheritance in many cases (when you have a good reason, that is).Biagi

Unfortunately... not automatically.

But remember that a v-table is nothing but syntactic sugar for runtime polymorphism. If you are willing to re-engineer your code, there are several alternatives.

  1. External polymorphism
  2. Hand-made v-tables
  3. Hand-made polymorphism

1) External polymorphism

The idea is that sometimes you only need polymorphism in a transient fashion. That is, for example:

std::vector<Cat> cats;
std::vector<Dog> dogs;
std::vector<Ostrich> ostriches;

void dosomething(Animal const& a);

It seems wasteful for Cat or Dog to have a virtual pointer embedded in this situation because you know the dynamic type (they are stored by value).

External polymorphism is about having pure concrete types and pure interfaces, as well as a simple bridge in the middle to temporarily (or permanently, but it's not what you want here) adapt a concrete type to an interface.

// Interface
class Animal {
public:
    virtual ~Animal() {}

    virtual size_t age() const = 0;
    virtual size_t weight() const = 0;

    virtual void eat(Food const&) = 0;
    virtual void sleep(Duration) = 0;

private:
    Animal(Animal const&) = delete;
    Animal& operator=(Animal const&) = delete;
};

// Concrete class
class Cat {
public:
    size_t age() const;
    size_t weight() const;

    void eat(Food const&);
    void sleep(Duration);
};

The bridge is written once and for all:

template <typename T>
class AnimalT: public Animal {
public:
    AnimalT(T& r): _ref(r) {}

    virtual size_t age() const override { return _ref.age(); }
    virtual size_t weight() const override { return _ref.weight(); }

    virtual void eat(Food const& f) override { _ref.eat(f); }
    virtual void sleep(Duration const d) override { _ref.sleep(d); }

private:
    T& _ref;
};

template <typename T>
AnimalT<T> iface_animal(T& r) { return AnimalT<T>(r); }

And you can use it so:

for (auto& c: cats) { dosomething(iface_animal(c)); }

It incurs an overhead of two pointers per item, but only as long as you need polymorphism.

An alternative is to have AnimalT<T> work with values too (instead of references) and provide a clone method, which lets you choose freely between having a v-pointer or not depending on the situation.

In this case, I advise using a simple class:

template <typename T> struct ref { ref(T& t): _ref(t) {} T& _ref; };

template <typename T>
T& deref(T& r) { return r; }

template <typename T>
T& deref(ref<T> const& r) { return r._ref; }

And then modify the bridge a bit:

template <typename T>
class AnimalT: public Animal {
public:
    AnimalT(T r): _r(r) {}

    std::unique_ptr<Animal> clone() const { return std::unique_ptr<Animal>(new AnimalT<T>(_r)); }

    virtual size_t age() const override { return deref(_r).age(); }
    virtual size_t weight() const override { return deref(_r).weight(); }

    virtual void eat(Food const& f) override { deref(_r).eat(f); }
    virtual void sleep(Duration const d) override { deref(_r).sleep(d); }

private:
    T _r;
};

template <typename T>
AnimalT<T> iface_animal(T r) { return AnimalT<T>(r); }

template <typename T>
AnimalT<ref<T>> iface_animal_ref(T& r) { return AnimalT<ref<T>>(r); }

This way you choose when you want polymorphic storage and when you do not.


2) Hand-made v-tables

(only easily works on closed hierarchies)

It is common in C to emulate object orientation by providing one's own v-table mechanism. Since you appear to know what a v-table is and how the v-pointer works, you can perfectly well implement it yourself.

struct FooVTable {
    typedef void (Foo::*DoFunc)(int, int);

    DoFunc _do;
};

And then provide a global array for the hierarchy anchored in Foo:

extern FooVTable const* const FooVTableFoo;
extern FooVTable const* const FooVTableBar;

FooVTable const* const FooVTables[] = { FooVTableFoo, FooVTableBar };

enum class FooVTableIndex: unsigned short {
    Foo,
    Bar
};

Then all you need in your Foo class is to hold onto the most derived type:

class Foo {
public:

    void dofunc(int i, int j) {
        (this->*(table()->_do))(i, j);
    }

protected:
    FooVTable const* table() const { return FooVTables[static_cast<unsigned short>(_vindex)]; }

private:
    FooVTableIndex _vindex;
};

The hierarchy is closed because of the FooVTables array and the FooVTableIndex enumeration, which need to be aware of all the types in the hierarchy.

The enum index can be bypassed, though: by making the array non-constant, it is possible to pre-allocate it with a larger size and have each derived type register itself there automatically at init time. Index conflicts are then detected during this init phase, and automatic resolution (scanning the array for a free slot) is even possible.

This may be less convenient, but does provide a way to open the hierarchy. Obviously it's easier to code before any thread is launched, as we are talking global variables here.


3) Hand-made polymorphism

(only really works for closed hierarchies)

This last approach is based on my experience exploring the LLVM/Clang codebase. A compiler faces the very same problem you do: for tens or hundreds of thousands of small items, a v-pointer per item really increases memory consumption, which is annoying.

Therefore, they took a simple approach:

  • each class hierarchy has a companion enum listing all members
  • each class in the hierarchy passes its companion enumerator to its base upon construction
  • virtuality is achieved by switching over the enum and casting appropriately

In code:

enum class FooType { Foo, Bar, Bor };

class Foo {
public:
    int dodispatcher() {
        switch(_type) {
        case FooType::Foo:
            return static_cast<Foo&>(*this).dosomething();

        case FooType::Bar:
            return static_cast<Bar&>(*this).dosomething();

        case FooType::Bor:
            return static_cast<Bor&>(*this).dosomething();
        }
        assert(0 && "Should never get there");
    }
private:
    FooType _type;
};

The switches are pretty annoying, but they can be more or less automated by playing with some macros and a type list. LLVM typically uses a file like:

 // FooList.inc
 ACT_ON(Foo)
 ACT_ON(Bar)
 ACT_ON(Bor)

and then you do:

 int Foo::dodispatcher() {
     switch(_type) {
 #   define ACT_ON(X) case FooType::X: return static_cast<X&>(*this).dosomething();

 #   include "FooList.inc"

 #   undef ACT_ON
     }

     assert(0 && "Should never get there");
 }

Chris Lattner commented that due to how switches are generated (using a table of code offsets) this produced code similar to that of a virtual dispatch, and thus had about the same amount of CPU overhead, but for a lower memory overhead.

Obviously, the one drawback is that Foo.cpp needs to include all of the headers of its derived classes, which effectively seals the hierarchy.


I voluntarily presented the solutions from the most open one to the most closed one. They have various degrees of complexity/flexibility, and it is up to you to choose which one suits you best.

One important thing: in the latter two cases, destruction and copies require special care.

Conceit answered 12/5, 2012 at 10:15 Comment(6)
For (1), you might like to check out std::ref.Pilliwinks
@KerrekSB: thanks, but std::reference_wrapper is quite a mouthful :xConceit
How does #3 save space? Wouldn't the private: FooType _type; take up about as much space as a vptr?Distal
@Yay295: in C++03, the size of the enum is left to the compiler (it's only guaranteed to be big enough to represent all enumerators). It is very unlikely that it'll be 64 bits wide, so this is likely a win on 64-bit architectures, and it depends on the compiler's quality/guarantees on 32-bit architectures. In C++11, you can decide the size of the backing storage using enum class: the above could use uint8_t as backing storage and therefore be 8 bits, a clear win over 32 bits or 64 bits.Conceit
Struct/Class padding could negate those savings though.Distal
@Yay295: Oh certainly, however when you start worrying about shaving off 7 bytes from your class, you pay attention to padding :)Conceit
