What type erasure techniques are there, and how do they work?
Asked Answered
A

6

143

(With type erasure, I mean hiding some or all of the type information regarding a class, somewhat like Boost.Any.)
I want to get a hold of type erasure techniques, while also sharing those, which I know of. My hope is kinda to find some crazy technique that somebody thought of in his/her darkest hour. :)

The first and most obvious, and commonly taken approach, that I know, are virtual functions. Just hide the implementation of your class inside an interface based class hierarchy. Many Boost libraries do this, for example Boost.Any does this to hide your type and Boost.Shared_ptr does this to hide the (de)allocation mechanic.

Then there is the option with function pointers to templated functions, while holding the actual object in a void* pointer, like Boost.Function does to hide the real type of the functor. Example implementations can be found at the end of the question.

So, for my actual question:
What other type erasure techniques do you know of? Please provide them, if possible, with an example code, use cases, your experience with them and maybe links for further reading.

Edit
(Since I wasn't sure wether to add this as an answer, or just edit the question, I'll just do the safer one.)
Another nice technique to hide the actual type of something without virtual functions or void* fiddling, is the one GMan employs here, with relevance to my question on how exactly this works.


Example code:

#include <iostream>
#include <string>

// NOTE: The class name indicates the underlying type erasure technique

// this behaves like the Boost.Any type w.r.t. implementation details
class Any_Virtual{
        struct holder_base{
                virtual ~holder_base(){}
                virtual holder_base* clone() const = 0;
        };

        template<class T>
        struct holder : holder_base{
                holder()
                        : held_()
                {}

                holder(T const& t)
                        : held_(t)
                {}

                virtual ~holder(){
                }

                virtual holder_base* clone() const {
                        return new holder<T>(*this);
                }

                T held_;
        };

public:
        Any_Virtual()
                : storage_(0)
        {}

        Any_Virtual(Any_Virtual const& other)
                : storage_(other.storage_->clone())
        {}

        template<class T>
        Any_Virtual(T const& t)
                : storage_(new holder<T>(t))
        {}

        ~Any_Virtual(){
                Clear();
        }

        Any_Virtual& operator=(Any_Virtual const& other){
                Clear();
                storage_ = other.storage_->clone();
                return *this;
        }

        template<class T>
        Any_Virtual& operator=(T const& t){
                Clear();
                storage_ = new holder<T>(t);
                return *this;
        }

        void Clear(){
                if(storage_)
                        delete storage_;
        }

        template<class T>
        T& As(){
                return static_cast<holder<T>*>(storage_)->held_;
        }

private:
        holder_base* storage_;
};

// the following demonstrates the use of void pointers 
// and function pointers to templated operate functions
// to safely hide the type

enum Operation{
        CopyTag,
        DeleteTag
};

template<class T>
void Operate(void*const& in, void*& out, Operation op){
        switch(op){
        case CopyTag:
                out = new T(*static_cast<T*>(in));
                return;
        case DeleteTag:
                delete static_cast<T*>(out);
        }
}

class Any_VoidPtr{
public:
        Any_VoidPtr()
                : object_(0)
                , operate_(0)
        {}

        Any_VoidPtr(Any_VoidPtr const& other)
                : object_(0)
                , operate_(other.operate_)
        {
                if(other.object_)
                        operate_(other.object_, object_, CopyTag);
        }

        template<class T>
        Any_VoidPtr(T const& t)
                : object_(new T(t))
                , operate_(&Operate<T>)
        {}

        ~Any_VoidPtr(){
                Clear();
        }

        Any_VoidPtr& operator=(Any_VoidPtr const& other){
                Clear();
                operate_ = other.operate_;
                operate_(other.object_, object_, CopyTag);
                return *this;
        }

        template<class T>
        Any_VoidPtr& operator=(T const& t){
                Clear();
                object_ = new T(t);
                operate_ = &Operate<T>;
                return *this;
        }

        void Clear(){
                if(object_)
                        operate_(0,object_,DeleteTag);
                object_ = 0;
        }

        template<class T>
        T& As(){
                return *static_cast<T*>(object_);
        }

private:
        typedef void (*OperateFunc)(void*const&,void*&,Operation);

        void* object_;
        OperateFunc operate_;
};

int main(){
        Any_Virtual a = 6;
        std::cout << a.As<int>() << std::endl;

        a = std::string("oh hi!");
        std::cout << a.As<std::string>() << std::endl;

        Any_Virtual av2 = a;

        Any_VoidPtr a2 = 42;
        std::cout << a2.As<int>() << std::endl;

        Any_VoidPtr a3 = a.As<std::string>();
        a2 = a3;
        a2.As<std::string>() += " - again!";
        std::cout << "a2: " << a2.As<std::string>() << std::endl;
        std::cout << "a3: " << a3.As<std::string>() << std::endl;

        a3 = a;
        a3.As<Any_Virtual>().As<std::string>() += " - and yet again!!";
        std::cout << "a: " << a.As<std::string>() << std::endl;
        std::cout << "a3->a: " << a3.As<Any_Virtual>().As<std::string>() << std::endl;

        std::cin.get();
}
Andromede answered 27/3, 2011 at 15:19 Comment(20)
By "type erasure", are you really referring to "polymorphism"? I think "type erasure" has a somewhat specific meaning, which is usually associated with e.g. Java generics.Dominiquedominium
@Oli: Type erasure can be implemented with polymorphism, but that's not the only option, my second example shows that. :) And with type erasure I just mean, that your struct isn't dependend on a template type for example. Boost.Function doesn't care if you feed it a functor, a function pointer, or even a lambda. Same with Boost.Shared_Ptr. You can specifiy an allocator and deallocation function, but the actual type of the shared_ptr doesn't reflect this, it will always be the same, shared_ptr<int> for example, unlike standard container.Andromede
@Xeo: I mean "polymorphism" in the general sense, not necessarily the C++ language implementation of it. I consider operating through void * a form of polymorphism, for instance.Dominiquedominium
@Xeo: I much prefer the first technic, because of the type safety it provides (using RTTI).Cindy
@Matthieu: I consider the second example also type safe. You always know the exact type you're operating on. Or am I missing something?Andromede
@Xeo: the As embeds a static_cast, but you are not sure, at all, if you actually store a T.Cindy
@Matthieu: You're right. Normally such an As(s) function wouldn't be implemented that way. Like I said, by no means safe-to-use! :)Andromede
Added a bounty in hope of somehow finding some neat techniques. :<Andromede
I am composing an answer with a technique of my own devising, but I might not get it done before this bounty expires.Bloke
@John Dibling: Thanks, hope you make it. ;) Also, I think the bounty will expire tomorrow evening.Andromede
nice, we now have a new buzz word to indulge when writing lousy code that works around static type checking. People!! static type checking is a tool for making you not write stupid errors, not something going out of his way to make you write more codeLaura
@lurscher: Well... never used the boost or std versions of any of the following? function, shared_ptr, any, etc. ? They all employ type erasure for sweet sweet user convenience.Andromede
@John: Rejoice, the bounty expires only 23 hours from now. ;) Hope you make it.Andromede
@lurscher: there are valid reasons why using a variant class actually produces better code.Bloke
@John: I hope you're still composing your answer even though the bounty expired. Maybe I'll make another one if I like your answer. :)Andromede
@John: I know you're polishing your example, but maybe you could post it as an answer already within the next 20 hours and finish polishing it afterwards? :)Andromede
You may be interested in the type_erasure library, developed by Steven Watanabe, which should be accepted in Boost in a close future. I did not delve into the implementation, but the result is pretty awesome! AFAIK, the library uses a void * for data, and a statically constructed vtable containing function pointers for behavior.Amersfoort
Side note: the pattern used i.a. by Boost.Any is called the external polymorphism patternSight
@Xeo: the Ideone link is dead, could you please repost the code there (or here)?Manvel
@GyorgySzekely: Done. Had to recover the code through the wayback machine. Fun fun.Andromede
M
108

All type erasure techniques in C++ are done with function pointers (for behaviour) and void* (for data). The "different" methods simply differ in the way they add semantic sugar. Virtual functions, e.g., are just semantic sugar for

struct Class {
    struct vtable {
        void (*dtor)(Class*);
        void (*func)(Class*,double);
    } * vtbl
};

iow: function pointers.

That said, there's one technique I particularly like, though: It's shared_ptr<void>, simply because it blows the minds off of people who don't know you can do this: You can store any data in a shared_ptr<void>, and still have the correct destructor called at the end, because the shared_ptr constructor is a function template, and will use the type of the actual object passed for creating the deleter by default:

{
    const shared_ptr<void> sp( new A );
} // calls A::~A() here

Of course, this is just the usual void*/function-pointer type erasure, but very conveniently packaged.

Mattson answered 18/5, 2011 at 12:33 Comment(7)
Coincidentally, I had to explain the behaviour of shared_ptr<void> to a friend of mine with an example implementation just some days ago. :) It really is cool.Andromede
Good answer; to make it amazing, a sketch of how a fake-vtable can be staticly created for each erased type is very educational. Note that fake-vtables and function-pointer implementations give you known memory-sized structures (compared to pure-virtual types) that can be easily stored locally, and (easily) divorced from the data they are virtualizing.Roxieroxine
so, if the shared_ptr then stores a Derived *, but the Base * did not declare the destructor as virtual, shared_ptr<void> still works as intended, since it never even knew about a base class to begin with. Cool!Fourposter
@Apollys: It does, but unique_ptr doesn't type-erase the deleter, so if you want to assign a unique_ptr<T> to a unique_ptr<void>, you need to provide a deleter argument, explicitly, that knows how to delete the T through a void*. If you now want to assign an S, too, then you need a deleter, explicitly, that knows how to delete a T through a void* and also an S through a void*, and, given a void*, knows whether it's a T or an S. At that point, you have written a type-erased deleter for unique_ptr, and then it also works for unique_ptr. Just not out of the box.Mattson
I feel like the question you answered was "How do I workaround the fact that this doesn't work with unique_ptr?" Useful for some people, but didn't address my question. I guess the answer is, because shared pointers got more attention in the development of the standard library. Which I think is a little sad because unique pointers are simpler, so it should be easier to implement basic functionalities, and they're more efficient so people should use them more. Instead we have the exact opposite.Become
@Apollys unique_ptr is more efficient than shared_ptr precisely because it doesn't do type erasure out of the box. The payload and the deleter need to be the exact types. If you want type-erasure with a unique_ptr, you need to implement it yourself.Mattson
I highly doubt that's true (your first sentence). The primary reason unique_ptr is more efficient than` shared_ptr` is because it doesn't have the overhead of managing a shared resource.Become
H
55

Fundamentally, those are your options: virtual functions or function pointers.

How you store the data and associate it with the functions can vary. For example, you could store a pointer-to-base, and have the derived class contain the data and the virtual function implementations, or you could store the data elsewhere (e.g. in a separately allocated buffer), and just have the derived class provide the virtual function implementations, which take a void* that points to the data. If you store the data in a separate buffer, then you could use function pointers rather than virtual functions.

Storing a pointer-to-base works well in this context, even if the data is stored separately, if there are multiple operations that you wish to apply to your type-erased data. Otherwise you end up with multiple function pointers (one for each of the type-erased functions), or functions with a parameter that specifies the operation to perform.

Humorous answered 17/5, 2011 at 16:26 Comment(2)
So, in other words the examples I gave in the question? Though, thanks for writing it up like this, especially w.r.t. to the virtual functions and multiple operations on the type-erased data.Andromede
There are at least 2 other options. I am composing an answer.Bloke
C
26

I would also consider (similar to void*) the use of "raw storage": char buffer[N].

In C++0x you have std::aligned_storage<Size,Align>::type for this.

You can store anything you want in there, as long as it's small enough and you deal with the alignment properly.

Cindy answered 27/3, 2011 at 16:19 Comment(3)
Well yeah, Boost.Function actually uses a combination of this and the second example I gave. If the functor is small enough, it stores it internally inside the functor_buffer. Good to know about std::aligned_storage though, thanks! :)Andromede
You can also use placement new for this.Ranger
@RustyX: Actually, you have to. std::aligned_storage<...>::type is just a raw buffer which, unlike char [sizeof(T)], is suitably aligned. By itself, though, it's inert: it does not initialize its memory, does not build an object, nothing. Therefore, once you have a buffer of this type, you have to manually construct objects inside it (with either placement new or an allocator construct method) and you have to manually destruct the objects inside it too (either manually invoking their destructor or using an allocator destroy method).Cindy
M
24

Stroustrup, in The C++ programming language (4th edition) §25.3, states:

Variants of the technique of using a single runt-time representation for values of a number of types and relying on the (static) type system to ensure that they are used only according to their declared type has been called type erasure.

In particular, no use of virtual functions or function pointers is needed to perform type erasure if we use templates. The case, already mentioned in other answers, of the correct destructor call according to the type stored in a std::shared_ptr<void> is an example of that.

The example provided in Stroustrup's book is just as enjoyable.

Think about implementing template<class T> class Vector, a container along the lines of std::vector. When you will use your Vector with a lot of different pointers types, as it often happens, the compiler will supposedly generate different code for every pointer type.

This code bloat can be prevented by defining a specialization of Vector for void* pointers and then using this specialization as a common base implementation of Vector<T*> for all others types T:

template<typename T>
class Vector<T*> : private Vector<void*>{
// all the dirty work is done once in the base class only 
public:
    // ...
    // static type system ensures that a reference of right type is returned
    T*& operator[](size_t i) { return reinterpret_cast<T*&>(Vector<void*>::operator[](i)); }
};

As you can see, we have a strongly typed container but Vector<Animal*>, Vector<Dog*>, Vector<Cat*>, ..., will share the same (C++ and binary) code for the implementation, having their pointer type erased behind void*.

Monopteros answered 24/2, 2015 at 12:21 Comment(7)
Without meaning to be blasphemous: I would prefer CRTP to the technique given by Stroustrup.Clerc
@Clerc What do you mean?Monopteros
One can obtain the same behaviour (with a less akward syntax) by using a CRTP base class template<typename Derived> VectorBase<Derived> which is then specialized as template<typename T> VectorBase<Vector<T*> >. Moreover, this approach does not work only for pointers, but for any type.Clerc
Note that good C++ linkers merge identical methods and functions: the gold linker, or MSVC comdat folding. Code is generated, but then discarded during linking.Roxieroxine
@Clerc I am trying to understand your comment and wonder if you can give me a link or a name of a pattern for which to search (not the CRTP, but the name of a technique that allows type erasure without virtual functions or function pointers). Respectfully, - ChrisStrabismus
(in the templated way you suggest, I mean)Strabismus
@Yakk: they are not allowed to. Different functions must have different addresses in C++, so the only thing they are legally allowed to do is to share the bulk of the code and have a small trampoline function/header for each original function to keep them on separate addresses. But yes, some compilers/linkers do this. And it breaks user code: e.g. Qt 5's new-style connect.Mattson
P
20

See this series of posts for a (fairly short) list of type erasure techniques and the discussion about the trade-offs: Part I, Part II, Part III, Part IV.

The one I haven't seen mentioned yet is Adobe.Poly, and Boost.Variant, which can be considered a type erasure to some extent.

Patrizius answered 13/1, 2014 at 18:18 Comment(0)
S
8

As stated by Marc, one can use cast std::shared_ptr<void>. For example store the type in a function pointer, cast it and store in a functor of only one type:

#include <iostream>
#include <memory>
#include <functional>

using voidFun = void(*)(std::shared_ptr<void>);

template<typename T>
void fun(std::shared_ptr<T> t)
{
    std::cout << *t << std::endl;
}

int main()
{
    std::function<void(std::shared_ptr<void>)> call;

    call = reinterpret_cast<voidFun>(fun<std::string>);
    call(std::make_shared<std::string>("Hi there!"));

    call = reinterpret_cast<voidFun>(fun<int>);
    call(std::make_shared<int>(33));

    call = reinterpret_cast<voidFun>(fun<char>);
    call(std::make_shared<int>(33));


    // Output:,
    // Hi there!
    // 33
    // !
}
Saccharify answered 3/11, 2014 at 9:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.