What is the performance overhead of std::function?

I heard on a forum that using std::function<> causes a performance drop. Is that true? If so, is it a big performance drop?

Tabatha answered 20/2, 2011 at 13:45 Comment(7)
Causes a performance drop compared to what alternative?Aroma
You will have to be a lot more specific than that, user408141.Rosenkranz
Really, this is such a poor question.Deutsch
I edited the title to be a bit more meaningful. As to "compared to what" - presumably compared to a hand-rolled, less generic solution...Vast
Oh sorry, i am soooooo clumsy! :DTabatha
I found that the call is not that expensive, but if you store many functions the memory overhead looks a bit scary. I'd appreciate it if someone could comment on this.Difficult
I don't find this a poor question. I would be calling a normal free function and wondering what the difference is against std::function. Since you can stuff a std::function with a free function, a member function, a functor, the result of std::bind, or a lambda, I would compare it against every one of them, in terms of both performance and the generated code.Incarcerate

You can find information in Boost's reference materials: How much overhead does a call through boost::function incur? and Performance

This doesn't give a definitive "yes or no" about boost::function. The performance drop may well be acceptable given the program's requirements. More often than not, parts of a program are not performance-critical, and even where they are, the drop may be acceptable. This is something only you can determine.

As to the standard library version, the standard only defines an interface. It is entirely up to individual implementations to make it work. I suppose an implementation similar to boost::function would be used.

Vast answered 20/2, 2011 at 13:59 Comment(0)

There are, indeed, performance issues with std::function that must be taken into account whenever using it. The main strength of std::function, namely its type-erasure mechanism, does not come for free, and we might (though not necessarily will) pay a price for that.

std::function is a template class that wraps callable types. However, it is not parametrized on the callable type itself but only on its return and argument types. The callable type is known only at construction time and, therefore, std::function cannot have a pre-declared member of this type to hold a copy of the object given to its constructor.
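
As a minimal illustration (the names here are made up for the example), callables of completely different types can all be stored in the same std::function as long as their call signatures match:

#include <functional>

int add_one( int x ) { return x + 1; }           // free function

struct AddOne {                                  // function object (functor)
    int operator() ( int x ) const { return x + 1; }
};

int main () {
    std::function<int (int)> f;
    f = add_one;                                 // function pointer
    f = AddOne();                                // functor
    f = [] ( int x ) { return x + 1; };          // lambda
    return f( 2 );                               // calls the lambda, returns 3
}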

Roughly speaking (in reality, things are more complicated), std::function can hold only a pointer to the object passed to its constructor, and this raises a lifetime issue. If the pointer points to an object whose lifetime is shorter than that of the std::function object, then the inner pointer becomes dangling. To prevent this problem, std::function may instead make a copy of the object on the heap through a call to operator new (or a custom allocator). This dynamic memory allocation is what people most often refer to as the performance penalty implied by std::function.
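
As a rough sketch of when that allocation happens (the exact buffer size used for the small-object optimization is implementation-specific, so this is only indicative):

#include <array>
#include <functional>

int main () {
    int small = 42;
    std::array<char, 256> big{};   // deliberately larger than any typical internal buffer

    // Small capture: typically stored inside the std::function object itself
    // (small-object optimization), so no heap allocation.
    std::function<int ()> f1 = [small] () { return small; };

    // Large capture: does not fit in the internal buffer, so the closure is
    // copied into dynamically allocated memory.
    std::function<int ()> f2 = [big] () { return (int) big[0]; };

    return f1() + f2();
}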

I have recently written an article with more details and that explains how (and where) one can avoid paying the price of a memory allocation.

Efficient Use of Lambda Expressions and std::function

Libertinage answered 31/1, 2012 at 23:34 Comment(9)
So this describes overhead of constructing / destructing a std::function. boost::function states this about invocation performance: "With a properly inlining compiler, an invocation of a function object requires one call through a function pointer. If the call is to a free function pointer, an additional call must be made to that function pointer (unless the compiler has very powerful interprocedural analysis)."Vichyssoise
Is the dynamic allocation performed only once? I mean, once initialized, does it perform exactly as if using function pointers?Trudy
It's worth noting that if the wrapped object is small (e.g. no more than 16 bytes for std::function on Linux) and the small object optimization is enabled, std::function will not attempt any heap allocation. Note that you must use std::cref or std::ref to wrap the passed-in parameters to avoid copying down the call tree. In that case, for a function without too many parameters (e.g. a std::shared_ptr, a simple primitive, etc.), there is no heap allocation. This is particularly useful when wrapping a lambda with simple parameters.Infra
Your link appears broken (redirects to main page of drdobbs.com)Nice
@Nice Sadly so. Unfortunately DrDobbs closed down a few years ago and I don't know what is happening to old content. I couldn't find my article anywhere. I'm sorry and sad about that :-(Libertinage
@Nice I can suggest this article which cites mine as similar. My approach was what this article calls "solution 3".Libertinage
@Nice I've managed to find a working link! \o/ I hope it lasts. Thanks for bringing the issue to my attention.Libertinage
@CassioNeri I think the link is again dead? at least I am not able to view the content without logging inSophistication
@MohammedNoureldin That's a shame. As I said in another comment, DrDobbs closed down a few years ago. Sometimes I can find old content somewhere, sometimes I can't. I don't know if I kept a copy of this article. Even if I do, I don't know if I'm allowed to publish/post it elsewhere. Often authors are required to give away the copyright to publishers and lose their rights. (Although DrDobbs is dead, their lawyers still might be awake.) I can't remember if that was the case for this article. If I can, I'll try to recover it, but I can't promise anything. I'm really sorry about that.Libertinage

Firstly, the overhead gets smaller relative to the work done inside the function: the heavier the workload, the smaller the relative overhead.

Secondly: g++ 4.5 does not show any difference compared to virtual functions:

main.cc

#include <functional>
#include <iostream>

// Interface for virtual function test.
struct Virtual {
    virtual ~Virtual() {}
    virtual int operator() () const = 0;
};

// Factory functions, defined in a separate translation unit, to deny g++ insight and prevent some optimizations.
Virtual *create_virt();
std::function<int ()> create_fun();
std::function<int ()> create_fun_with_state();

// The test. Generates actual output to prevent some optimizations.
template <typename T>
int test (T const& fun) {
    int ret = 0;
    for (int i=0; i<1024*1024*1024; ++i) {
        ret += fun();
    }    
    return ret;
}

// Executing the tests and outputting their values to prevent some optimizations.
int main () {
    {
        const clock_t start = clock();
        std::cout << test(*create_virt()) << '\n';
        const double secs = (clock()-start) / double(CLOCKS_PER_SEC);
        std::cout << "virtual: " << secs << " secs.\n";
    }
    {
        const clock_t start = clock();
        std::cout << test(create_fun()) << '\n';
        const double secs = (clock()-start) / double(CLOCKS_PER_SEC);
        std::cout << "std::function: " << secs << " secs.\n";
    }
    {
        const clock_t start = clock();
        std::cout << test(create_fun_with_state()) << '\n';
        const double secs = (clock()-start) / double(CLOCKS_PER_SEC);
        std::cout << "std::function with bindings: " << secs << " secs.\n";
    }
}

impl.cc

#include <functional>

struct Virtual {
    virtual ~Virtual() {}
    virtual int  operator() () const = 0;
};
struct Impl : Virtual {
    virtual ~Impl() {}
    virtual int  operator() () const { return 1; }
};

Virtual *create_virt() { return new Impl; }

std::function<int ()> create_fun() { 
    return  []() { return 1; };
}

std::function<int ()> create_fun_with_state() {
    int x = 0, y = 0, z = 0;
    // Capture explicitly: with [=], variables the body never uses would not be
    // captured at all, and the closure would carry no state.
    return  [x, y, z]() { return 1; };
}

Output of g++ --std=c++0x -O3 impl.cc main.cc && ./a.out:

1073741824
virtual: 2.9 secs.
1073741824
std::function: 2.9 secs.
1073741824
std::function with bindings: 2.9 secs.

So, fear not. If your design/maintainability can improve by preferring std::function over virtual calls, try it. Personally, I really like the idea of not forcing interfaces and inheritance on clients of my classes.

Tso answered 18/1, 2012 at 9:53 Comment(6)
std::function can easily be implemented with virtual functions. However, most implementations seem to use function pointers to templated functions and a void* pointer. The indirection is practically the same.Carbonic
@Xeo: True. But verification is better than belief :) When you don't use optimizations, the same test shows a 1:3 difference against std::function, so this test is not completely unjustified.Tso
With G++ 4.8.2, I consistently get 2.9, 3.3 and 3.3 seconds. If I add -flto they all become 3.3. My totally wild guess is that GCC actually tries to optimize std::function (similar to what one gets with -flto and virtual functions), but the optimizations actually hurt.Skittle
Using g++ 5.3, I get 2.0, 2.3, 2.3 (-O2); 0.7, 2.0, 2.0 (-O2 -flto); 2.3, 2.3, 2.3 (-O2 -flto -fno-devirtualize); 2.0, 2.3, 2.3 (-O2 -fno-devirtualize). So it appears devirtualization in newer g++ versions has improved enough that this is no longer a deoptimization.Flathead
g++ 6.3.0: g++ -std=gnu++14 -O3 -flto -march=native impl.cpp main.cpp && ./a.out 1073741824 virtual: 1.97619 secs. 1073741824 std::function: 6.86855 secs. 1073741824 std::function with bindings: 6.86847 secs.Justus
g++ 7.4.0 on Ubuntu 18.04 (AMD 2400G): ` g++ --std=c++17 -O3 impl.cc main.cc && ./a.out`: virtual: 1.38742 secs., std::function: 1.44681 secs., std::function with bindings: 1.39367 secs.Cramoisy
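
As a very reduced sketch of the scheme described in the first comment above (function pointers to templated functions plus an erased object pointer); this is not the actual implementation of any standard library, just the general idea, with made-up names:

#include <utility>

// Hypothetical, very reduced wrapper for callables with signature int().
// The real std::function also handles copying, empty state, small-object
// storage, target(), etc.; this only shows the erasure itself.
class tiny_function
{
    void *obj_;                          // heap-allocated copy of the callable
    int  (*invoke_)( void * );           // knows the callable's real type
    void (*destroy_)( void * );          // destroys it through its real type

    template <typename F>
    static int  invoke_impl( void *p )  { return (*static_cast<F *>( p ))(); }
    template <typename F>
    static void destroy_impl( void *p ) { delete static_cast<F *>( p ); }

public:
    template <typename F>
    tiny_function( F f )
        : obj_( new F( std::move( f ) ) ),   // the dynamic allocation discussed in other answers
          invoke_( &invoke_impl<F> ),
          destroy_( &destroy_impl<F> )
    {}
    tiny_function( tiny_function const & ) = delete;             // copying/moving omitted for brevity
    tiny_function &operator=( tiny_function const & ) = delete;
    ~tiny_function() { destroy_( obj_ ); }

    int operator() () const { return invoke_( obj_ ); }          // one indirect call, much like a virtual call
};

int main ()
{
    int x = 41;
    tiny_function f( [x]() { return x + 1; } );
    return f();                                                  // 42
}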

This depends strongly on whether you are passing the function without binding any arguments (which does not allocate heap space) or not.

It also depends on other factors, but this is the main one.

It is true that you need something to compare against; you can't simply say that it 'causes overhead' compared to not using it at all. You need to compare it to an alternative way of passing a function. And if you can dispense with it entirely, then it was not needed in the first place.
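
As a small sketch of what such a comparison might look like (the names here are only illustrative), the usual alternatives are a plain function pointer or a template parameter, both of which avoid the type erasure:

#include <functional>

int square( int x ) { return x * x; }

// Function pointer: one indirect call, no allocation.
int apply_fnptr( int (*f)(int), int x ) { return f( x ); }

// Template parameter: the callable's type is known, so calls can be inlined.
template <typename F>
int apply_template( F f, int x ) { return f( x ); }

// std::function: type-erased call; may allocate if the callable carries bound state.
int apply_function( std::function<int (int)> const &f, int x ) { return f( x ); }

int main () {
    return apply_fnptr( square, 3 )
         + apply_template( square, 3 )
         + apply_function( square, 3 );   // 9 + 9 + 9 = 27
}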

Polyandry answered 20/2, 2011 at 14:18 Comment(1)
Even binding arguments might not incur dynamic allocation if the implementation uses small-buffer optimisation to store the function object in the std::function instance and the passed callable is within the suitable size for SBO.Mumble

std::function<> / std::function<> with bind( ... ) is extremely fast. Check this:

#include <iostream>
#include <functional>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
    static size_t const ROUNDS = 1'000'000'000;
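    // Note: the explicit template parameter list on the lambda below requires C++20.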
    static
    auto bench = []<typename Fn>( Fn const &fn ) -> double
    {
        auto start = high_resolution_clock::now();
        fn();
        return (int64_t)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / (double)ROUNDS;
    };
    int i = 0;   // initialized so the benchmarked lambdas don't read an indeterminate value
    static
    auto CLambda = []( int &i, int j )
    {
        i += j;
    };
    auto bCFn = [&]() -> double
    {
        void (*volatile pFnLambda)( int &i, int j ) = CLambda;
        return bench( [&]()
            {   
                for( size_t j = ROUNDS; j--; )
                    pFnLambda( i, 2 );
            } );
    };
    auto bndObj = bind( CLambda, ref( i ), 2 );
    auto bBndObj = [&]() -> double
    {
        decltype(bndObj) *volatile pBndObj = &bndObj;
        return bench( [&]()
            {
                for( size_t j = ROUNDS; j--; )
                    (*pBndObj)();
            } );
    };
    using fn_t = function<void()>;
    auto bFnBndObj = [&]() -> double
    {
        fn_t fnBndObj = fn_t( bndObj );
        fn_t *volatile pFnBndObj = &fnBndObj;
        return bench( [&]()
            {
                for( size_t j = ROUNDS; j--; )
                    (*pFnBndObj)();
            } );
    };
    auto bFnBndObjCap = [&]() -> double
    {
        auto capLambda = [&i]( int j )
        {
            i += j;
        };
        fn_t fnBndObjCap = fn_t( bind( capLambda, 2 ) );
        fn_t *volatile pFnBndObjCap = &fnBndObjCap;
        return bench( [&]()
            {
                for( size_t j = ROUNDS; j--; )
                    (*pFnBndObjCap)();
            } );
    };
    using bench_fn = function<double()>;
    static const
    struct descr_bench
    {
        char const *descr;
        bench_fn const fn;
    } dbs[] =
    {
        { "C-function",
          bench_fn( bind( bCFn ) ) },
        { "C-function in bind( ... ) with all parameters",
          bench_fn( bind( bBndObj ) ) },
        { "C-function in function<>( bind( ... ) ) with all parameters",
          bench_fn( bind( bFnBndObj ) ) },
        { "lambda capturiging first parameter in function<>( bind( lambda, 2 ) )",
          bench_fn( bind( bFnBndObjCap ) ) }
    };
    for( descr_bench const &db : dbs )
        cout << db.descr << ":" << endl,
        cout << db.fn() << endl;
}

All calls are below 2ns on my computer.

Hexagon answered 20/9, 2021 at 9:3 Comment(0)
