The reasoning behind Clang's implementation of std::function's move semantics

In libc++'s implementation of std::function, if the function object whose type is being erased is small enough to fit in the small buffer (SBO), the move constructor copies it instead of moving it. Yet not every object with a small memory footprint is cheap to copy. Why copy rather than move?

Consider this example built with Clang and libc++ (it uses shared_ptr because its reference count makes copies visible):

https://wandbox.org/permlink/9oOhjigTtOt9A8Nt

The behavior of test1() is identical to that of test3(), where an explicit copy is made; shared_ptr's reference count makes this visible.

GCC (libstdc++), on the other hand, behaves reasonably and predictably:

https://wandbox.org/permlink/bYUDDr0JFMi8Ord6

Both behaviors are allowed by the standard: std::function requires the stored callable to be copyable, a moved-from object is left in a valid but unspecified state, and so on. But why do that? The same reasoning could be applied to std::map: if both the key and value types are copyable, why not make a new copy whenever someone std::moves a std::map? That would also be within the standard's requirements.

According to cppreference.com, the move constructor should move the target (the stored callable) of the source object.
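To look at what happens to the target itself, rather than inferring it from shared_ptr reference counts, one could wrap a hypothetical Probe type that counts its own copy and move constructions. This is a sketch: Probe, MoveReport and observe_function_move are illustrative names, not part of the original example. The standard only guarantees that g ends up holding a target equivalent to f's; the counts themselves are implementation-specific.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical probe type: counts its own copy and move constructions, so we
// can see what std::function's move constructor does with the stored target.
struct Probe {
    static inline int copies = 0;
    static inline int moves = 0;

    Probe() = default;
    Probe(const Probe&) noexcept { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
    void operator()() const {}
};

struct MoveReport {
    int copies;            // Probe copy constructions during the move
    int moves;             // Probe move constructions during the move
    bool source_nonempty;  // does the moved-from function still hold a target?
};

MoveReport observe_function_move() {
    std::function<void()> f = Probe{};
    Probe::copies = 0;  // ignore whatever happened while constructing f
    Probe::moves = 0;

    std::function<void()> g = std::move(f);
    assert(g && "g must hold a target equivalent to f's former target");
    g();

    // f is valid but unspecified after the move, so querying it is fine.
    return {Probe::copies, Probe::moves, static_cast<bool>(f)};
}
```

With libc++'s default ABI one would expect one copy and a still non-empty f, the same pattern as the reference-count jump in test1(). With libstdc++, which as far as I can tell stores non-trivially-copyable callables such as Probe on the heap, the pointer is simply handed over, so both counts should stay at 0.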

The example:

#include <iostream>
#include <memory>
#include <functional>
#include <array>
#include <type_traits>

void test1()
{
    /// Some small tiny type of resource. Also, `shared_ptr` is used because it has a neat
    /// `use_count()` feature that will allow us to see what's going on behind the 'curtains'.
    auto foo = std::make_shared<int>(0);

    /// `foo` is not trivially copyable, so copying it may incur significant overhead.
    /// Alas, pre-C++23 we don't have the luxury of `move_only_function`,
    /// so, staying within the standard, we're stuck with std::function.
    static_assert(!std::is_trivially_copyable_v<std::decay_t<decltype(foo)>>);
    static_assert(!std::is_trivially_copy_constructible_v<std::decay_t<decltype(foo)>>);

    std::cout << std::endl;
    std::cout << "Test 1: tiny function that is probably stored in SBO" << std::endl;
    std::cout << "Ref count: " << foo.use_count() << std::endl;
    
    std::function<void()> f = [foo] {
        /// Do stuff.  
    };

    std::cout << "Ref count: " << foo.use_count() << std::endl;

    {
        auto g = std::move(f);

        /// With libc++, the underlying type-erased callable is copied here, not moved.
        std::cout << "Ref count: " << foo.use_count() << std::endl;
    }

    std::cout << "Ref count: " << foo.use_count() << std::endl;
}

void test2()
{
    auto foo = std::make_shared<int>(0);

    std::cout << std::endl;
    std::cout << "Test 2: non-tiny function that doesn't fit in SBO" << std::endl;
    std::cout << "Ref count: " << foo.use_count() << std::endl;
    
    std::function<void()> f = [foo, bar = std::array<char, 1024>()] {
        (void)bar;
        /// Do stuff.
    };

    std::cout << "Ref count: " << foo.use_count() << std::endl;

    {
        auto g = std::move(f);

        std::cout << "Ref count: " << foo.use_count() << std::endl;
    }

    std::cout << "Ref count: " << foo.use_count() << std::endl;
}

void test3()
{
    auto foo = std::make_shared<int>(0);

    std::cout << std::endl;
    std::cout << "Test 3: tiny function but using a copy" << std::endl;
    std::cout << "Ref count: " << foo.use_count() << std::endl;
    
    std::function<void()> f = [foo] {
        /// Do stuff.  
    };

    std::cout << "Ref count: " << foo.use_count() << std::endl;

    {
        auto g = f;

        std::cout << "Ref count: " << foo.use_count() << std::endl;
    }

    std::cout << "Ref count: " << foo.use_count() << std::endl;
}

int main()
{
    test1();
    test2();
    test3();
    return 0;
}
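The test names say the tiny lambda is "probably" stored in the SBO. One way to check is to compare the address of the stored target with the address range occupied by the std::function object itself. A sketch, with stored_inline and check_examples as hypothetical helper names; std::function::target<T>() needs the exact stored type (hence decltype on the lambdas), and the result is an implementation detail rather than anything the standard promises:

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <iostream>
#include <memory>

// Returns true if the callable of type T held by f lives inside the
// std::function object itself (small-buffer storage), false if it lives
// elsewhere (typically on the heap) or if f does not hold a T at all.
template <class T, class Sig>
bool stored_inline(const std::function<Sig>& f) {
    const T* target = f.template target<T>();
    if (target == nullptr) return false;

    const auto* begin = reinterpret_cast<const unsigned char*>(&f);
    const auto* end = begin + sizeof(f);
    const auto* p = reinterpret_cast<const unsigned char*>(target);
    std::less<const unsigned char*> before;  // total order, even across objects
    return !before(p, begin) && before(p, end);
}

// Applies the check to closures shaped like the ones in test1() and test2().
void check_examples() {
    auto foo = std::make_shared<int>(0);

    auto tiny = [foo] {};
    auto big = [foo, bar = std::array<char, 1024>()] { (void)bar; };
    std::function<void()> f_tiny = tiny;
    std::function<void()> f_big = big;

    // Implementation-dependent: libc++ and libstdc++ differ here.
    std::cout << std::boolalpha
              << "tiny stored inline: " << stored_inline<decltype(tiny)>(f_tiny) << '\n';
    // A 1 KiB closure is far larger than the small buffer of either library.
    assert(!stored_inline<decltype(big)>(f_big));
}
```

For test1()'s tiny closure, one would expect true with libc++ and false with libstdc++, which, as far as I can tell, only stores trivially copyable callables inline.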
Auklet asked 11/1, 2023 at 12:4
Comments:
Related: draft proposal P0288R9 (move_only_function). – Copulate
@Copulate Yes, thanks, I'm aware of that; I even mentioned it in the comments in my test cases. Unfortunately, for now you either venture outside the standard library and use one of the available third-party implementations, write your own, or stay within the standard and stick to what the standard library provides. – Auklet
Could it relate to this predicate trait? – Copulate
@Copulate Unlikely. I'm not sure about _VSTD::is_scalar, but std::is_scalar_v<std::shared_ptr<...>> is false, and I'm willing to bet that std::is_scalar_v is implemented as an alias of _VSTD::is_scalar. – Auklet
Note that you are comparing cases where libc++ uses SBO but libstdc++ does not: godbolt.org/z/8PWrcvTb6. – Formalize
@DanielLangr AFAIK libstdc++ does use SBO, albeit with a smaller, 16-byte buffer. Look at your own example: I'm not sure what the idea is behind printing the address of f and the address of f + 1. What does that tell us? – Auklet
@Auklet Compared with the address of foo, it tells us whether SBO takes place or not. Yes, libstdc++ uses 16 bytes while libc++ uses 24 (godbolt.org/z/hxrPMYEe5), but that lambda is 16 bytes long, so it's not about the size of the buffer: the lambda fits into it with both implementations. – Formalize
I left that comment because you have two links to elsewhere whose relevant content should be in this post and should be related to the rest of the post by prose. It's not clear what that code has to do with the post. PS Please don't insert "EDIT"s/"UPDATE"s; just make your post the best presentation as of edit time. Please avoid social and meta commentary in posts; just ask or answer the question. Please minimize emphatic fonts; clarity comes from clear writing. The boldface sentence isn't needed, except that you should clearly say in normal font what the code is and why it's there. – Brannen
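The type traits discussed in the comments above can be checked at compile time. A minimal sketch (Ptr, make_closure and Closure are illustrative names): std::shared_ptr, and a lambda that captures one by value, is not a scalar type, is nothrow-copy-constructible, and is not trivially copyable.

```cpp
#include <memory>
#include <type_traits>

using Ptr = std::shared_ptr<int>;

// shared_ptr is a class type, so it is not a scalar type.
static_assert(!std::is_scalar_v<Ptr>);
// Its copy constructor is noexcept (it only increments the reference count)...
static_assert(std::is_nothrow_copy_constructible_v<Ptr>);
// ...but it is not trivially copyable, because copying it has a side effect.
static_assert(!std::is_trivially_copyable_v<Ptr>);

// A closure that captures the shared_ptr by value inherits all three properties.
inline auto make_closure(Ptr p) {
    return [p] { return p.use_count(); };
}
using Closure = decltype(make_closure(nullptr));

static_assert(!std::is_scalar_v<Closure>);
static_assert(std::is_nothrow_copy_constructible_v<Closure>);
static_assert(!std::is_trivially_copyable_v<Closure>);
```

Nothrow copy constructibility is one of the conditions the answer below cites for libc++'s small-object storage, so a closure like the one in test1() can qualify for it even though its copy is not free.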

It is a bug in libc++ that cannot be fixed immediately because the fix would break ABI. It is apparently a conforming implementation, although obviously a suboptimal one in many cases.

It's not clear exactly why the Clang devs made this implementation choice in the first place (if you're really lucky, someone from Clang will show up and answer this question). It may simply be that Clang's strategy avoids needing a "vtable" entry for move construction, which simplifies the implementation.

Also, as I wrote elsewhere, the Clang implementation only uses SOO (small-object optimization, the same idea as SBO) if the callable is nothrow-copy-constructible in the first place. It therefore never uses SOO for callables whose copy has to allocate from the heap (like a struct that contains a std::vector), so it never copies such things upon move construction*. That limits the practical effect of the cases where it copies instead of moving, although it will certainly still degrade performance in some cases, such as with std::shared_ptr, where a copy must use atomic instructions and a move is almost free.

* OK, there is a caveat here: if you use the allocator-extended move constructor and the provided allocator is unequal to the one from the source object, you force the libc++ implementation to perform a copy, since with unequal allocators it can't just take ownership of the pointer to the out-of-line callable held by the source object. However, you shouldn't use the allocator-extended move constructor anyway; allocator support was removed from std::function in C++17 because implementations had various issues with it.
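To make the "no vtable entry for move construction" idea concrete, here is a minimal sketch of a hypothetical type-erased wrapper. MiniFunction, Base, Holder and use_count_after_move are illustrative names; this is not libc++'s actual code. Its per-type interface only knows how to call, copy and destroy the callable, and it stores a callable inline only if it is small and nothrow-copy-constructible, mirroring the condition described above. With that interface, a move can steal the pointer to a heap-allocated callable, but an inline callable can only be copied:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical, simplified type-erased wrapper for void() callables.
// The per-type interface can call, copy and destroy, but has no "move" entry.
class MiniFunction {
    struct Base {
        virtual ~Base() = default;
        virtual void call() = 0;
        virtual Base* clone_into(void* buf) const = 0;  // copy into a buffer
        virtual Base* clone_heap() const = 0;           // copy onto the heap
    };

    template <class F>
    struct Holder final : Base {
        F fn;
        explicit Holder(const F& f) : fn(f) {}
        explicit Holder(F&& f) : fn(std::move(f)) {}
        void call() override { fn(); }
        Base* clone_into(void* buf) const override { return ::new (buf) Holder(fn); }
        Base* clone_heap() const override { return new Holder(fn); }
    };

    static constexpr std::size_t kBufSize = 3 * sizeof(void*);
    alignas(std::max_align_t) unsigned char buf_[kBufSize];
    Base* obj_ = nullptr;
    bool inline_ = false;

public:
    MiniFunction() = default;

    template <class F, class = std::enable_if_t<!std::is_same_v<std::decay_t<F>, MiniFunction>>>
    MiniFunction(F f) {
        using H = Holder<F>;
        // Small-buffer storage only for small, nothrow-copyable callables.
        if constexpr (sizeof(H) <= kBufSize && alignof(H) <= alignof(std::max_align_t) &&
                      std::is_nothrow_copy_constructible_v<F>) {
            obj_ = ::new (static_cast<void*>(buf_)) H(std::move(f));
            inline_ = true;
        } else {
            obj_ = new H(std::move(f));
        }
    }

    MiniFunction(const MiniFunction& other) : inline_(other.inline_) {
        if (other.obj_) obj_ = inline_ ? other.obj_->clone_into(buf_) : other.obj_->clone_heap();
    }

    // Without a "move" entry, an inline callable can only be copied;
    // a heap-allocated callable is transferred by stealing the pointer.
    MiniFunction(MiniFunction&& other) noexcept : inline_(other.inline_) {
        if (!other.obj_) return;
        if (inline_) {
            obj_ = other.obj_->clone_into(buf_);  // copy; `other` keeps its callable
        } else {
            obj_ = std::exchange(other.obj_, nullptr);  // no copy at all
        }
    }

    MiniFunction& operator=(const MiniFunction&) = delete;  // omitted for brevity
    MiniFunction& operator=(MiniFunction&&) = delete;

    ~MiniFunction() {
        if (inline_) obj_->~Base();
        else delete obj_;
    }

    explicit operator bool() const noexcept { return obj_ != nullptr; }
    void operator()() const {
        assert(obj_ && "calling an empty MiniFunction");
        obj_->call();
    }
};

// Moves a MiniFunction holding a small lambda and reports the use count.
inline long use_count_after_move() {
    auto sp = std::make_shared<int>(0);
    MiniFunction f = [sp] {};
    MiniFunction g = std::move(f);
    return sp.use_count();  // sp itself, f's copy and g's copy
}
```

On a typical 64-bit target, where the closure plus the vtable pointer fits the 24-byte buffer, use_count_after_move() returns 3, the same pattern as test1() in the question. Adding a move_into entry to Base would let the move constructor move the inline callable instead, at the cost of one more virtual function per callable type.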

Insusceptible answered 11/1, 2023 at 13:26
Comments:
Possibly relevant commit: github.com/llvm-mirror/libcxx/commit/…. When I defined _LIBCPP_ABI_OPTIMIZED_FUNCTION, I got the same results with libc++ as with libstdc++: godbolt.org/z/6Wa1KPrqG. – Formalize
N.b. if you choose to enable this in libc++, the correct approach is outlined here (you really don't want to just define _LIBCPP_ABI_OPTIMIZED_FUNCTION...). – Dupe