Why does std::tuple break small-size struct calling convention optimization in C++?

C++ has a small-size struct calling convention optimization where the compiler passes a small-size struct in function parameters as efficiently as it passes a primitive type (say, via registers). For example:

class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }

bar1() and bar2() generate almost identical assembly code except for calling foo(int) and foo(MyInt) respectively. Specifically on x86_64, it looks like:

        mov     edi, 1
        jmp     foo(MyInt) ;tail-call optimization jmp instead of call ret

But if we test std::tuple<int>, it will be different:

void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }

struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }

The generated assembly code looks totally different; the small-size struct (std::tuple<int>) is passed by pointer:

        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(std::tuple<int>)
        add     rsp, 24
        ret

I dug a bit deeper and tried to make my int a bit dirtier (this should be close to an incomplete, naive tuple implementation):

class Empty {};
class MyDirtyInt : protected Empty, MyInt {public: using MyInt::MyInt; };
void foo(MyDirtyInt);
void bar5() { foo(MyDirtyInt(1)); }

but the calling convention optimization is applied:

        mov     edi, 1
        jmp     foo(MyDirtyInt)

I have tried GCC/Clang/MSVC, and they all showed the same behavior. (Godbolt link here) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)

I'm aware that the compiler should be able to optimize these out, as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or implementation causes the invalidation of this optimization.

FYI, in case you're curious about what I'm doing with std::tuple: I want to create a wrapper class (i.e. a strong typedef) without declaring the comparison operators myself (the individual relational operators prior to C++20, operator<=> since) and without bothering with Boost. std::tuple looked like a good base class because everything is already there.
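For comparison, here is a minimal sketch of the hand-written alternative that the question is trying to avoid: a strong typedef that declares its own pre-C++20 comparison operators. The class name `UserId` is made up for illustration. Because it declares no copy constructor, move constructor, or destructor of its own, it stays trivially copyable, so it should not hit the by-pointer passing shown above.

```cpp
#include <type_traits>

// Hypothetical strong typedef around int (name chosen for illustration).
// No user-declared copy/move constructor or destructor, so all of them
// remain implicit and trivial.
class UserId {
    int value_;
public:
    constexpr explicit UserId(int v) : value_(v) {}

    // Hand-written pre-C++20 comparisons, all derived from == and <.
    friend constexpr bool operator==(UserId a, UserId b) { return a.value_ == b.value_; }
    friend constexpr bool operator!=(UserId a, UserId b) { return !(a == b); }
    friend constexpr bool operator<(UserId a, UserId b)  { return a.value_ < b.value_; }
    friend constexpr bool operator>(UserId a, UserId b)  { return b < a; }
    friend constexpr bool operator<=(UserId a, UserId b) { return !(b < a); }
    friend constexpr bool operator>=(UserId a, UserId b) { return !(a < b); }
};

static_assert(std::is_trivially_copyable<UserId>::value,
              "UserId has only trivial copy/move operations and destructor");
```

The cost is six operators per wrapper, which is exactly the boilerplate the std::tuple base class was meant to remove.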


OP's Edit: Daniel Langr has pointed out the root cause in the answer below. Please also check the comments under that answer. A fix for this was committed about a year afterwards and has been part of GCC since the release of GCC 12.1.0, almost two years afterwards.

Foppish answered 3/9, 2020 at 7:58 Comment(10)
I suspect it's the std::tuple's user-defined copy/move constructors that affect the behavior here. But I can't tell you what in the ABI it's interacting with here.Valetudinary
Normally it’s a nontrivial destructor which causes this behaviour, but the std::tuple<int> destructor should be trivial.Turaco
@KonradRudolph Anyway, adding user-defined destructor to MyInt has the same effect: godbolt.org/z/s4zzcx.Insane
@DanielLangr Yes, that’s what I said. It seems that supplying a custom version of any of copy constructor, move constructor or destructor causes an ABI change. I guess this makes sense.Turaco
I think it worth mentioning that this is just another case of what Chandler Carruth talked about in CppCon 2019, "there is no zero-cost abstraction". His example was about std::unique_ptr<T> but the cases are very similar. youtu.be/rHIkrotSwcc?t=1050Shameful
@YehezkelB. I am not sure this is the same case. With libc++, you get passing by registers. It seems to me more like a quality of implementation issue.Insane
OK, not exactly the same, because tuple<int> d-tor is trivial, while unique_ptr's isn't, so in tuple case it depends on the way the implementation handles copy/move c-tors, but for unique_ptr there is no other option. But the reasoning behind it is still similar.Shameful
duplicates: Is returning a 2-tuple less efficient than std::pair?, Why is std::pair faster than std::tupleStabilize
Does this answer your question? Is returning a 2-tuple less efficient than std::pair?Stabilize
gcc.gnu.org/bugzilla/show_bug.cgi?id=71301Contradistinguish

It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

And, further:

A type is considered non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.

The same requirement is in AMD64 ABI Draft 1.0.
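As a rough compile-time approximation of the rule quoted above, the standard type traits can be combined as in the sketch below. It is only an approximation: the ABI wording is not identical to what the traits test (for example, a type with a deleted copy constructor and a trivial move constructor is still trivial for the purposes of calls), and register passing also depends on the size and layout of the type. The helper name `no_nontrivial_copy_move_dtor` and the type `WithDtor` are made up for illustration; `MyInt` is the class from the question.

```cpp
#include <type_traits>

// Hypothetical helper: conservative approximation of "trivial for the
// purposes of calls". It is true only if copy construction, move
// construction and destruction are all trivial.
template <class T>
constexpr bool no_nontrivial_copy_move_dtor =
    std::is_trivially_copy_constructible<T>::value &&
    std::is_trivially_move_constructible<T>::value &&
    std::is_trivially_destructible<T>::value;

// MyInt from the question: every special member is implicit and trivial.
class MyInt { int n; public: MyInt(int x) : n(x) {} };

// Illustrative type with a user-provided, hence non-trivial, destructor.
struct WithDtor { int n; ~WithDtor() {} };

static_assert(no_nontrivial_copy_move_dtor<MyInt>,
              "MyInt is a candidate for register passing");
static_assert(!no_nontrivial_copy_move_dtor<WithDtor>,
              "WithDtor is non-trivial for the purposes of calls");
```

Applying the same helper to std::tuple<int> gives a result that depends on the standard library in use, which is the point of the rest of this answer.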

For instance, in libstdc++, std::tuple has a non-trivial move constructor: https://godbolt.org/z/4j8vds. The Standard prescribes both the copy and the move constructor as defaulted, which is satisfied here. However, at the same time, tuple inherits from _Tuple_impl, and _Tuple_impl has a user-defined move constructor. Consequently, the move constructor of tuple itself cannot be trivial.
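The effect can be reproduced with a much simplified model. This is a sketch, not the actual libstdc++ source: `ImplLike` and `TupleLike` are hypothetical stand-ins for _Tuple_impl and tuple. The derived class declares its move constructor as defaulted, yet that constructor is non-trivial because it has to call the user-provided move constructor of the base.

```cpp
#include <type_traits>
#include <utility>

// Stand-in for _Tuple_impl: defaulted copy, user-provided move.
struct ImplLike {
    int value;
    constexpr ImplLike(int v) : value(v) {}
    ImplLike(const ImplLike&) = default;
    constexpr ImplLike(ImplLike&& other) noexcept
        : value(std::move(other.value)) {}  // user-provided => non-trivial
};

// Stand-in for tuple: both constructors are defaulted, as the Standard asks.
struct TupleLike : ImplLike {
    using ImplLike::ImplLike;
    TupleLike(const TupleLike&) = default;
    TupleLike(TupleLike&&) = default;  // defaulted, but calls ImplLike(ImplLike&&)
};

static_assert(std::is_trivially_copy_constructible<TupleLike>::value,
              "copy stays trivial: the base copy constructor is defaulted");
static_assert(!std::is_trivially_move_constructible<TupleLike>::value,
              "move is non-trivial because of the base class");
```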

In contrast, in libc++, both the copy and move constructors of std::tuple<int> are trivial. Therefore, the argument is passed in a register there: https://godbolt.org/z/WcTjM9.

As for Microsoft STL, std::tuple<int> is neither trivially copy-constructible nor trivially move-constructible. It even seems to break the C++ Standard rules. std::tuple is defined recursively and, at the end of the recursion, the std::tuple<> specialization defines a non-defaulted copy constructor. There is a comment about this issue: // TRANSITION, ABI: should be defaulted. Since tuple<> has no move constructor, both the copy and move constructors of tuple<class...> are non-trivial.
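The mechanism described for Microsoft STL can be modeled in a few lines as well. Again, this is only a sketch under simplifying assumptions, not the actual STL code: `EndLike` is a hypothetical stand-in for tuple<> and `RecursiveLike` for a one-element tuple built on top of it. Because the base has a user-provided copy constructor and no move constructor, the defaulted move constructor of the derived class falls back to the base copy constructor, so neither operation is trivial.

```cpp
#include <type_traits>

// Stand-in for tuple<>: user-provided copy constructor, no move constructor.
struct EndLike {
    EndLike() = default;
    EndLike(const EndLike&) noexcept {}  // non-defaulted => non-trivial
};

// Stand-in for the recursive tuple<int>: both constructors are defaulted.
struct RecursiveLike : private EndLike {
    int head;
    explicit RecursiveLike(int h) : head(h) {}
    RecursiveLike(const RecursiveLike&) = default;
    RecursiveLike(RecursiveLike&&) = default;  // copies the EndLike base
};

static_assert(!std::is_trivially_copy_constructible<RecursiveLike>::value,
              "copy calls the user-provided base copy constructor");
static_assert(!std::is_trivially_move_constructible<RecursiveLike>::value,
              "move also ends up in the base copy constructor");
```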

Insane answered 3/9, 2020 at 12:31 Comment(10)
So is libstdc++ non-compliant here? [gasp]Packaging
@Packaging Why? The Standard only says that the move constructor of std::tuple must be defined as defaulted. Which is satisfied by libstdc++.Insane
Your 2nd sentence contradicts your own post.Jenellejenesia
@MaximEgorushkin Don't understand. In libstdc++, std::tuple has defaulted move constructor, according to the Standard. At the same time, std::tuple inherits from _Tuple_impl. Since _Tuple_impl has user-defined move constructor, the move constructor of std::tuple isn't trivial. Even if it's defined as defaulted.Insane
@DanielLangr Oh, you are saying those constructors are defaulted in class std::tuple indeed, but not in its base classes. My mistake.Jenellejenesia
@MaximEgorushkin I reworded the explanation in the answer to make it hopefully more clear in this context :).Insane
I wonder why libstdc++ doesn't default the copy and move constructors of the base classes? Make it constexpr and conditionally noexcept, fine, but default the implementation. And now this would be an ABI breaking change. As it stands now it is making a mockery of the standard.Jenellejenesia
@MaximEgorushkin It even seems that a relevant patch has been proposed, but not accepted because of breaking backward compatibility regarding calling conventions.Insane
I thought the failure of such optimization should be related to std library implementation, so I switched between compilers on godbolt. But I never knew clang on godbolt was using libstdc++ by default and in order to use libc++ the corresponding argument must be provided.Foppish
ABI stability fundamentalists are ruining C++ once again. :-(Turaco

As suggested by @StoryTeller it might be related to a user defined move constructor inside std::tuple that causes this behavior.

See for example: https://godbolt.org/z/3M9KWo

Having user defined move constructor leads to the non-optimized assembly:

bar_my_tuple():
        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(MyTuple<int>)
        add     rsp, 24
        ret

In libc++, for example, the copy and move constructors are declared as defaulted for both tuple_leaf and tuple, so you get the small-size struct calling convention optimization for std::tuple<int>. You do not get it for std::tuple<std::string>, which holds a member that is not trivially movable and is therefore not trivially movable itself.
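A simplified model of that design shows both outcomes. This is a sketch and not the libc++ source: `LeafLike` and `DefaultedTuple` are hypothetical stand-ins for tuple_leaf and tuple. With every copy and move constructor defaulted, triviality is decided purely by the element type.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Stand-in for tuple_leaf: copy and move constructors are defaulted.
template <class T>
struct LeafLike {
    T value;
    explicit LeafLike(T v) : value(std::move(v)) {}
    LeafLike(const LeafLike&) = default;
    LeafLike(LeafLike&&) = default;
};

// Stand-in for tuple: defaulted as well, so nothing user-provided is involved.
template <class T>
struct DefaultedTuple : LeafLike<T> {
    using LeafLike<T>::LeafLike;
    DefaultedTuple(const DefaultedTuple&) = default;
    DefaultedTuple(DefaultedTuple&&) = default;
};

static_assert(std::is_trivially_move_constructible<DefaultedTuple<int>>::value,
              "int element: move stays trivial");
static_assert(!std::is_trivially_move_constructible<DefaultedTuple<std::string>>::value,
              "std::string element: move is non-trivial");
```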

Contend answered 3/9, 2020 at 11:39 Comment(2)
You need to tell the compiler to use libc++: godbolt.org/z/WcTjM9. By default, clang on Compiler Explorer uses libstdc++.Insane
@DanielLangr hey that's great! It actually shows that with libc++, std::tuple<int> is trivially move constructible and thus gets the small-size struct calling convention optimization!Contend
