C++ compilers apply a calling-convention optimization to small structs: a small struct passed as a function argument is passed as efficiently as a primitive type (say, in registers). For example:
class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }
bar1() and bar2() generate almost identical assembly code, except that they call foo(int) and foo(MyInt) respectively. Specifically, on x86_64 it looks like:
mov edi, 1
jmp foo(MyInt) ; tail-call optimization: jmp instead of call + ret
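As a rough way to see why MyInt qualifies, the sketch below checks the properties that mainstream ABIs typically consult before passing a class in a register. The assertions are assumptions about common 64-bit toolchains, not guarantees from the C++ standard.

```cpp
#include <type_traits>

// Same class as in the question.
class MyInt { int n; public: MyInt(int x) : n(x) {} };

// The wrapper adds no storage beyond the int it holds
// (true on mainstream ABIs; the standard does not promise it).
static_assert(sizeof(MyInt) == sizeof(int),
              "MyInt is expected to be the size of an int");

// Copy/move constructors and the destructor are all implicit, hence trivial.
static_assert(std::is_trivially_copyable<MyInt>::value,
              "MyInt is expected to be trivially copyable");
static_assert(std::is_trivially_destructible<MyInt>::value,
              "MyInt is expected to be trivially destructible");
```

If any of these checks failed, register passing would be the first thing to re-examine.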
But if we test std::tuple<int>, it will be different:
#include <tuple>
void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }
struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }
The generated assembly code looks totally different; the small struct (std::tuple<int>) is passed by pointer:
sub rsp, 24
lea rdi, [rsp+12]
mov DWORD PTR [rsp+12], 1
call foo(std::tuple<int>)
add rsp, 24
ret
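One way to investigate on a given toolchain is to print the triviality traits of the two tuple-based types. The helper names report() and report_tuple_types() below are hypothetical, and no expected output is given because the copy/move results depend on the standard library version in use.

```cpp
#include <cstdio>
#include <tuple>
#include <type_traits>

// Same derived type as in the question.
struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };

// report() is a hypothetical helper, not part of any library.
// It prints 1 or 0 for each trait that can matter for argument passing.
template <class T>
void report(const char* name) {
    std::printf("%s: trivial copy ctor=%d, trivial move ctor=%d, trivial dtor=%d\n",
                name,
                static_cast<int>(std::is_trivially_copy_constructible<T>::value),
                static_cast<int>(std::is_trivially_move_constructible<T>::value),
                static_cast<int>(std::is_trivially_destructible<T>::value));
}

// Call this from main() to compare the two types on your toolchain.
inline void report_tuple_types() {
    report<std::tuple<int>>("std::tuple<int>");
    report<MyIntTuple>("MyIntTuple");
}
```

A 0 in the copy or move column would be a hint that the type is not being treated like a plain int at call sites.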
I dug a bit deeper and tried to make my int a bit dirtier (this should be close to a naive, incomplete tuple implementation):
class Empty {};
class MyDirtyInt : protected Empty, MyInt {public: using MyInt::MyInt; };
void foo(MyDirtyInt);
void bar5() { foo(MyDirtyInt(1)); }
but the calling convention optimization is applied:
mov edi, 1
jmp foo(MyDirtyInt)
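A plausible reading of this result is that the empty base changes neither the size nor the triviality of the type. The sketch below checks that, assuming a toolchain that applies the empty base optimization here (GCC and Clang on x86_64 do).

```cpp
#include <type_traits>

// Same classes as in the question.
class MyInt { int n; public: MyInt(int x) : n(x) {} };
class Empty {};
class MyDirtyInt : protected Empty, MyInt { public: using MyInt::MyInt; };

// Empty has no non-static data members, so it can share storage with MyInt.
static_assert(std::is_empty<Empty>::value, "Empty is an empty class");

// Assumption: the empty base optimization applies, so no extra bytes are added.
static_assert(sizeof(MyDirtyInt) == sizeof(int),
              "MyDirtyInt is expected to be the size of an int");

// All special members are still implicit, hence trivial.
static_assert(std::is_trivially_copy_constructible<MyDirtyInt>::value,
              "MyDirtyInt is expected to have a trivial copy constructor");
static_assert(std::is_trivially_destructible<MyDirtyInt>::value,
              "MyDirtyInt is expected to have a trivial destructor");
```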
I have tried GCC/Clang/MSVC, and they all showed the same behavior. (Godbolt link here) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)
I'm aware that the compiler should be able to optimize these out, as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or implementation prevents this optimization.
FYI, in case you're curious about what I'm doing with std::tuple: I want to create a wrapper class (i.e. a strong typedef) and don't want to declare comparison operators (operator<=>'s prior to C++20) myself and don't want to bother with Boost, so I thought std::tuple was a good base class because everything was there.
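For comparison, here is a minimal sketch of the hand-written alternative that the tuple base class is meant to avoid. The names StrongInt, MetersTag and Meters are hypothetical. The code is kept C++11-compatible and relies only on implicit copy/move operations, so the wrapper stays trivially copyable.

```cpp
#include <type_traits>

// Hypothetical strong typedef over int; Tag only makes each alias a distinct type.
template <class Tag>
class StrongInt {
    int value_;
public:
    constexpr explicit StrongInt(int v) : value_(v) {}
    constexpr int get() const { return value_; }

    // Comparison operators spelled out by hand (what std::tuple would otherwise supply).
    friend constexpr bool operator==(StrongInt a, StrongInt b) { return a.value_ == b.value_; }
    friend constexpr bool operator!=(StrongInt a, StrongInt b) { return !(a == b); }
    friend constexpr bool operator<(StrongInt a, StrongInt b)  { return a.value_ < b.value_; }
    friend constexpr bool operator>(StrongInt a, StrongInt b)  { return b < a; }
    friend constexpr bool operator<=(StrongInt a, StrongInt b) { return !(b < a); }
    friend constexpr bool operator>=(StrongInt a, StrongInt b) { return !(a < b); }
};

struct MetersTag {};
using Meters = StrongInt<MetersTag>;

// No user-provided copy/move constructor or destructor, so these hold.
static_assert(std::is_trivially_copy_constructible<Meters>::value,
              "Meters is expected to have a trivial copy constructor");
static_assert(sizeof(Meters) == sizeof(int),
              "Meters is expected to be the size of an int");
```

The trade-off is more boilerplate in exchange for a type whose special member functions are all implicit.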
OP's Edit: Daniel Langr has pointed out the root cause in the answer below. Please also check the comments under that answer. A fix was committed a year afterwards and has shipped in GCC since release 12.1.0, almost two years afterwards.
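The comments below point at user-defined copy/move constructors. As a sketch of that mechanism (not of the actual std::tuple source), the hypothetical MyIntUserCopy below has the same layout as MyInt but a user-provided copy constructor. Under the Itanium C++ ABI used by GCC and Clang, a class with a non-trivial copy constructor, move constructor or destructor is passed by invisible reference rather than in a register.

```cpp
#include <type_traits>

// Same class as in the question: all special members are implicit.
class MyInt { int n; public: MyInt(int x) : n(x) {} };

// Hypothetical variant: identical layout, but the copy constructor is user-provided.
class MyIntUserCopy {
    int n;
public:
    MyIntUserCopy(int x) : n(x) {}
    MyIntUserCopy(const MyIntUserCopy& other) : n(other.n) {}  // user-provided, so non-trivial
};

// Same size, different triviality.
static_assert(sizeof(MyInt) == sizeof(MyIntUserCopy),
              "both wrappers hold a single int");
static_assert(std::is_trivially_copy_constructible<MyInt>::value,
              "implicit copy constructor is trivial");
static_assert(!std::is_trivially_copy_constructible<MyIntUserCopy>::value,
              "user-provided copy constructor is not trivial");
```

Declaring void foo(MyIntUserCopy); and calling it as in bar2() is one way to check whether your compiler then passes the argument through a pointer.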
std::tuple's user-defined copy/move constructors that affect the behavior here. But I can't tell you what in the ABI it's interacting with here. – Valetudinary
std::tuple<int> destructor should be trivial. – Turaco
MyInt has the same effect: godbolt.org/z/s4zzcx. – Insane
std::unique_ptr<T> but the cases are very similar. youtu.be/rHIkrotSwcc?t=1050 – Shameful
tuple<int> d-tor is trivial, while unique_ptr's isn't, so in tuple case it depends on the way the implementation handles copy/move c-tors, but for unique_ptr there is no other option. But the reasoning behind it is still similar. – Shameful