C++ What actually happens in assembly when you return a struct from a function?

Asked 15/4, 2021 at 11:28 Answered 16/4, 2021 at 4:35

Solved c++ assembly x86-64 calling-convention abi

I'm trying to figure out what actually happens in C++ if you return a struct by value from a function, vs. returning a pointer to the struct. How is a struct communicated when it's sent by value if a function can only return a value that fits in a register? (I read that somewhere.)

I tried testing it on Godbolt to see what it's doing. But I don't understand the assembly so that was a bit optimistic of me.

It does look to me, without much assembly knowledge, that the function is just changing some memory that exists before the function is called? So then the concept of returning something from a function is just an abstraction, and the function is just setting some bytes in a memory location that already exists and then jumping back to main()? In which case nothing is copied at all and the return is "free"?

Godbolt: Return int

Godbolt: Return struct{int int int}

Sherwood answered 15/4, 2021 at 11:28 Comment(8)

Consult the appropriate ABI/calling convention documentation. But yeah, for sysv large structs are returned via memory pointed to by a hidden first argument. Note that your example still fits into the rdx:rax register pair on x86-64 that you use on godbolt. You can change to long if you want to see the pass by memory in action, or add more members to exceed 16 bytes. – Desiredesirea 15/4, 2021 at 11:33

thanks for showing me godbolt, really nice! – Altissimo 15/4, 2021 at 11:45

Normally I would say that questions like this should be closed and referred to documentation, but this seems like something where it would be really hard to know where to look for documentation if you don't already know something about the answer, and it's not really an ordinary tutorial topic either. It is something I would expect one to run across at an undergraduate university level, were one to take a sufficiently specialized set of courses. – Matchwood 15/4, 2021 at 11:47

Reminder that this is very much tool- and version-specific: one version of one compiler can choose to do it one way, and another can do it another way (in this case, basically whether or not to shortcut small structs into registers). There is no reason whatsoever that a compiler author must use an ISA vendor's recommended calling convention; compiler authors can choose however they want to do it for each version of their tool as they see fit (though not conforming, or changing their mind between versions, might have obvious negatives). – Selfinduced 15/4, 2021 at 13:29

Try it with a larger struct. – Selfinduced 15/4, 2021 at 13:30

This depends on architecture and ABI (which varies between operating systems). Your examples are for x86-64 with the SysV ABI; can we assume that's what you want to know about? I've added tags. Links to the ABI document can be found at #18134312. It specifies the precise rules for this in Section 3.2.3 (in the version I'm looking at), but they are indeed a bit complex. – Bruni 15/4, 2021 at 14:8

Also check out #54023444 - is that a duplicate, or do you want an answer at a different level? – Bruni 15/4, 2021 at 14:11

@old_timer: In practice it's not specific to compiler version number, not when targeting a mainstream OS with a well-established calling-convention. Except possibly for "internal" function calls (C/C++ static functions, or private clones) that can't be called from code outside the current compilation unit; then compilers can in theory invent custom calling conventions if they don't simply inline. – Bloodletting 16/4, 2021 at 2:28

So I spent hours playing with Godbolt's Compiler Explorer and reading up until I figured out the practical answer.

What I've gathered is this:

If the value fits into a register, it's left in a register as the return value.
If the value fits in 2 registers, it's left in 2 registers.
If the value is larger than this (more than 16 bytes on x86-64 System V), the caller reserves memory on its own stack and the function writes directly into the caller's stack.

Both G++ and Clang do the same; this was tested on x86_64.
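As a rough illustration of the three cases listed above, here is a minimal sketch. The type and function names (`One`, `Three`, `Big`, `make_one`, `make_three`, `make_big`, `fits_in_two_registers`) are made up for this example. The 16-byte cut-off is the x86-64 System V figure mentioned in the comments on the question, and the size-only check is a deliberate simplification that ignores the ABI's other classification rules.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example types; the names are not from the original post.
struct One   { std::int32_t a; };        // 4 bytes: fits in one register
struct Three { std::int32_t a, b, c; };  // 12 bytes: fits in a register pair
struct Big   { std::int64_t a, b, c; };  // 24 bytes: too large for two registers

static_assert(sizeof(One) == 4, "One should be 4 bytes");
static_assert(sizeof(Three) == 12, "Three should be 12 bytes");
static_assert(sizeof(Big) == 24, "Big should be 24 bytes");

// Simplified, size-only check.
// Assumption: x86-64 System V, trivial structs with plain integer members.
constexpr bool fits_in_two_registers(std::size_t size) {
    return size <= 16;
}

// Three functions to paste into Compiler Explorer and compare.
One   make_one()   { return One{1}; }
Three make_three() { return Three{1, 2, 3}; }
Big   make_big()   { return Big{1, 2, 3}; }
```

Compiled with optimization for x86-64 Linux, `make_one` and `make_three` should only load registers, while `make_big` should store through a pointer it received from the caller.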

Sherwood answered 16/4, 2021 at 4:35 Comment(4)

Yup, G++ and clang for Linux both follow the x86-64 System V ABI, which you can read (github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf) for the full details on stuff like a struct of 4 floats being returned in XMM0 (SSE / SSEUPPER types). Windows may have different rules for when it can return in a pair of integer registers. (__attribute__((ms_abi)), or compile with MSVC). Also note that for C++, return in register only happens for trivially-copyable types (or something like that, the actual property required might be a related but different one, I forget.) – Bloodletting 16/4, 2021 at 4:42

Note that passing in registers is not just a matter of the size of the binary representation. What is quite important is whether the struct/class is trivial. If it's not trivial, it cannot be passed in registers according to the commonly applied ABIs. On the other hand, passing in memory allows NRVO. Compare both cases: godbolt.org/z/T3eq8jc9P. – Mansour 16/4, 2021 at 7:33

@PeterCordes IIRC, it's called being (non-)trivial for the purpose of calls: itanium-cxx-abi.github.io/cxx-abi/abi.html#non-trivial. – Mansour 16/4, 2021 at 7:40

Sorry, the previous link is wrong. It should have been godbolt.org/z/s73fbMYMM. – Mansour 16/4, 2021 at 7:46
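To make the triviality point from the comments above concrete, here is a minimal sketch with two structs of identical size. The names `Plain`, `Tracked`, `make_plain` and `make_tracked` are hypothetical. `std::is_trivially_copyable` is used only as an approximation: the property the Itanium C++ ABI actually uses is "non-trivial for the purposes of calls", which is related but not identical.

```cpp
#include <type_traits>

// Hypothetical types for illustration; both hold three ints.
struct Plain {
    int a, b, c;  // implicit copy/move/destructor: trivial
};

struct Tracked {
    int a, b, c;
    Tracked(int x, int y, int z) : a(x), b(y), c(z) {}
    // A user-provided copy constructor makes the type non-trivial.
    Tracked(const Tracked& other) : a(other.a), b(other.b), c(other.c) {}
};

static_assert(sizeof(Plain) == sizeof(Tracked), "same size, same layout");
static_assert(std::is_trivially_copyable<Plain>::value,
              "Plain is a candidate for return in registers");
static_assert(!std::is_trivially_copyable<Tracked>::value,
              "Tracked is expected to be returned via memory");

Plain make_plain() { return Plain{1, 2, 3}; }

Tracked make_tracked() {
    Tracked t(1, 2, 3);
    return t;  // NRVO candidate: t can be built directly in the caller's slot
}
```

Comparing the assembly of `make_plain` and `make_tracked` on x86-64 Linux should show the first one returning in registers and the second one writing through a pointer supplied by the caller, even though both types are 12 bytes.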

How is a struct communicated when it's sent by value if a function can only return a value that can fit in a register?

A function can return whatever is legal to return. However, the convention of implementing the return statement by leaving a value behind in a single register only works for values that are register-sized or smaller. Some implementations represent larger data types using more than one register; this means that the caller has to be written to expect the full return value spread across multiple registers.

"What happens" at the machine level is not specified by the language standard, and depends on the particular compiler, its optimization capabilities, details of the architecture, etc. However, the straightforward implementation on ordinary platforms is to have the caller reserve space on the stack (so that it lasts beyond the cleanup) and have the callee write the data there. Since the allocation is static, typically the required space can simply be taken into account when computing the size of the stack frame for the caller. The implementation might silently generate a pointer and pass it to the callee in a register; or it might arrange that every caller puts this reserved space in the same place in its stack frame, such that the callee can add an offset to the stack pointer to determine where to write; or it might do some other thing that I'm currently not creative enough to think of.

There are any number of ways to handle communication of information between functions at the machine level, depending on both the machine and the language (although we are usually talking about either C or C++ when we have these discussions, since all the other popular choices either run on a VM, are interpreted or have some other fancy thing going on). The general term you want to look into is Application Binary Interface, or ABI.
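As a rough sketch of the straightforward implementation described in this answer (the caller reserves space, the callee writes into it), the following hand-written "lowered" version mimics what a compiler might arrange behind the scenes. The names `Big`, `make_big`, `make_big_lowered`, `result_slot`, `sum_via_value` and `sum_via_lowered` are invented for this example. Returning the pointer from the lowered function mirrors the x86-64 System V convention and is an assumption about the target, not something the language requires.

```cpp
#include <cstdint>

// Hypothetical 24-byte struct, too large for a register pair on x86-64.
struct Big { std::int64_t a, b, c; };

// What the source code says: return by value.
Big make_big() { return Big{1, 2, 3}; }

// Roughly what the generated code does: the caller passes the address
// of space it has reserved, and the callee fills that space in place.
Big* make_big_lowered(Big* result_slot) {
    result_slot->a = 1;
    result_slot->b = 2;
    result_slot->c = 3;
    return result_slot;  // assumption: the ABI hands the address back
}

// Caller using the by-value form; the compiler reserves the slot itself.
std::int64_t sum_via_value() {
    Big b = make_big();
    return b.a + b.b + b.c;
}

// Caller using the hand-lowered form; the slot is reserved explicitly.
std::int64_t sum_via_lowered() {
    Big slot;                 // space in the caller's stack frame
    make_big_lowered(&slot);  // callee writes directly into it
    return slot.a + slot.b + slot.c;
}
```

Both callers compute the same sum; the difference is only whether the pointer to the result slot is visible in the source or generated silently by the compiler.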

Matchwood answered 15/4, 2021 at 11:42 Comment(2)

its optimization capabilities - generally no, all real-world compilers follow some standard calling convention so it's possible for code in one file to call functions in another file without having to use the same optimization options. Making the calling convention depend on a compiler's ability to spot some optimization (e.g. removing an unused struct member) is hugely problematic and isn't something compilers do. (Usually not even with whole-program link-time-optimization.) – Bloodletting 16/4, 2021 at 2:30

If we're inventing hypothetical calling conventions other than the usual (passing a pointer to a return-value object for structs that don't fit in a couple of regs), the function could return with the stack pointer different from before the call, effectively having pushed the return value onto the stack. (For x86-64, where the return address is passed on the stack instead of in a link register, this would mean popping the return address, pushing the struct, then pushing another copy of the return address before running a ret. So it's not a good or convenient convention, but it is possible.) – Bloodletting 16/4, 2021 at 2:33