C/C++ returning struct by value under the hood

Asked 21/8, 2016 at 20:19 Answered 21/8, 2016 at 21:36

(This question is specific to my machine's architecture and calling conventions, Windows x86_64)

I don't remember exactly where I read this, or whether I'm recalling it correctly, but I've heard that when a function returns a struct or object by value, one of two things happens. If the object fits in the 64-bit register width, it is returned in rax. Otherwise, the function is passed a pointer in rcx to where the resulting object should go (I'm guessing it is allocated in the calling function's stack frame), does all the usual initialization there, and then does a mov rax, rcx for the return trip. That is, something like

extern some_struct create_it(); // implemented in assembly

would really have a secret parameter like

extern some_struct create_it(some_struct* secret_param_pointing_to_where_i_will_be);

Did my memory serve me right, or am I incorrect? How are large objects (i.e. wider than the register width) returned by value from functions?
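To make the question concrete, here is a minimal C sketch of the two forms side by side. The three-member layout of some_struct and the name create_it_lowered are assumptions made up for illustration; the real transformation is done by the compiler according to the platform ABI, not written by hand.

```c
/* Hypothetical struct, wider than a single 64-bit register. */
typedef struct {
    long long a;
    long long b;
    long long c;
} some_struct;

/* What the source code says: return the struct by value. */
some_struct create_it(void)
{
    some_struct s = {1, 2, 3};
    return s;
}

/* Roughly what happens underneath (a hand-written approximation):
   the caller supplies the storage, the callee fills it in and
   hands the same pointer back. */
some_struct *create_it_lowered(some_struct *result)
{
    result->a = 1;
    result->b = 2;
    result->c = 3;
    return result;
}
```

Calling create_it_lowered(&y) on a caller-owned some_struct y leaves y with the same contents that create_it() would have returned.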

Emmetropia answered 21/8, 2016 at 20:19 Comment(6)

You omitted a return type from your example function, when you want information about what is returned. – Flareup 21/8, 2016 at 20:21

It seems like you're talking about return value optimization, which is C++ specific. I'm unaware of any such feature in C. – Schaub 21/8, 2016 at 20:25

It really depends on the C++ compiler, and on the level of optimizations that you are using for your build. There are many C++ compilers out there for x86-64 win64. – Lipson 21/8, 2016 at 20:26

@mike: no, it is defined by the platform ABI. If it were allowed to vary between optimization levels, you wouldn't be able to use external libraries. If it could vary between compilers, you'd need different libraries for each compiler. This is why calling conventions must be defined for each platform, and hence why platform ABIs exist. – Karikaria 21/8, 2016 at 20:42

Yes. Doesn't just apply to the return value, also the parameters. The exact details are quite convoluted and poorly documented. Agner Fog did the hard work, chapter 7 has everything you'd want to know and more. – Brownstone 21/8, 2016 at 21:1

@Karikaria ok, thanks. I stand corrected. (Funny thing is, I have read the ABI. I just did not remember that.) – Lipson 22/8, 2016 at 5:38

Here's a simple example, with its disassembly, that illustrates what you're describing:

typedef struct 
{
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    char x;
} A;

A foo(int b, int c)
{
    A myA = {b, c, 5, 6, 7, 8, 10};
    return myA; 
}

int main()
{   
    A myA = foo(5,9);   
    return 0;
}

Here is the disassembly of main, which calls foo, followed by foo itself:

main:

push    ebp
mov     ebp, esp
and     esp, 0FFFFFFF0h
sub     esp, 30h
call    ___main
lea     eax, [esp+20]        ; placing the addr of myA in eax
mov     dword ptr [esp+8], 9 ; param passing 
mov     dword ptr [esp+4], 5 ; param passing
mov     [esp], eax           ; passing myA addr as a param
call    _foo
mov     eax, 0
leave
retn

foo:

push    ebp
mov     ebp, esp
sub     esp, 20h
mov     eax, [ebp+12]  
mov     [ebp-28], eax
mov     eax, [ebp+16]
mov     [ebp-24], eax
mov     dword ptr [ebp-20], 5
mov     dword ptr [ebp-16], 6
mov     dword ptr [ebp-12], 7
mov     dword ptr [ebp-8], 8
mov     byte ptr [ebp-4], 0Ah
mov     eax, [ebp+8]
mov     edx, [ebp-28]
mov     [eax], edx     
mov     edx, [ebp-24]
mov     [eax+4], edx
mov     edx, [ebp-20]
mov     [eax+8], edx
mov     edx, [ebp-16]
mov     [eax+0Ch], edx
mov     edx, [ebp-12]
mov     [eax+10h], edx
mov     edx, [ebp-8]
mov     [eax+14h], edx
mov     edx, [ebp-4]
mov     [eax+18h], edx
mov     eax, [ebp+8]
leave
retn

Now let's go through what just happened. When main calls foo, the arguments are laid out on the stack as follows: 9 at the highest address, then 5, and at the lowest address the address where main's myA begins.

lea     eax, [esp+20]        ; placing the addr of myA in eax
mov     dword ptr [esp+8], 9 ; param passing 
mov     dword ptr [esp+4], 5 ; param passing
mov     [esp], eax           ; passing myA addr as a param

Within foo there is a local myA stored in foo's stack frame. Since the stack grows downwards, the lowest address of foo::myA is [ebp-28]. The -28 offset comes from struct alignment: the members add up to 25 bytes (six 4-byte ints plus one char), but the struct is padded so that its size is a multiple of the int alignment, so the size of the struct is 28 bytes here and not the 25 you might expect. As we can see, after foo's local myA has been created and filled with the parameters and immediate values, it is copied to the address of main's myA that was passed in (this is what returning by value actually means here).
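The 25-versus-28 arithmetic can be checked directly with sizeof and offsetof. This is a small sketch; the helper names are made up for illustration, and the concrete numbers assume a 4-byte int with 4-byte alignment, as on the 32-bit x86 target above.

```c
#include <stddef.h>

typedef struct {
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    char x;
} A;

/* Sum of the member sizes, with no padding counted. */
size_t a_payload_size(void)
{
    return 6 * sizeof(int) + sizeof(char);
}

/* What the compiler actually reserves, trailing padding included. */
size_t a_padded_size(void)
{
    return sizeof(A);
}

/* Offset of the last member; the padding comes after it. */
size_t a_offset_of_x(void)
{
    return offsetof(A, x);
}
```

With a 4-byte int, a_payload_size() is 25 and a_padded_size() is 28. The 3 trailing padding bytes are why the last copy in foo can move a full dword from [ebp-4].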

mov     eax, [ebp+8]
mov     edx, [ebp-28]

[ebp+8] is where the address of main::myA was stored. Addresses grow upwards from ebp: the saved ebp takes 4 bytes at [ebp], the return address takes 4 bytes at [ebp+4], so the first argument, the pointer to main::myA, sits at [ebp+8]. As said earlier, foo::myA starts at [ebp-28] because the stack grows downwards.
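One way to picture those offsets is to model the part of the stack at and above ebp as a C struct. This is only an illustrative model with made-up names (foo_frame_model and its fields); the real frame is laid out by the call and push instructions, not by a struct definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of what foo sees at and above ebp in the 32-bit
   disassembly. Every slot is 4 bytes wide. */
typedef struct {
    uint32_t saved_ebp;      /* [ebp+0]  pushed by "push ebp"        */
    uint32_t return_address; /* [ebp+4]  pushed by "call _foo"       */
    uint32_t hidden_result;  /* [ebp+8]  address of main::myA        */
    uint32_t b;              /* [ebp+12] first declared argument, 5  */
    uint32_t c;              /* [ebp+16] second declared argument, 9 */
} foo_frame_model;
```

In this model offsetof(foo_frame_model, hidden_result) is 8, which matches the mov eax, [ebp+8] that fetches the destination pointer, and b and c land at 12 and 16, matching the loads from [ebp+12] and [ebp+16].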

mov     [eax], edx

This stores foo::myA.b at the address of the first data member of main::myA, which is main::myA.b.

mov     edx, [ebp-24]
mov     [eax+4], edx

This loads the value of foo::myA.c into edx and stores it at the address of main::myA.b + 4 bytes, which is main::myA.c.

As you can see, this process repeats throughout the function:

mov     edx, [ebp-20]
mov     [eax+8], edx
mov     edx, [ebp-16]
mov     [eax+0Ch], edx
mov     edx, [ebp-12]
mov     [eax+10h], edx
mov     edx, [ebp-8]
mov     [eax+14h], edx
mov     edx, [ebp-4]
mov     [eax+18h], edx
mov     eax, [ebp+8]

This shows that when a struct that cannot be returned in registers is returned by value, the address where the return value should live is passed to the function as a parameter, and the called function copies the values of the returned struct to that address. It then returns that same address in eax.
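Written back out in C, the unoptimized code above behaves roughly like the sketch below. foo is the function from the example; foo_lowered is a hypothetical hand-written equivalent, not something the compiler emits under that name.

```c
typedef struct {
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    char x;
} A;

/* The original function: returns the struct by value. */
A foo(int b, int c)
{
    A myA = {b, c, 5, 6, 7, 8, 10};
    return myA;
}

/* Hand-written approximation of what the disassembly does. */
A *foo_lowered(A *result, int b, int c)
{
    A local = {b, c, 5, 6, 7, 8, 10}; /* built in foo's own stack frame   */
    *result = local;                  /* copied out through the hidden    */
                                      /* pointer, one dword at a time     */
    return result;                    /* the final mov eax, [ebp+8]       */
}
```

A caller would write A myA; foo_lowered(&myA, 5, 9); and end up with the same contents as A myA = foo(5, 9);. With optimization enabled a compiler will typically skip the local copy and write straight through the pointer.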

Hope this example helped you visualize what happens under the hood a little bit better :)

EDIT

I hope you've noticed that my example uses 32-bit assembly. I know you asked about x86-64, but I'm currently unable to disassemble code on a 64-bit machine, so I hope you'll take my word for it that the concept is the same for 64-bit and 32-bit. The calling convention details do differ: on Windows x64 the hidden pointer is passed in rcx rather than on the stack, and the function returns it in rax.

Deron answered 21/8, 2016 at 21:36 Comment(3)

Use gcc.godbolt.org to look at compiler output for x86, x86-64, ARM, MIPS, PowerPC, or AVR, with clang or gcc. Also, you could shorten the example by using A myA = { b, c, 5, 6, 7, 8, 9, 10}. And probably drop a couple struct members to shorten the asm. As long as it's larger than 128 bits, the x86-64 SysV calling convention won't pack it into RDX:RAX. – Makings 21/8, 2016 at 21:59

Also, the asm would be a lot easier to read if you compiled foo with optimization enabled. Having it store to a local on the stack and then copy that to the return value pointer is really noisy and hard to follow if you don't already know that's what it's doing. – Makings 21/8, 2016 at 22:0

Originally this code did something a little more complicated, as part of my testing its disassembly. I quickly edited it without changing that, but you're right that it would be much more readable, and I wouldn't write it that way in actual development :). I agree with you; I just wanted to prove that the actual copying is what goes on under the hood @PeterCordes – Deron 21/8, 2016 at 22:1

That is exactly correct. The caller passes an extra argument which is the address of the return value. Normally it will be on the caller's stack frame but there are no guarantees.

The precise mechanics are specified by the platform ABI, but this mechanism is very common.

Various commentators have left useful links with documentation for calling conventions, so I'll hoist some of them into this answer:

Wikipedia article on x86 calling conventions
Agner Fog's collection of optimization resources, including a summary of calling conventions (Direct link to 57-page PDF document.)
Microsoft Developer Network (MSDN) documentation on calling conventions.
StackOverflow x86 tag wiki has lots of useful links.

Karikaria answered 21/8, 2016 at 20:33 Comment(1)

more specifically, see the x86 tag wiki for links to ABI docs that specify the rules for when a struct is packed into one or two registers, or when a pointer is passed as a hidden first-arg. The rules differ between ABIs, for 32 and 64 bit. See also structs in calling conventions on SO docs – Makings 21/8, 2016 at 20:47
