I'm trying to understand the implications of the System V AMD64 ABI for returning by value from a function.
For the following data type
struct Vec3{
    double x, y, z;
};
the type Vec3 is of class MEMORY, and thus the ABI specifies the following concerning "Returning of Values":
If the type has class MEMORY, then the caller provides space for the return value and passes the address of this storage in %rdi as if it were the first argument to the function. In effect, this address becomes a “hidden” first argument. This storage must not overlap any data visible to the callee through other names than this argument.
On return %rax will contain the address that has been passed in by the caller in %rdi.
With this in mind, the following (silly) function:
struct Vec3 create(void);
struct Vec3 use(){
    return create();
}
could be compiled as:
use_v2:
    jmp create
In my opinion, tailcall-optimization can be performed here, because the ABI guarantees that create will place the value passed in %rdi into the %rax register.
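To make the reasoning concrete, here is a rough C model of how I understand the convention. It is only a sketch: create_lowered and use_lowered are hypothetical names, the hidden argument is written out as an explicit pointer, and the values stored are arbitrary.

```c
struct Vec3 { double x, y, z; };

/* Hypothetical hand-lowered form of `struct Vec3 create(void)`:
 * the caller passes the address of the return slot (the "hidden"
 * first argument, %rdi) and receives the same address back (%rax). */
struct Vec3 *create_lowered(struct Vec3 *ret)
{
    ret->x = 1.0;   /* arbitrary values, for illustration only */
    ret->y = 2.0;
    ret->z = 3.0;
    return ret;     /* same address that was passed in */
}

/* Hypothetical hand-lowered form of `struct Vec3 use(void)`:
 * it forwards its own return slot unchanged, and the call is
 * the last thing the function does. */
struct Vec3 *use_lowered(struct Vec3 *ret)
{
    return create_lowered(ret);
}
```

Written this way, the call in use_lowered is in tail position and its argument is passed through untouched, which is why I would expect a plain jmp.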
However, none of the compilers (gcc, clang, icc) seem to perform this optimization (here on godbolt). The resulting assembly code saves %rdi (in a callee-saved register, which in turn has to be pushed onto the stack) only to be able to move its value to %rax, for example gcc:
use:
    pushq %r12
    movq %rdi, %r12
    call create
    movq %r12, %rax
    popq %r12
    ret
Tailcall-optimization is performed neither for this minimal, silly function nor for more complicated ones from real life, which leads me to believe that I must be missing something that prohibits it.
By contrast, for types of class SSE (e.g. a struct with only 2 rather than 3 doubles), tailcall-optimization is performed (at least by gcc and clang, live on godbolt):
struct Vec2{
    double x, y;
};
struct Vec2 create(void);
struct Vec2 use(){
    return create();
}
results in
use:
    jmp create
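As far as I understand the classification rules, the difference between the two cases comes down to size: a struct made only of doubles stays in class SSE as long as it fits into two eightbytes, and becomes MEMORY beyond that. Below is a small sketch of that check. The helper names are made up, and the rule is deliberately simplified to structs consisting solely of doubles.

```c
#include <stddef.h>

struct Vec2 { double x, y; };
struct Vec3 { double x, y, z; };

/* Number of 8-byte chunks ("eightbytes") needed to hold `size` bytes. */
static size_t eightbytes(size_t size)
{
    return (size + 7) / 8;
}

/* Simplified assumption: valid only for structs consisting solely of
 * doubles. Such a struct is returned in SSE registers if it fits into
 * two eightbytes, and is of class MEMORY otherwise. */
static int doubles_struct_is_memory(size_t size)
{
    return eightbytes(size) > 2;
}
```

With 8-byte doubles, doubles_struct_is_memory(sizeof(struct Vec2)) is 0 and doubles_struct_is_memory(sizeof(struct Vec3)) is 1, matching the two cases above.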