I've stumbled across an oddity in MSVC's codegen regarding structs that are used as return values. Consider the following code (live demo here):
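#include <cstdint>

// NOINLINE is not defined in the snippet; assumed here to be MSVC's __declspec(noinline).
#define NOINLINE __declspec(noinline)

struct Result
{
    uint64_t value;
};

// Free function: Result comes back directly in RAX.
Result makeResult(uint64_t value)
{
    return { value };
}

struct ResultFactory
{
    // Member function: same body, but different codegen for the return value.
    NOINLINE Result MakeResult(uint64_t value) const
    {
        return { value };
    }
};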
We have a struct that perfectly fulfills the x64 ABI's conditions for being returned in RAX. And as long as the free function is used, that is exactly what happens:
value$ = 8
Result makeResult(unsigned __int64) PROC ; makeResult, COMDAT
    mov rax, rcx
    ret 0
Result makeResult(unsigned __int64) ENDP ; makeResult
Now when we look at the member-function, it looks slightly different:
Result ResultFactory::MakeResult(unsigned __int64)const PROC ; ResultFactory::MakeResult, COMDAT
    mov QWORD PTR [rdx], r8
    mov rax, rdx
    ret 0
Result ResultFactory::MakeResult(unsigned __int64)const ENDP ; ResultFactory::MakeResult
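Here, the compiler requires the caller to pass a pointer to storage for the "Result" in a register, writes the value through that pointer, and returns the pointer in RAX. (The pointer arrives in RDX, the second argument register, because RCX already holds "this"; that is what MSVC does for member functions anyway whenever the result cannot be returned in RAX.)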
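To make the listing above easier to read, here is a rough hand-written equivalent of what the member function appears to be lowered to. This is only a sketch: MakeResult_lowered and callLowered are hypothetical names, and the register mapping in the comments is taken from the assembly above rather than from anything the compiler guarantees.

```cpp
#include <cassert>
#include <cstdint>

struct Result
{
    uint64_t value;
};

struct ResultFactory
{
};

// Hypothetical free-function spelling of the member function as it is emitted:
// self ("this") arrives in RCX, the hidden return slot in RDX, value in R8.
Result* MakeResult_lowered(const ResultFactory* self, Result* ret, uint64_t value)
{
    (void)self;          // unused, just like "this" in the original member
    ret->value = value;  // mov QWORD PTR [rdx], r8
    return ret;          // mov rax, rdx
}

// What a caller effectively has to do: reserve a slot, pass its address,
// then read the result back from memory instead of from RAX.
uint64_t callLowered(const ResultFactory& factory, uint64_t value)
{
    Result slot;
    Result* returned = MakeResult_lowered(&factory, &slot, value);
    assert(returned == &slot);  // the callee hands the same pointer back
    return returned->value;
}
```

So compared to the free function, the value takes a round trip through the caller's stack instead of staying in a register.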
Why would that be the case? Is there any good reason for it? It seems to needlessly pessimise the codegen, and I really see no benefit to it. Having RCX always be "this" kind of makes sense, but always requiring a reference, even for primitive structs? This also means that there is unfortunately a very real difference between using a member function and a free function, as long as neither can be inlined. Or, where a member function is used, it could be faster to just return a primitive type and bit_cast it across the function boundary (whether or not any of that matters is another question, but frankly it shouldn't be the case).
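For illustration, the "return a primitive and cast it back" idea could look roughly like the sketch below. MakeResultRaw, toResult and roundTrip are hypothetical names, and std::memcpy is used as a stand-in for C++20's std::bit_cast so the snippet also compiles with older standards. Whether this actually produces better code would have to be checked in the compiler output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Result
{
    uint64_t value;
};

// The trick only works if Result is a plain 8-byte blob.
static_assert(sizeof(Result) == sizeof(uint64_t), "Result must be exactly 8 bytes");
static_assert(std::is_trivially_copyable<Result>::value, "Result must be trivially copyable");

struct ResultFactory
{
    // Hypothetical variant of MakeResult: a scalar return value goes in RAX,
    // so no hidden return pointer is needed even for a member function.
    uint64_t MakeResultRaw(uint64_t value) const
    {
        return value;
    }
};

// Rebuild the struct on the caller's side of the function boundary.
// With C++20 this could be written as std::bit_cast<Result>(raw).
inline Result toResult(uint64_t raw)
{
    Result r;
    std::memcpy(&r, &raw, sizeof r);
    return r;
}

// Small usage helper: call the raw member, then convert back.
uint64_t roundTrip(const ResultFactory& factory, uint64_t value)
{
    Result r = toResult(factory.MakeResultRaw(value));
    assert(r.value == value);
    return r.value;
}
```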
Clang/GCC seem to do it "right". I'm not 100% sure if this is just an MSVC quirk, or actually the x64 Windows calling convention (MSDN doesn't really say anything about C++ specifically). Anyone got a clue what's going on here?
EDIT: As pointed out by @Turtlefight, this is indeed mandated by the Windows ABI. My follow-up, or rewording of this question, would then be: why does the Windows ABI make this distinction, when it seems to only lead to worse codegen and also makes the handling of global functions and member functions vastly different, and thus more complex? In case anyone knows why it was designed that way.
clang-cl to prove that. – Falbala