Why is the __stdcall calling convention ignored in x64?

I know what the differences between __cdecl and __stdcall are, but I'm not quite sure as to why __stdcall is ignored by the compiler in x64 builds.

The functions in the following code

int __stdcall stdcallFunc(int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

int __cdecl cdeclFunc(int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

int main()
{
    stdcallFunc(1, 2, 3, 4, 5, 6, 7);
    cdeclFunc(1, 2, 3, 4, 5, 6, 7);

    return 0;
}

have enough parameters to exceed the available CPU registers. Therefore, some arguments must be passed via the stack. I'm not fluent in assembly but I noticed some differences between x86 and x64 assembly.

x64

main    PROC
$LN3:
        sub     rsp, 72                             ; 00000048H
        mov     DWORD PTR [rsp+48], 7
        mov     DWORD PTR [rsp+40], 6
        mov     DWORD PTR [rsp+32], 5
        mov     r9d, 4
        mov     r8d, 3
        mov     edx, 2
        mov     ecx, 1
        call    ?stdcallFunc@@YAHHHHHHHH@Z          ; stdcallFunc
        mov     DWORD PTR [rsp+48], 7
        mov     DWORD PTR [rsp+40], 6
        mov     DWORD PTR [rsp+32], 5
        mov     r9d, 4
        mov     r8d, 3
        mov     edx, 2
        mov     ecx, 1
        call    ?cdeclFunc@@YAHHHHHHHH@Z                ; cdeclFunc
        xor     eax, eax
        add     rsp, 72                             ; 00000048H
        ret     0
main    ENDP

x86

_main   PROC
        push    ebp
        mov     ebp, esp
        push    7
        push    6
        push    5
        push    4
        push    3
        push    2
        push    1
        call    ?stdcallFunc@@YGHHHHHHHH@Z          ; stdcallFunc
        push    7
        push    6
        push    5
        push    4
        push    3
        push    2
        push    1
        call    ?cdeclFunc@@YAHHHHHHHH@Z                ; cdeclFunc
        add     esp, 28                             ; 0000001cH
        xor     eax, eax
        pop     ebp
        ret     0
_main   ENDP

Comparing the two listings, I noticed the following:

  1. The first 4 arguments are, as expected, passed via registers in x64.
  2. The remaining arguments are put on the stack in the same order as in x86.
  3. Contrary to x86, the x64 code doesn't use push instructions. Instead, it reserves enough stack space at the beginning of main and uses mov instructions to write the arguments to the stack.
  4. In x64, there is no stack cleanup after either call, only at the end of main.

This brings me to my questions:

  1. Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.
  2. Why is there no stack cleanup after the call instructions in x64?
  3. What's the reason that Microsoft chose to ignore __stdcall in x64 assembly? From the docs:

    On ARM and x64 processors, __stdcall is accepted and ignored by the compiler

Here is the example code and assembly.

Ganglion answered 5/7, 2020 at 22:13 Comment(1)
A new calling convention was created for x64, and there was no good reason to intentionally create two incompatible versions, so there's only one. You can read about it here.Myself
  1. Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.

That is not the reason. Both of these instructions also exist in x86 assembly language.

The reason your compiler does not emit push instructions for the x64 code is probably that it must adjust the stack pointer directly anyway, in order to create 32 bytes of "shadow space" for the called function. See this link (which was provided by @NateEldredge) for further information on "shadow space".

Allocating 32 bytes of "shadow space" with push instructions would take four 64-bit push instructions, but only a single sub instruction, so the compiler prefers sub. Since it is using sub anyway, it costs nothing to change the operand from 32 to 72. That reserves enough stack space to also pass the remaining 3 parameters on the stack (the first 4 are passed in CPU registers).

I don't understand why it is allocating 72 bytes on the stack, though. According to my calculations, 56 bytes would be enough (32 bytes of "shadow space" plus 24 bytes for the 3 parameters that are passed on the stack). Possibly, the compiler is reserving those extra 16 bytes for local variables or for exception handling, which may be optimized away when compiler optimizations are active.
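
The arithmetic above can be written down as a small C++ sketch. The helper names (outgoing_arg_bytes, arg_slot_offset, keeps_call_aligned) are made up for illustration and are not part of any compiler or API. The sketch assumes the documented Windows x64 rules: 4 register arguments, 8-byte slots, 32 bytes of shadow space, and a stack pointer that is 16-byte aligned at each call (so it is off by 8 on entry to main, which pushes nothing in the listing).

```cpp
#include <cassert>
#include <cstddef>

// Assumed Windows x64 convention constants.
constexpr std::size_t kRegisterArgs = 4;                          // rcx, rdx, r8, r9
constexpr std::size_t kSlotBytes    = 8;                          // one stack slot
constexpr std::size_t kShadowBytes  = kRegisterArgs * kSlotBytes; // 32

// Hypothetical helper: minimum bytes a caller reserves for one call's arguments.
constexpr std::size_t outgoing_arg_bytes(std::size_t arg_count) {
    return kShadowBytes +
           (arg_count > kRegisterArgs ? arg_count - kRegisterArgs : 0) * kSlotBytes;
}

// Hypothetical helper: offset from rsp of the slot for argument `index` (0-based).
// Slots 0..3 are the shadow space; slots 4 and up hold the stack arguments.
constexpr std::size_t arg_slot_offset(std::size_t index) {
    return index * kSlotBytes;
}

// Hypothetical helper: does "sub rsp, n" leave rsp 16-byte aligned, given that
// rsp is 8 bytes off alignment on function entry (return address pushed)?
constexpr bool keeps_call_aligned(std::size_t n) {
    return (n + 8) % 16 == 0;
}

// Checks the numbers against the x64 listing in the question.
void check_against_listing() {
    assert(outgoing_arg_bytes(7) == 56);  // 32 shadow + 3 * 8
    assert(arg_slot_offset(4) == 32);     // mov DWORD PTR [rsp+32], 5
    assert(arg_slot_offset(5) == 40);     // mov DWORD PTR [rsp+40], 6
    assert(arg_slot_offset(6) == 48);     // mov DWORD PTR [rsp+48], 7
    assert(keeps_call_aligned(56));       // 56 would already be aligned
    assert(keeps_call_aligned(72));       // 72 is aligned as well
}
```

Both 56 and 72 keep the stack aligned under these assumptions, so alignment alone does not account for the extra 16 bytes.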


  2. Why is there no stack cleanup after the call instructions in x64?

There is stack cleanup after the call instructions. This is what the line

add rsp, 72

does.

However, for some reason (probably increased performance), the x64 compiler only performs the cleanup at the end of the calling function, instead of after every function call. This means that with the x64 compiler, all function calls share the same stack space for their parameters, whereas with the x86 compiler, the stack space is allocated and cleaned up at every function call.


  3. What's the reason that Microsoft chose to ignore __stdcall in x64 assembly?

The keywords __stdcall and __cdecl specify 32-bit calling conventions. That's why they are not relevant for 64-bit programs (i.e. x64). On x64, there is only the standard calling convention and the extended __vectorcall calling convention.
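
One way to observe this from C++ is to compare function pointer types, as in the rough sketch below. The DEMO_STDCALL and DEMO_CDECL macros and the function names are invented for this example. It assumes that MSVC treats the two keywords as part of the function type on x86 and ignores them on x64, as the quoted docs describe. On other compilers the macros expand to nothing, so the comparison is trivially true there.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative macros: real keywords on MSVC, empty elsewhere.
#if defined(_MSC_VER)
  #define DEMO_STDCALL __stdcall
  #define DEMO_CDECL   __cdecl
#else
  #define DEMO_STDCALL
  #define DEMO_CDECL
#endif

using StdcallFn = int (DEMO_STDCALL *)(int, int);
using CdeclFn   = int (DEMO_CDECL *)(int, int);

// True when the compiler sees both pointer types as the same type,
// i.e. when the convention keywords make no difference.
constexpr bool kSamePointerType = std::is_same<StdcallFn, CdeclFn>::value;

// On a 64-bit target the keywords are expected to be ignored.
static_assert(sizeof(void *) != 8 || kSamePointerType,
              "expected __stdcall and __cdecl to collapse on 64-bit targets");

int DEMO_STDCALL add_stdcall(int a, int b) { return a + b; }
int DEMO_CDECL   add_cdecl(int a, int b)   { return a + b; }

// Calls both functions through pointers of their declared types.
int call_both() {
    StdcallFn f = add_stdcall;
    CdeclFn   g = add_cdecl;
    return f(1, 2) + g(3, 4);
}

void check_demo() {
    assert(call_both() == 10);
}
```

In a 32-bit MSVC build, kSamePointerType should be false, because there the two conventions differ in who cleans up the stack.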

Agueweed answered 6/7, 2020 at 1:47 Comment(6)
Yes, doing stack cleanup once on function exit is obviously better for code-size and performance than doing it between calls. It also lets the compiler clean up locals and child shadow space / stack-arg space at the same time.Gauldin
The "standard" Windows x64 calling convention is called x64 __fastcall, I think. MSVC's -Gv option makes __vectorcall the default; I haven't tested if __fastcall can override back to not passing vectors in XMM regs.Gauldin
"There is stack cleanup after the call instructions" I've seen that one at the end of the main function. I was wondering why there is no cleanup directly after each call instruction. I can see that doing one cleanup in the end is more optimal, but I wasn't expecting that with disabled optimizations.Ganglion
@PeterCordes according to this, __fastcall is another ignored convention on x64. I cannot find a name for the x64 convention tho.Ganglion
learn.microsoft.com/en-us/cpp/build/… says "... x64 uses the __fastcall calling convention ..." and "The __fastcall convention uses registers for the first four arguments and the stack frame to pass additional arguments". Also, I tested MSVC on Godbolt, and __fastcall does override the -Gv default of __vectorcall. godbolt.org/z/GFrwKM.Gauldin
I guess nobody updated Wikipedia after the creation of __vectorcall; at first there was only one x64 convention so that statement was true, but MSVC realized it sucked too much for vector funcs so they now have 2 named conventions. Naming the default x64 convention __fastcall might have been retroactive, IDK. Wikipedia might have been thinking that __fastcall meant specifically the 32-bit style convention, 2 register args and no shadow space. It's true that nothing will give you that.Gauldin
