Stack cleanup in stdcall (callee-pops) for variable arguments

I'm learning a bit of assembly for fun (currently using NASM on Windows), and I have a question regarding the stdcall calling convention and functions with variable numbers of arguments. For example, a sum function that takes X integers and adds them all together.

Since the callee needs to clean up the stack under stdcall, but ret only accepts a constant (immediate) byte count, I've been wondering whether there's anything wrong with popping the return address, adjusting esp, and jumping back to the caller yourself instead of using ret. I assume this would be slower, since it requires more instructions, but would it be acceptable?

; int sum(count, ...)
sum:
    mov ecx, [esp+4] ; count
    
    ; calc args size
    mov eax, ecx ; vars count
    inc eax      ; + count
    mov edx, 4   ; * 4 bytes per arg
    mul edx
    mov edx, eax
    
    xor eax, eax ; result
    
    cmp ecx, 0   ; if count == 0
    je .done
    inc ecx      ; count++, to start with last arg
    
    .add:
        add eax, [esp+4*ecx]
        dec ecx  ; loop down to ecx == 1 ([esp+4] is count, [esp] the return address)
        cmp ecx, 1
        jnz .add
    .done:
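        pop ebx      ; return address (clobbers ebx, which is callee-saved; ecx is free here)
        add esp,edx  ; discard count and the values
        jmp ebx      ; return to the caller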

I don't see why this wouldn't be okay, and it appears to work, but I've read articles that talked about how stdcall can't handle variable arguments, because the function can't know what value to pass to ret. Am I missing something?
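
A minimal C sketch of the convention in the question, assuming a toy model of the 32-bit stack (the `Stack` type and the `push`, `load`, `sum_callee_pops` and `call_sum` functions are hypothetical names, not a real API). The callee reads `count` from `[esp+4]`, sums the values, then pops the return address and discards `4 * (count + 1)` bytes itself, mirroring `pop ebx` / `add esp, edx` / `jmp ebx`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 32-bit x86 stack (hypothetical, for illustration only):
   mem[] holds 4-byte slots, esp is a byte offset that grows downward. */
#define STACK_SLOTS 64

typedef struct {
    uint32_t mem[STACK_SLOTS];
    uint32_t esp;               /* byte offset, like ESP */
} Stack;

static void stack_init(Stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_SLOTS * 4;   /* empty stack: ESP at the top */
}

static void push(Stack *s, uint32_t v)
{
    assert(s->esp >= 4);        /* toy stack-overflow check */
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

static uint32_t load(const Stack *s, uint32_t addr)
{
    return s->mem[addr / 4];
}

/* Mirrors the asm `sum`: on entry [esp] is the return address,
   [esp+4] is count and [esp+8].. are the values. The callee derives
   the argument size from count at runtime and cleans it up itself. */
static uint32_t sum_callee_pops(Stack *s, uint32_t *ret_addr)
{
    uint32_t count = load(s, s->esp + 4);       /* mov ecx, [esp+4]         */
    uint32_t arg_bytes = 4 * (count + 1);       /* values + the count slot  */
    uint32_t result = 0;
    for (uint32_t i = count + 1; i > 1; i--)    /* [esp+4*ecx] .. [esp+8]   */
        result += load(s, s->esp + 4 * i);
    *ret_addr = load(s, s->esp);                /* pop ebx                  */
    s->esp += 4;
    s->esp += arg_bytes;                        /* add esp, edx             */
    return result;                              /* jmp ebx                  */
}

/* Caller side: push values right to left, then count, then a fake
   return address standing in for what `call` pushes. Returns the sum;
   *balanced is 1 if ESP ended where it started and the return address
   came back intact. */
uint32_t call_sum(const uint32_t *vals, uint32_t count, int *balanced)
{
    Stack s;
    stack_init(&s);
    uint32_t esp_before = s.esp;
    for (uint32_t i = count; i > 0; i--)
        push(&s, vals[i - 1]);
    push(&s, count);
    push(&s, 0xC0DEu);                          /* fake return address      */
    uint32_t ret_addr;
    uint32_t r = sum_callee_pops(&s, &ret_addr);
    *balanced = (s.esp == esp_before) && (ret_addr == 0xC0DEu);
    return r;
}
```

For example, `call_sum((uint32_t[]){10, 20, 30}, 3, &ok)` returns 60 and sets `ok` to 1, because the callee derived the exact argument size from `count`.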

Bresee asked on 23/1, 2021 at 20:10. Comments (6):
Allround: Variable arguments switch to caller cleanup even under stdcall.
Incubator: I think on Windows, variadic functions are never stdcall. While most library functions are stdcall or fastcall (callee-pops for the 32-bit version of fastcall), functions like printf are cdecl (caller-pops).
Bresee: I see. Well, let's not get hung up on my calling it "stdcall", then; let's focus instead on whether there's anything wrong with my very own calling convention, where the callee cleans the stack even for a function with variable arguments ^^
Earthenware: Other than performance and needing a scratch register, yes: you can pop into a register, adjust (r/e)sp, then jump to the address in that register. Performance may also suffer because you may confuse the branch predictor's handling of calls and returns. The register obviously needs to be available, because you cannot restore it afterwards. Note that add or sub on the stack pointer modifies the status flags; use lea or pop to avoid that.
Incubator: @ecm: Fun fact: branch prediction would be fine if you returned with push reg / ret instead of jmp ebx, because you still return to the same place. Return-address prediction works by assuming call/ret nest correctly (an internal stack-like hardware data structure of return addresses); it has nothing to do with where ESP points.
Incubator: Related: "What calling convention does printf() in C use?" covers why we don't do this: it's inefficient even if you had the right number of bytes.
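
The point about push reg / ret versus jmp ebx can be illustrated with a deliberately simplified C model of a return-stack predictor. It is a sketch under stated assumptions: `call` pushes its return address onto a small hardware-like stack, `ret` pops the top entry as its prediction, and `jmp ebx` leaves that stack untouched. Real CPUs also predict the `jmp` itself with a separate indirect-branch predictor, which this model ignores. The names `Rsb`, `rsb_call`, `rsb_ret` and `mispredicted_rets` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy return-stack buffer (hypothetical model, not real hardware). */
#define RSB_DEPTH 16

typedef struct {
    uint32_t slot[RSB_DEPTH];
    int top;
} Rsb;

/* `call`: remember where this call should return to. */
static void rsb_call(Rsb *r, uint32_t ret_addr)
{
    assert(r->top < RSB_DEPTH);    /* overflow handling omitted in this toy */
    r->slot[r->top++] = ret_addr;
}

/* `ret`: predict the popped entry; return 1 if it matches the real target. */
static int rsb_ret(Rsb *r, uint32_t actual_target)
{
    uint32_t predicted = (r->top > 0) ? r->slot[--r->top] : 0;
    return predicted == actual_target;
}

/* main calls f (returns to 0x100), f calls sum (returns to 0x200).
   sum returns either with push ebx / ret or with jmp ebx; then f returns.
   Returns how many `ret` instructions were mispredicted. */
int mispredicted_rets(int sum_returns_with_ret)
{
    Rsb r = { .top = 0 };
    int misses = 0;
    rsb_call(&r, 0x100);                /* main: call f   */
    rsb_call(&r, 0x200);                /* f:    call sum */
    if (sum_returns_with_ret)
        misses += !rsb_ret(&r, 0x200);  /* sum: push ebx / ret */
    /* otherwise sum does jmp ebx, so the 0x200 entry stays behind */
    misses += !rsb_ret(&r, 0x100);      /* f: ret back to main */
    return misses;
}
```

In this model `mispredicted_rets(1)` gives 0 and `mispredicted_rets(0)` gives 1: with jmp ebx, the stale entry for `sum` is what `f`'s own `ret` gets predicted to, so the cost shows up later, in the caller.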

Of course ret imm works if the size of the arguments is a constant. Your idea would work if the function is able to determine the size of its arguments at runtime, which in this case it does from the count argument, though as ecm points out it may be inefficient because the indirect branch predictor isn't designed for such shenanigans.

But in some cases, the size of the arguments may not be known to the called function at all, not even at runtime. Consider printf. You might say it could deduce the size of its arguments from the format string; for instance, if the format string was "%d" then it should know that one int was passed and therefore clean up an extra 4 bytes from the stack. But it is perfectly legal under the C standard to call

printf("%d", 123, 456, 789, 2222);

The excess arguments are required to be ignored. But under your calling convention, printf would think it only had to clean up 4 bytes of variadic arguments from the stack (plus its non-variadic format string argument), whereas its caller pushed 16 and expects all of them to be cleaned up. ESP would end up 12 bytes away from where the caller expects it, and the program will crash.
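
The stack arithmetic can be made concrete with a small C sketch. It assumes a hypothetical callee-pops printf on 32-bit x86 that sizes its arguments by counting conversions in the format string, treating each one as a single 4-byte int (a deliberate simplification: real specifiers such as %lld or %f take 8 bytes, and * widths consume extra arguments). The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Count conversions in fmt, treating each as one 4-byte int argument.
   "%%" prints a literal percent sign and consumes no argument. */
static size_t conversions_in(const char *fmt)
{
    assert(fmt != NULL);
    size_t n = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {      /* skip the second '%' of "%%" */
            p++;
            continue;
        }
        n++;
    }
    return n;
}

/* Bytes a format-driven, callee-pops printf would discard:
   the format-string pointer plus one 4-byte slot per conversion. */
size_t callee_pop_bytes(const char *fmt)
{
    return 4 * (1 + conversions_in(fmt));
}

/* Bytes the caller actually pushed: format pointer plus nvarargs ints. */
size_t caller_push_bytes(size_t nvarargs)
{
    return 4 * (1 + nvarargs);
}

/* How far ESP ends up from where the caller expects it (0 = balanced). */
long esp_error(const char *fmt, size_t nvarargs)
{
    return (long)caller_push_bytes(nvarargs) - (long)callee_pop_bytes(fmt);
}
```

For printf("%d", 123, 456, 789, 2222), `esp_error("%d", 4)` is 20 - 8 = 12: the callee pops 12 bytes too few, exactly the three excess ints.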

So unless your calling convention is going to include a "hidden" argument that tells the called function how many bytes of arguments to clean up, it can't work. And passing such an extra argument is going to require more instructions than having the caller just do the stack cleanup itself.
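
A minimal C sketch of that hidden-argument idea, assuming the same kind of toy 32-bit stack model: the caller pushes the byte size of the visible arguments as one extra slot just before the return address, and the callee pops exactly that much no matter how many arguments it actually used. `Stack`, `push`, `callee_cleanup` and `call_with_hidden_size` are hypothetical names. The stack stays balanced even with excess arguments, but every call site pays for at least one extra push, and the callee needs an extra load plus an indirect jump instead of a plain ret.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 64

/* Toy 32-bit stack: 4-byte slots, esp is a byte offset growing downward. */
typedef struct {
    uint32_t mem[SLOTS];
    uint32_t esp;
} Stack;

static void push(Stack *s, uint32_t v)
{
    assert(s->esp >= 4);        /* toy stack-overflow check */
    s->esp -= 4;
    s->mem[s->esp / 4] = v;
}

/* Callee: [esp] = return address, [esp+4] = hidden byte count.
   It trusts the hidden count rather than its own idea of how many
   arguments it consumed (e.g. from a format string). */
static uint32_t callee_cleanup(Stack *s)
{
    uint32_t ret_addr = s->mem[s->esp / 4];
    uint32_t hidden = s->mem[(s->esp + 4) / 4];
    s->esp += 4;                /* pop the return address   */
    s->esp += 4;                /* drop the hidden size     */
    s->esp += hidden;           /* drop the visible args    */
    return ret_addr;            /* the caller jumps here    */
}

/* Caller: push nargs 4-byte arguments right to left, then the hidden
   size, then a fake return address standing in for `call`.
   Returns 1 if ESP is back where it started and the return address
   came back intact. */
int call_with_hidden_size(uint32_t nargs)
{
    Stack s = { .esp = SLOTS * 4 };
    uint32_t esp_before = s.esp;
    for (uint32_t i = nargs; i > 0; i--)
        push(&s, i);                /* argument values are arbitrary */
    push(&s, 4 * nargs);            /* hidden argument: byte count   */
    push(&s, 0xC0DEu);              /* what `call` would push        */
    uint32_t ret_addr = callee_cleanup(&s);
    return s.esp == esp_before && ret_addr == 0xC0DEu;
}
```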

Hercules answered on 23/1, 2021 at 23:5. Comments (1):
Incubator: Just for the record: "Is calling printf with excess arguments undefined behaviour?" confirms you're right: it's well defined. However, a mismatch between a conversion and an arg it does reference is UB ("Why are arguments which do not match the conversion specifier in printf undefined behavior?"), but that's a separate issue. That being UB allows things like x86-64 SysV passing FP and integer args in separate registers, so the 3rd FP arg goes in xmm2, rather than xmm2 holding the 3rd overall arg only if it's FP (as Windows x64 does).
