instead of put the first 6 arguments in registers just to move them onto the stack in the function prologue?
I was looking at some code that gcc generated and that's what it always did.
Then you forgot to enable optimization. gcc -O0
spills everything to memory so you can modify them with a debugger while single-stepping. That's obviously horrible for performance, so compilers don't do that unless you force them to by compiling with -O0
.
x86-64 System V allows int add(int x, int y) { return x+y; }
to compile to
lea eax, [rdi + rsi]
/ ret
, which is what compilers actually do as you can see on the Godbolt compiler explorer.
Stack-args calling conventions are slow and obsolete. RISC machines have been using register-args calling conventions since before x86-64 existed, and on OSes that still care about 32-bit x86 (i.e. Windows), there are better calling conventions like __vectorcall
that pass the first 2 integer args in registers.
i386 System V hasn't been replaced because people mostly don't care as much about 32-bit performance on other OSes; we just use 64-bit code with the nicely-designed x86-64 System V calling convention.
For more about the tradeoff between register args and call-preserved vs. call-clobbered registers in calling convention design, see Why not store function parameters in XMM vector registers?, and also Why does Windows64 use a different calling convention from all other OSes on x86-64?.
-O3
– Candycandyce