I don't understand what the benefit is of not passing a parameter in RAX. Since the return value is in RAX, it is going to be clobbered by the callee anyway.
Can someone explain?
x86-64 System V does use AL for variadic functions: the caller passes the number of FP args in XMM registers.
(This is only an optimization to allow the callee to not dump all the vector regs into an array; the number in AL is allowed to be higher than the number of FP args. In practice, gcc's code-gen for variadic functions just checks if it's non-zero and dumps either none or all 8 of xmm0..7. I think the ABI guarantees that it's safe to always pass al=8 even if there aren't actually any FP args, and that you can't pass FP args on the stack instead by setting al=0.)
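As a rough illustration of where AL comes into play, here is a minimal sketch of a variadic function that takes only FP arguments. The name sum_doubles is hypothetical, not from any library. On x86-64 System V, a call such as sum_doubles(3, 1.0, 2.0, 3.5) would typically be compiled so that the caller puts the three doubles in xmm0..xmm2 and sets al to 3 before the call instruction; the C source never mentions AL at all.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic double arguments.
 * On x86-64 System V the caller sets AL to (an upper bound on) the number
 * of vector registers used for the variadic args; the va_start machinery
 * in the callee uses it to decide whether to spill xmm0..xmm7. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);  /* read the next FP argument */
    va_end(ap);

    return total;
}
```

Compiling a caller with gcc -O2 -S and looking for the instruction that sets eax/al just before the call is an easy way to see this for yourself.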
But why not use r9b for that, and use RAX for the 6th arg? Or RAX for some earlier arg?
Because RAX has so many implicit uses in x86, and experiments when designing the calling convention (http://web.archive.org/web/20140414124645/http://www.x86-64.org/pipermail/discuss/2000-November/001257.html) found that using RAX tended to require extra instructions in the caller or callee, e.g. because RAX was often needed as part of computing other args in the caller, or was needed while doing something with one of the other args before the code got around to using the arg that was passed in RAX.
RAX is used for rep stos (which gcc used to use more aggressively to inline memset), and it's used for div and widening (one-operand) mul/imul, which gcc uses for division by a compile-time constant. (See Why does GCC use multiplication by a strange number in implementing integer division?)
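To make the division-by-a-constant point concrete, here is a small sketch of the multiply-and-shift trick for unsigned division by 10. The function name div10_mulhi is made up for this example. For 32-bit operands the full product fits in a 64-bit register, so a compiler targeting x86-64 can get away with a plain 2-operand multiply; the same idea for 64-bit operands needs the high half of a 128-bit product, which is what one-operand mul leaves in RDX (with RAX as the implicit input).

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 without a div instruction.
 * 0xCCCCCCCD == ceil(2^35 / 10), so the high bits of x * 0xCCCCCCCD
 * give the exact quotient for every 32-bit unsigned x. */
uint32_t div10_mulhi(uint32_t x)
{
    uint64_t product = (uint64_t)x * 0xCCCCCCCDu;  /* widening 32x32 -> 64 multiply */
    return (uint32_t)(product >> 35);              /* keep the top bits: the quotient */
}
```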
Most of the other RAX special uses are just shorter encodings of things you can also do with other registers, like cdqe vs. movsxd rax, eax (or between any other registers). Or add eax,imm32 (no ModRM) vs. add r/m32, imm32 (or most other ALU instructions). See one of my answers on Tips for golfing in x86/x64 machine code. Original 8086 lacked many of the longer non-AX alternatives, but between 8086 and 386, stuff like imul r32,r32 and movsx/movzx were added. Other RAX-only instructions aren't worth using when optimizing for speed (like xlatb, lodsd), or were obsoleted by P6 / AMD64 extensions (lahf as part of FP compares, obsoleted by fucomi and by using SSE/SSE2 ucomisd for FP math), or are specialized instructions like cmpxchg or cpuid that are too rare to have an impact on calling convention design. Compilers didn't use the BCD instructions like aaa anyway, and AMD64 removed them.
The designers of the x86-64 System V calling convention (primarily Jan Hubička for the integer arg-passing register design) generally aimed to avoid registers with many / common implicit uses. rdx comes before rcx in the arg-passing order, because cl is needed for variable shift counts (without BMI2). These are maybe more common than mul and div, because 2-operand imul reg,reg allows normal non-widening multiplies without clobbering RDX:RAX.
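A tiny sketch of the variable-shift case (shift_left is a hypothetical name): the count arrives as the second integer arg, i.e. in ESI, but without BMI2 the shl instruction only accepts a variable count in CL. A compiler will typically copy the count into ECX first. Because RCX is only the 4th arg-passing register, it is usually free to be overwritten in short functions like this one.

```c
#include <stdint.h>

/* Hypothetical helper: shifts `value` left by a run-time count.
 * The mask keeps the count in 0..63, which avoids undefined behaviour in C
 * and matches what the hardware does with a 64-bit shift count anyway. */
uint64_t shift_left(uint64_t value, unsigned count)
{
    return value << (count & 63);
}
```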
The choice of rdi
and rsi
as the first 2 args was apparently motivated by inlining memset
or memcpy
as rep movs
(which gcc did back in 2000, even though it wasn't actually a good choice in many of the cases where gcc did that). Even though rep
-string instructions use RCX as the counter, they still found it on average saved instructions to pass the 3rd arg in RDX instead of RCX, so the calling convention doesn't quite work out for memcpy
to be rep stosb
/ret
.
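To see why memcpy can't quite be a bare rep movsb, here is a hedged sketch using GNU C inline asm (copy_bytes is a hypothetical name, and the asm path assumes GCC or Clang on x86-64 with the direction flag clear, as the ABI requires at function entry). The "D" and "S" constraints match where the first two args already live (RDI, RSI), but the "c" constraint forces the count into RCX, while the calling convention delivers it in RDX. The function also has to return the original destination pointer in RAX.

```c
#include <stddef.h>

/* Hypothetical memcpy-like helper built on rep movsb.
 * Falls back to a plain byte loop on other compilers/targets. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    void *ret = dst;  /* memcpy-style return value: the original dst */

#if defined(__GNUC__) && defined(__x86_64__)
    /* rep movsb copies RCX bytes from [RSI] to [RDI], updating all three. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif

    return ret;
}
```

So even in the best case, the compiler needs a register copy to get the count into RCX and another to set up the return value, on top of rep movsb and ret.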
Jan Hubička evaluated multiple variations on arg-passing registers by compiling SpecInt with a then-current version of x86-64 gcc. See my answer on Why does Windows64 use a different calling convention from all other OSes on x86-64? for some more details and links.
One of the arg-register orders he evaluated was RAX, RDX, RCX, RBX, RSI, RDI, but he found that to be worse than other options. (See the mailing list message linked above.)
It's fairly common for RISC calling conventions to pass the first arg in the first return-value register. ARM does this (r0), and I think so does PowerPC. Others (like MIPS) don't. But all of those architectures have no implicit uses of most integer registers, often just a link register and maybe the stack pointer.
x86-64 SysV and Windows do this for FP args: xmm0 for passing and returning.
blendvps/pd / pblendvb, and those are usually used in SIMD loops, not in simple functions that take a few args, compute something with them, and return. – Rodenhouse
double clamp(double v, double lo, double hi), with x86-64-v1 the compiler will emit minsd, maxsd, and movapd because v is in xmm0 and the return value needs to be xmm0. If v is passed in xmm1 then the movapd can be omitted. In this case, the fact that 1) v may need to be preserved, 2) v is passed through xmm0 and 3) the return value is placed in xmm0 means an extra movapd is emitted. See godbolt.org/z/rq9dsGxh5 – Hyland
sqrtsd which has a false output dependency due to Intel's bad design, or sqrtpd which doesn't). So the best bet for some functions to avoid a movaps is for the return-value register to be one of the inputs. Certainly that ends up being less convenient for some functions, but as godbolt.org/z/cGen5556T shows, adding a double dummy first arg means all of your functions need one movaps somewhere (where GCC wastes a byte of code size on movapd.) – Rodenhouse
eax in their 32-bit convention en.wikipedia.org/wiki/X86_calling_conventions#Borland_register – Seroka