Why is RAX not used to pass a parameter in System V AMD64 ABI?

x86-64 System V does use AL for variadic functions: the caller sets AL to the number of FP args it is passing in XMM registers.

(This is only an optimization to allow the callee to not dump all the vector regs into an array; the number in AL is an upper bound, so it's allowed to be higher than the actual number of FP args. In practice, gcc's code-gen for variadic functions just checks if it's non-zero and dumps either none or all 8 of xmm0..7. I think the ABI guarantees that it's safe to always pass al=8 even if there aren't actually any FP args, and that you can't pass FP args on the stack instead by setting al=0.)
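As a rough illustration of the AL convention, here is a small variadic C function; `sum_doubles` and its `count` parameter are made up for this example. For a call with three doubles, a compiler targeting x86-64 System V would typically load them into xmm0..xmm2 and set AL (e.g. with `mov eax, 3`) before the call, though the exact instructions depend on the compiler and options.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` double arguments.
 * On x86-64 System V, the caller sets AL to an upper bound on the number
 * of XMM registers used for this variadic call. The compiler-generated
 * prologue typically tests AL to decide whether to spill xmm0..xmm7
 * into the register save area that va_arg reads from. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```

Compiling a caller with something like `gcc -O2 -S` should show the AL setup right before the `call`.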

But why not use r9b for that, and use RAX for the 6th arg? Or RAX for some earlier arg?

Because RAX has so many implicit uses in x86, and experiments when designing the calling convention (http://web.archive.org/web/20140414124645/http://www.x86-64.org/pipermail/discuss/2000-November/001257.html) found that using RAX tended to require extra instructions in the caller or callee. e.g. because RAX was often needed as part of computing other args in the caller, or was needed while doing something with one of the other args before the code gets around to using the arg that was passed in RAX.

RAX is used for rep stos (which gcc used to use more aggressively to inline memset), and it's used for div and widening (one-operand) mul/imul, which gcc uses for division by a compile-time constant. (Why does GCC use multiplication by a strange number in implementing integer division?).
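To make the widening-multiply case concrete, here is a sketch of 64-bit unsigned division by 10 done as a multiply by a fixed-point reciprocal. The helper name `div10` is hypothetical, and `unsigned __int128` is a GCC/Clang extension rather than standard C. On x86-64 the 64x64 => 128-bit product is what one-operand `mul` computes, with one input implicitly in RAX and the result in RDX:RAX.

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 without a div instruction.
 * 0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10). Taking the high 64 bits of the
 * 128-bit product and shifting right by 3 divides by 2^67 overall. */
uint64_t div10(uint64_t x)
{
    unsigned __int128 product = (unsigned __int128)x * 0xCCCCCCCCCCCCCCCDULL;
    uint64_t high = (uint64_t)(product >> 64); /* the half that mul leaves in RDX */
    return high >> 3;
}
```

If an incoming arg lived in RAX or RDX, the compiler would have to move it out of the way before a sequence like this.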

Most of the other RAX special uses are just shorter encodings of things you can also do with other registers, like cdqe vs. movsxd rax, eax (or between any other registers), or add eax,imm32 (no ModRM) vs. add r/m32, imm32 (and likewise for most other ALU instructions). See one of my answers on Tips for golfing in x86/x64 machine code. Original 8086 lacked many of the longer non-AX alternatives, but between 8086 and 386, stuff like imul r32,r32 and movsx/movzx was added. Other RAX-only instructions aren't worth using when optimizing for speed (like xlatb, lodsd), or were made obsolete by P6 / AMD64 extensions (lahf as part of FP compares, obsoleted by fucomi and by using SSE/SSE2 ucomisd for FP math), or are specialized instructions like cmpxchg or cpuid that are too rare to have an impact on calling convention design. Compilers didn't use the BCD instructions like aaa anyway, and AMD64 removed them in 64-bit mode.

The designers of the x86-64 System V calling convention (primarily Jan Hubička for the integer arg-passing register design) generally aimed to avoid registers with many / common implicit uses. rdx comes before rcx in the arg-passing order, because cl is needed for variable shift counts (without BMI2). These are maybe more common than mul and div, because 2-operand imul reg,reg allows normal non-widening multiplies without clobbering RDX:RAX.

The choice of rdi and rsi as the first 2 args was apparently motivated by inlining memset or memcpy as rep stos or rep movs (which gcc did back in 2000, even though it wasn't actually a good choice in many of the cases where gcc did that). Even though rep-string instructions use RCX as the counter, they still found it on average saved instructions to pass the 3rd arg in RDX instead of RCX, so the calling convention doesn't quite work out for memcpy to be just rep movsb / ret: the count arrives in RDX and has to be copied to RCX first.
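A minimal sketch of how that lines up, using GCC/Clang extended inline asm (the helper name `copy_bytes` is made up, and this is only an illustration, not a recommended memcpy). The register constraints pin the operands to the registers `rep movsb` uses implicitly; dst and src already arrive in RDI and RSI, but the count does not arrive in RCX.

```c
#include <stddef.h>

/* Hypothetical helper: byte copy via rep movsb where available.
 * The constraints pin dst to RDI ("D"), src to RSI ("S") and n to RCX ("c").
 * As the 3rd argument, n arrives in RDX, so the compiler still has to
 * copy it into RCX before the rep movsb. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

Looking at the compiler output for this function on x86-64 should show the extra register move for the count (plus one for the return value in RAX) around the `rep movsb`.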

Jan Hubička evaluated multiple variations on arg-passing registers by compiling SpecInt with a then-current version of x86-64 gcc. See my answer on Why does Windows64 use a different calling convention from all other OSes on x86-64? for some more details and links.

One of the arg-register orders he evaluated was RAX, RDX, RCX, RBX, RSI, RDI, but he found that less good than other options. (See the mailing list message linked above).

It's fairly common for RISC calling conventions to pass the first arg in the first return-value register. ARM does this (r0), and I think so does PowerPC. Others (like MIPS) don't. But all of those architectures have no implicit uses of most integer registers, often just a link register and maybe the stack pointer.

x86-64 SysV and Windows do this for FP args: xmm0 for passing and returning.
