Why is RAX not used to pass a parameter in System V AMD64 ABI?

I don't understand the benefit of not passing a parameter in RAX. Since the return value is in RAX, it is going to be clobbered by the callee anyway.

Can someone explain?

Welldisposed answered 9/10, 2018 at 6:13 Comment(6)
@Someprogrammerdude: I don't think that was the question. – Shote
AFAIK, RAX is used for varargs (if the function is varargs, of course) to indicate the number of arguments passed. – Shote
IIRC, a specific study was done, involving profiling actual code. – Patrilocal
FWIW, not really a duplicate (question mainly about Win64), but still answered here: https://mcmap.net/q/14674/-why-does-windows64-use-a-different-calling-convention-from-all-other-oses-on-x86-64 . It also discusses the choices for System V 64 bit ABI. – Shote
From the same question: https://mcmap.net/q/14674/-why-does-windows64-use-a-different-calling-convention-from-all-other-oses-on-x86-64 – Shote
Borland used eax in their 32-bit convention: en.wikipedia.org/wiki/X86_calling_conventions#Borland_register – Seroka

x86-64 System V does use AL for variadic functions: the caller passes the number of FP args in XMM registers.

(This is only an optimization to allow the callee to not dump all the vector regs into an array; the number in AL is allowed to be higher than the number of FP args. In practice, gcc's code-gen for variadic functions just checks if it's non-zero and dumps either none or all 8 of xmm0..7. I think the ABI guarantees that it's safe to always pass al=8 even if there aren't actually any FP args, and that you can't pass FP args on the stack instead by setting al=0.)
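To make the AL convention concrete, here is a minimal C sketch of a variadic function that takes only FP args. The name `sum_doubles` is made up for illustration. For a call like `sum_doubles(3, 1.0, 2.0, 3.5)` on x86-64 System V, the caller would put `count` in EDI, the three doubles in xmm0..xmm2, and set AL to 3 (or any upper bound up to 8) before the call.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` double arguments. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* On x86-64 System V, the first 8 FP varargs are read from the
           register save area that the prologue fills when AL != 0. */
        total += va_arg(ap, double);
    }
    va_end(ap);
    return total;
}
```

The C source never mentions AL; the compiler sets it at each call site of a variadic function, which is why the register can't double as an ordinary arg-passing register there.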


But why not use r9b for that, and use RAX for the 6th arg? Or RAX for some earlier arg?

Because RAX has so many implicit uses in x86, and experiments when designing the calling convention (http://web.archive.org/web/20140414124645/http://www.x86-64.org/pipermail/discuss/2000-November/001257.html) found that using RAX tended to require extra instructions in the caller or callee. e.g. because RAX was often needed as part of computing other args in the caller, or was needed while doing something with one of the other args before the code gets around to using the arg that was passed in RAX.

RAX is used for rep stos (which gcc used to use more aggressively to inline memset), and it's used for div and widening (one-operand) mul/imul, which gcc uses for division by a compile-time constant. (Why does GCC use multiplication by a strange number in implementing integer division?).
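As a sketch of the division-by-constant trick mentioned above, here is the multiply-and-shift form written out by hand in C for a 32-bit unsigned divide by 10. The function name is made up; the constant 0xCCCCCCCD is ceil(2^35 / 10). For 32-bit operands a compiler can use an ordinary 64-bit multiply, but the 64-bit analogue needs the high half of a 128-bit product, which is what one-operand mul delivers in RDX:RAX.

```c
#include <stdint.h>

/* Hypothetical helper: x / 10 without a div instruction. */
uint32_t div10_by_multiply(uint32_t x)
{
    /* Widen to 64 bits, multiply by ceil(2^35 / 10), keep the high bits.
       This is exact for every 32-bit unsigned x. */
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

If an arg were passed in RAX, a function doing this kind of division on some other arg would have to move the RAX arg out of the way first.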

Most of the other RAX special uses are just shorter encodings of things you can also do with other registers, like cdqe vs. movsxd rax, eax (or between any other registers). Or add eax,imm32 (no ModRM) vs. add r/m32, imm32 (or most other ALU instructions). See one of my answers on Tips for golfing in x86/x64 machine code. Original 8086 lacked many of the longer non-AX alternatives, but between 8086 and 386, stuff like imul r32,r32 and movsx/movzx were added. Other RAX-only instructions aren't worth using when optimizing for speed (like xlatb, lodsd), or were made obsolete by P6 / AMD64 extensions (lahf as part of FP compares, obsoleted by fucomi and by using SSE/SSE2 ucomisd for FP math), or are specialized instructions like cmpxchg or cpuid that are too rare to have an impact on calling convention design. Compilers didn't use the BCD instructions like aaa anyway, and AMD64 removed them.


The designers of the x86-64 System V calling convention (primarily Jan Hubička for the integer arg-passing register design) generally aimed to avoid registers with many / common implicit uses. rdx comes before rcx in the arg-passing order, because cl is needed for variable shift counts (without BMI2). These are maybe more common than mul and div, because 2-operand imul reg,reg allows normal non-widening multiplies without clobbering RDX:RAX.
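A small sketch of the shift-count point (the function name is made up). Under x86-64 System V, `value` arrives in RDI and `count` in ESI. Without BMI2's shlx, the count has to be in CL, so a compiler will typically copy ESI into ECX before the shift. That copy is cheap here because RCX is only the 4th arg register and isn't holding anything the function still needs.

```c
#include <stdint.h>

/* Hypothetical helper: variable-count left shift. */
uint64_t shift_left(uint64_t value, unsigned count)
{
    /* Masking keeps the shift well-defined in C; the hardware masks
       the count in CL the same way for 64-bit operands. */
    return value << (count & 63u);
}
```

Had RCX been the 1st or 2nd arg register, functions like this would more often find a live arg sitting in the register they need for the count.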

The choice of rdi and rsi as the first 2 args was apparently motivated by inlining memset or memcpy as rep stos / rep movs (which gcc did back in 2000, even though it wasn't actually a good choice in many of the cases where gcc did that). Even though rep-string instructions use RCX as the counter, they still found it on average saved instructions to pass the 3rd arg in RDX instead of RCX, so the calling convention doesn't quite work out for memcpy to be rep movsb/ret.
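Here is a rough sketch of what that looks like from C, using GNU-style inline asm (an assumption: GCC or Clang targeting x86-64; other targets fall back to plain memcpy). The function name is made up. The constraints pin `dst` to RDI, `src` to RSI and `n` to RCX, the registers rep movsb uses implicitly. Since the 3rd arg arrives in RDX, the compiler has to emit a mov into RCX first, and it also has to copy the original `dst` into RAX for the return value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: memcpy-like copy built on rep movsb. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;   /* memcpy returns the original destination */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* "+D" = RDI, "+S" = RSI, "+c" = RCX; all three are updated by
       the instruction, so they are read-write operands. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);   /* portable fallback for other targets */
#endif
    return ret;
}
```

So dst and src are already where rep movsb wants them, and only the count needs a register move.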

Jan Hubička evaluated multiple variations on arg-passing registers by compiling SpecInt with a then-current version of x86-64 gcc. See my answer on Why does Windows64 use a different calling convention from all other OSes on x86-64? for some more details and links.

One of the arg-register orders he evaluated was RAX, RDX, RCX, RBX, RSI, RDI, but he found that less good than other options. (See the mailing list message linked above).


It's fairly common for RISC calling conventions to pass the first arg in the first return-value register. ARM does this (r0), and I think so does PowerPC. Others (like MIPS) don't. But all of those architectures have no implicit uses of most integer registers, often just a link register and maybe the stack pointer.

x86-64 SysV and Windows do this for FP args: xmm0 for passing and returning.

Rodenhouse answered 9/10, 2018 at 14:56 Comment(6)
I found this: web.archive.org/web/20140414124645/http://www.x86-64.org/… I still find it a bit surprising that using rax increases the code size, but I guess some arguments are long-lived, and if you use rax you are forced to spill it before using any instruction that clobbers it. – Welldisposed
@IlyaLesokhin: yeah exactly. Passing too many args in registers is bad because sometimes the first thing a callee does is pass one of them to a non-inline function call, so all the rest have to be spilled. Or copy them to call-preserved regs. Things are similar if the first thing you want to do is divide one of the args that wasn't passed in RAX. – Rodenhouse
By the same logic, it is surely a bad idea to use xmm0 for both passing in arguments and for returning values? – Hyland
@1f604: By which logic? The only instructions that need XMM0 specifically are blendvps/pd / pblendvb, and those are usually used in SIMD loops, not in simple functions that take a few args, compute something with them, and return. – Rodenhouse
I was thinking more that using xmm0 as both param0 and return value may lead to an extra instruction being emitted. If you're implementing a function like double clamp(double v, double lo, double hi), with x86-64-v1 the compiler will emit minsd, maxsd, and movapd because v is in xmm0 and the return value needs to be in xmm0. If v is passed in xmm1 then the movapd can be omitted. In this case, the fact that 1) v may need to be preserved, 2) v is passed through xmm0, and 3) the return value is placed in xmm0 means an extra movapd is emitted. See godbolt.org/z/rq9dsGxh5 – Hyland
@1f604: Only a few SSE instructions can produce an output in a register that wasn't one of their inputs. (Like sqrtsd which has a false output dependency due to Intel's bad design, or sqrtpd which doesn't). So the best bet for some functions to avoid a movaps is for the return-value register to be one of the inputs. Certainly that ends up being less convenient for some functions, but as godbolt.org/z/cGen5556T shows, adding a double dummy first arg means all of your functions need one movaps somewhere (where GCC wastes a byte of code size on movapd.) – Rodenhouse
