Why Assembly x86_64 syscall parameters are not in alphabetical order like i386
There is that one question that troubles me.

So ... why, in x86_32, are the parameters passed in registers that feel like they are in alphabetical order (eax, ebx, ecx, edx) followed by a ranked order (esi, edi, ebp)?

+---------+------+------+------+------+------+------+
| syscall | arg0 | arg1 | arg2 | arg3 | arg4 | arg5 |
+---------+------+------+------+------+------+------+
|   %eax  | %ebx | %ecx | %edx | %esi | %edi | %ebp |
+---------+------+------+------+------+------+------+

section .text
    global _start
_start:
    mov eax, 1     ; i386 syscall number for sys_exit
    mov ebx, 0     ; first argument
    int 0x80

While in x86_64, syscall parameters are passed in registers that look somewhat randomly arranged:

+---------+------+------+------+------+------+------+
| syscall | arg0 | arg1 | arg2 | arg3 | arg4 | arg5 |
+---------+------+------+------+------+------+------+
|   %rax  | %rdi | %rsi | %rdx | %r10 | %r8  | %r9  |
+---------+------+------+------+------+------+------+

section .text
    global _start
_start:
    mov eax, 60    ; x86_64 syscall number for sys_exit
    mov edi, 0     ; first argument
    syscall

Did they do that for a specific reason? Am I not seeing something here?

Buell answered 6/12, 2017 at 14:28 Comment(7)
In x86-64, it matches the function-calling convention so syscall wrapper functions are lightweight. In i386, IDK why it uses that inconvenient setup (ebx is call-preserved, so almost every syscall wrapper needs to save/restore ebx.)Contingent
@PeterCordes And the function calling convention was designed to reduce the amount of register shuffling when implementing memcpy with rep movsb among other things.Ency
function-calling convention? Maybe I'll sound like a rookie, maybe because I am ... but what is a function-calling convention? Is that an Assembly thing?Buell
My answer on stackoverflow.com/questions/4429398/… has some history about how the x86-64 SysV ABI was designed (including links to mailing list archives from Jan Hubicka, who did the designing.)Contingent
@ApostolisAnastasiou: Normally you'd just say "the calling convention", but we need to distinguish between the calling convention for functions, and the calling convention for system calls. See stackoverflow.com/questions/2535989/… for info on both. (This question could maybe be a duplicate of either this link or the one in my previous comment...)Contingent
@PeterCordes So, if I understood, the reason behind the registers used is their Electronical Structure and their mechanical operation? No one is specifically referring to it, but Jan Hubicka here: web.archive.org/web/20140414124645/http://www.x86-64.org/… states that other registers produced less amount of code? How can that be? Since 5 minutes before I thought all registers were the same but they were used differently because of a convention. Now all things changed for me ...Buell
I put my reply to this comment into my answer with an edit.Contingent

The x86-64 System V ABI was designed to minimize instruction-count (and to some degree code-size) in SPECint as compiled by the version of gcc that was current before the first AMD64 CPUs were sold. See this answer for some history and list-archive links.

"Since 5 minutes before I thought all registers were the same but they were used differently because of a convention. Now all things changed for me"

x86-64 is not fully orthogonal. Some instructions implicitly use specific registers. e.g. push implicitly uses rsp as the stack pointer, shl edx, cl is only usable with a shift count in cl (until BMI2 shlx).

More rarely used: widening mul rdi does rdx:rax = rax*rdi. The rep-string instructions implicitly use RDI, RSI, and RCX, although they're often not worth using.
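
As a rough illustration of those implicit operands, here is a minimal NASM sketch of two functions following the x86-64 System V convention. The names shift_left and mul_wide are hypothetical, made up for this example.

```nasm
; Sketch only: hypothetical helper functions, x86-64 System V calling convention.
section .text
    global shift_left
    global mul_wide

; uint64_t shift_left(uint64_t value /* rdi */, unsigned count /* esi */)
shift_left:
    mov  ecx, esi        ; a variable shift count has to be in cl (without BMI2)
    mov  rax, rdi
    shl  rax, cl         ; shift by the low bits of cl
    ret

; unsigned __int128 mul_wide(uint64_t a /* rdi */, uint64_t b /* rsi */)
mul_wide:
    mov  rax, rsi        ; one factor must already be in rax
    mul  rdi             ; rdx:rax = rax * rdi, both outputs are implicit
    ret                  ; a 128-bit integer is returned in rdx:rax
```

Neither function gets to choose where the shift count or the product goes. That is the sense in which the registers are not interchangeable.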

It turns out that choosing the arg-passing registers so that functions that pass their args straight to memcpy could inline it as rep movs was useful in the metric Jan Hubicka was using, so rdi and rsi were chosen as the first two args. But leaving rcx unused until the 4th arg turned out to be better, because cl is needed for variable-count shifts. (And most functions don't happen to use their 3rd arg as a shift count.) (Older GCC versions probably inlined memcpy or memset as rep movs more aggressively; it's usually not worth it vs. SIMD for small arrays these days.)
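
To make that concrete, here is a sketch of a memcpy-style function. The name copy_bytes is hypothetical; this is not how glibc or GCC actually implement memcpy. With dst and src arriving in rdi and rsi, only the count needs to be moved before rep movsb.

```nasm
; Sketch only: void *copy_bytes(void *dst /* rdi */, const void *src /* rsi */, size_t n /* rdx */)
section .text
    global copy_bytes
copy_bytes:
    mov  rax, rdi        ; memcpy-style return value: the original dst
    mov  rcx, rdx        ; rep takes its count from rcx
    rep  movsb           ; copy rcx bytes from [rsi] to [rdi]; DF=0 is guaranteed by the ABI on entry
    ret
```

If the count had been passed in rcx, even that one mov would disappear, but every function with three or more args would then have an argument sitting in the register that variable-count shifts need.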


The x86-64 System V ABI uses almost the same calling convention for functions as it does for system calls. This is not a coincidence: it means the implementation for a libc wrapper function like mmap can be:

mmap:
    mov  r10, rcx       ; syscall destroys rcx and r11; 4th arg passed in r10 for syscalls
    mov  eax, __NR_mmap
    syscall

    cmp  rax, -4096
    ja  .set_errno_and_stuff
    ret

This is a tiny advantage, but there's really no reason not to do this. It also saves a few instructions inside the kernel setting up the arg-passing registers before dispatching to the C implementation of the system call in the kernel. (See this answer for a look at some kernel side of system call handling. Mostly about the int 0x80 handler, but I think I mentioned the 64-bit syscall handler and that it dispatches to a table of functions directly from asm.)

The syscall instruction itself destroys RCX and R11 (to save user-space RIP and RFLAGS without needing microcode to set up the kernel stack) so the conventions can't be identical unless the user-space convention avoided RCX and R11. But RCX is a handy register whose low half can be used without a REX prefix so that probably would have been worse than leaving it as a call-clobbered pure scratch like R11. Also, the user-space convention uses R10 as a "static chain" pointer for languages with first-class nested functions (not C/C++).
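
For a complete picture of a raw six-argument call, here is a small standalone sketch that maps one anonymous page and exits. The build commands in the comment are assumptions about a typical NASM + ld setup. The syscall numbers (9 for mmap, 60 for exit) and the flag values are the x86-64 Linux ones.

```nasm
; Sketch only. Assumed build: nasm -felf64 mmap_demo.asm && ld -o mmap_demo mmap_demo.o
section .text
    global _start
_start:
    mov  eax, 9          ; __NR_mmap on x86-64 Linux
    xor  edi, edi        ; arg0: addr = NULL
    mov  esi, 4096       ; arg1: length
    mov  edx, 3          ; arg2: prot = PROT_READ | PROT_WRITE
    mov  r10d, 0x22      ; arg3: flags = MAP_PRIVATE | MAP_ANONYMOUS (r10, not rcx)
    mov  r8, -1          ; arg4: fd = -1
    xor  r9d, r9d        ; arg5: offset = 0
    syscall              ; rcx and r11 now hold the saved RIP and RFLAGS

    cmp  rax, -4096      ; values in -4095..-1 are -errno
    ja   .fail
    mov  byte [rax], 42  ; touch the new mapping to show it is writable
    xor  edi, edi        ; exit status 0
    jmp  .exit
.fail:
    mov  edi, 1          ; exit status 1
.exit:
    mov  eax, 60         ; __NR_exit on x86-64 Linux
    syscall
```

Anything the caller wants to keep in rcx or r11 has to be saved elsewhere before the syscall instruction.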

Having the first 4 args able to avoid a REX prefix is probably best for overall code-size, and using RBX or RBP instead of RCX would be weird. Having a couple call-preserved registers that don't need a REX prefix (EBX/EBP) is good.
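
The code-size point can be checked by assembling a few instructions and looking at the listing (for example with nasm -l). This sketch gives the encodings I'd expect in the comments:

```nasm
; Sketch only: compare instruction lengths with and without a REX prefix.
bits 64
    mov  eax, ecx        ; 89 C8       2 bytes, no prefix needed
    mov  eax, r10d       ; 44 89 D0    3 bytes, REX.R to reach r10
    mov  rax, rcx        ; 48 89 C8    3 bytes, REX.W for 64-bit operand size
    push rbx             ; 53          1 byte
    push r12             ; 41 54       2 bytes, REX.B to reach r12
```

Any use of r8-r15 costs a prefix byte, even with 32-bit operand size, while the eight legacy registers only need REX for 64-bit operand size.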

See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 for the function-call and system-call conventions.


The i386 system call convention is the clunky and inconvenient one: ebx is call-preserved, so almost every syscall wrapper needs to save/restore ebx, except for calls with no args like getpid. (And for that you don't even need to enter the kernel, just call into the vDSO: see The Definitive Guide to Linux System Calls (on x86) for more about vDSO and tons of other stuff.)

But the i386 function-calling convention passes all args on the stack, so glibc wrapper functions still need to mov every arg anyway.
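
As a sketch of what such a wrapper has to do, here is a write wrapper for 32-bit code. The name my_write is hypothetical, errno handling is left out, and this is not glibc's actual implementation.

```nasm
; Sketch only, 32-bit code: ssize_t my_write(int fd, const void *buf, size_t count)
bits 32
section .text
    global my_write
my_write:
    push ebx             ; ebx is call-preserved, but the syscall ABI needs it for arg0
    mov  ebx, [esp+8]    ; fd    (above the saved ebx and the return address)
    mov  ecx, [esp+12]   ; buf
    mov  edx, [esp+16]   ; count
    mov  eax, 4          ; __NR_write on i386
    int  0x80
    pop  ebx             ; restore the caller's ebx
    ret                  ; result or -errno is in eax
```

The loads from the stack would be needed whichever registers the kernel expected. The push/pop pair is the cost specific to using ebx.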

Also note that the "natural" order of x86 registers is EAX, ECX, EDX, EBX, according to their numeric codes in machine code, and also the order that pusha / popa use. See Why are first four x86 GPRs named in such unintuitive order?.
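
One way to see that numbering is the one-byte push encoding, which is 0x50 plus the register number. A small sketch, with the bytes I'd expect from an assembler listing in the comments:

```nasm
; Sketch only: register numbers as they appear in the push r32 opcode (50 + reg).
bits 32
    push eax             ; 50   register number 0
    push ecx             ; 51   register number 1
    push edx             ; 52   register number 2
    push ebx             ; 53   register number 3
    push esp             ; 54   register number 4
    push ebp             ; 55   register number 5
    push esi             ; 56   register number 6
    push edi             ; 57   register number 7
```

In that order, the i386 syscall convention (ebx, ecx, edx, esi, edi, ebp) is not really "in order" either.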

Contingent answered 6/12, 2017 at 14:54 Comment(2)
Any clue why the syscall convention differs at 4th arg from the userland convention (r10 vs rcx)? You make a good point about simple wrappers but they could be even faster except for this discrepancy which requires shuffling 1, 2 or 3 arguments for calls with 4, 5 or 6 arguments.Horotelic
@BeeOnRope: The syscall instruction itself literally clobbers rcx and r11 with the saved RIP and RFLAGS values. It's a very minor factor in the choice of calling convention, so considerations like REX-needed high regs vs. no-REX registers like ecx outweighed it. And if the convention had used rbx instead of rcx as an arg-passing/call-clobbered "low" reg, variable shifts would need to save/restore it. You also want some call-preserved low regs (RBP, and RBX because it's the "least special" low register, used implicitly mainly by CPUID / CMPXCHG16B (which didn't even exist originally))Contingent
