I ran into the same confusion on FreeBSD 9.0/amd64. Here is what I did (using nasm as the assembler):
$ cat foo.asm
global _start
_start:
mov rax, 4 ; write
mov rdi, 1 ; stdout
mov rsi, rsp ; buffer: 16 bytes starting at rsp
mov rdx, 16 ; length: 16 bytes
syscall
mov rax, 1 ; exit
syscall
$ nasm -f elf64 foo.asm && ld -o foo foo.o
$ ./foo | hd
00000000 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000010
$ ./foo 2 | hd
00000000 02 00 00 00 00 00 00 00 b8 dc ff ff ff 7f 00 00 |................|
00000010
$ ./foo 2 3 | hd
00000000 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 |................|
00000010
$ ./foo 2 3 4 | hd
00000000 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00000010
$ ./foo 2 3 4 5 | hd
00000000 05 00 00 00 00 00 00 00 b0 dc ff ff ff 7f 00 00 |................|
00000010
I expected argc to be at [rsp], but it was not always there: sometimes [rsp] held argc, and sometimes it held a zero word with argc right after it.
I guessed that the kernel (the image activator) sets up the registers, so I searched the source tree and found the following code in /usr/src/sys/amd64/amd64/machdep.c (exec_setregs):
regs->tf_rsp = ((stack - 8) & ~0xFul) + 8;
regs->tf_rdi = stack; /* argv */
EDIT: machdep.c has since been split up, and this function can now be found in exec_machdep.c.
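These lines say that rsp is rounded to a multiple of 16 plus 8, so it may end up one word below the actual data, while rdi holds the address of the data itself (argc, followed by the argv pointers). I changed my code to read from rdi, and got the expected results: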
$ cat foo.asm
global _start
_start:
mov rsi, rdi ; buffer: the argc/argv block whose address the kernel passes in rdi
mov rax, 4 ; write
mov rdi, 1 ; stdout
mov rdx, 16 ; length: 16 bytes
syscall
mov rax, 1 ; exit
syscall
$ nasm -f elf64 foo.asm && ld -o foo foo.o
$ ./foo | hd
00000000 01 00 00 00 00 00 00 00 b0 dc ff ff ff 7f 00 00 |................|
00000010
$ ./foo 2 | hd
00000000 02 00 00 00 00 00 00 00 a8 dc ff ff ff 7f 00 00 |................|
00000010
$ ./foo 2 3 | hd
00000000 03 00 00 00 00 00 00 00 a8 dc ff ff ff 7f 00 00 |................|
00000010
$ ./foo 2 3 4 | hd
00000000 04 00 00 00 00 00 00 00 a8 dc ff ff ff 7f 00 00 |................|
00000010
$ ./foo 2 3 4 5 | hd
00000000 05 00 00 00 00 00 00 00 a8 dc ff ff ff 7f 00 00 |................|
00000010
Can you try reading from rdi instead of rsp?