I'm reading "Computer Systems: A Programmer's Perspective, 3/E" (CS:APP3e) and the following code is an example from the book:
long call_proc() {
long x1 = 1;
int x2 = 2;
short x3 = 3;
char x4 = 4;
proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
return (x1+x2)*(x3-x4);
}
The book gives the assembly code generated by GCC:
long call_proc()
call_proc:
; Set up arguments to proc
subq $32, %rsp ; Allocate 32-byte stack frame
movq $1, 24(%rsp) ; Store 1 in &x1
movl $2, 20(%rsp) ; Store 2 in &x2
movw $3, 18(%rsp) ; Store 3 in &x3
movb $4, 17(%rsp) ; Store 4 in &x4
leaq 17(%rsp), %rax ; Create &x4
movq %rax, 8(%rsp) ; Store &x4 as argument 8
movl $4, (%rsp) ; Store 4 as argument 7
leaq 18(%rsp), %r9 ; Pass &x3 as argument 6
movl $3, %r8d ; Pass 3 as argument 5
leaq 20(%rsp), %rcx ; Pass &x2 as argument 4
movl $2, %edx ; Pass 2 as argument 3
leaq 24(%rsp), %rsi ; Pass &x1 as argument 2
movl $1, %edi ; Pass 1 as argument 1
; Call proc
call proc
; Retrieve changes to memory
movslq 20(%rsp), %rdx ; Get x2 and convert to long
addq 24(%rsp), %rdx ; Compute x1+x2
movswl 18(%rsp), %eax ; Get x3 and convert to int
movsbl 17(%rsp), %ecx ; Get x4 and convert to int
subl %ecx, %eax ; Compute x3-x4
cltq ; Convert to long
imulq %rdx, %rax ; Compute (x1+x2) * (x3-x4)
addq $32, %rsp ; Deallocate stack frame
ret ; Return
I can understand this code: the compiler allocates 32 bytes of space on the stack, of which the first 16 bytes hold the arguments passed to proc
and the last 16 bytes hold 4 local variables.
Then I tested this code on GCC 11.2, using the optimization flag -Og
, and got this assembly code:
call_proc():
subq $24, %rsp
movq $1, 8(%rsp)
movl $2, 4(%rsp)
movw $3, 2(%rsp)
movb $4, 1(%rsp)
leaq 1(%rsp), %rax
pushq %rax
pushq $4
leaq 18(%rsp), %r9
movl $3, %r8d
leaq 20(%rsp), %rcx
movl $2, %edx
leaq 24(%rsp), %rsi
movl $1, %edi
call proc(long, long*, int, int*, short, short*, char, char*)
movslq 20(%rsp), %rax
addq 24(%rsp), %rax
movswl 18(%rsp), %edx
movsbl 17(%rsp), %ecx
subl %ecx, %edx
movslq %edx, %rdx
imulq %rdx, %rax
addq $40, %rsp
ret
I noticed that gcc first allocated 24 bytes for 4 local variables. Then it uses pushq
to add 2 arguments to the stack, so the final code uses addq $40, %rsp
to free stack space.
Compared to the code in the book, GCC allocates 8 more bytes of space here, and it doesn't seem to use the extra space. Why does it need the extra space?
pushq %rbx; ...; addq $32, %rsp; pop %rbx
instead... – Truculentcall
. Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment? If it's from CS:APP global edition, it's because some incompetent people the publisher hired wrote bad practice problems, including falsely claiming their crap was actual compiler output when it wouldn't even assemble. The other portions of the book are still good, though. – Conceivablepush
was the default in older GCC. In modern GCC, you can enable it manually with-maccumulate-outgoing-args
: godbolt.org/z/zbqs1MchE shows GCC just using onesub $40, %rsp
.) – Conceivable