Why does GCC allocate more stack memory than needed?

long call_proc()

    call_proc:
      ; Set up arguments to proc
      subq    $32, %rsp           ; Allocate 32-byte stack frame
      movq    $1, 24(%rsp)        ; Store 1 in &x1
      movl    $2, 20(%rsp)        ; Store 2 in &x2
      movw    $3, 18(%rsp)        ; Store 3 in &x3
      movb    $4, 17(%rsp)        ; Store 4 in &x4
      leaq    17(%rsp), %rax      ; Create &x4
      movq    %rax, 8(%rsp)       ; Store &x4 as argument 8
      movl    $4, (%rsp)          ; Store 4 as argument 7
      leaq    18(%rsp), %r9       ; Pass &x3 as argument 6
      movl    $3, %r8d            ; Pass 3 as argument 5
      leaq    20(%rsp), %rcx      ; Pass &x2 as argument 4
      movl    $2, %edx            ; Pass 2 as argument 3
      leaq    24(%rsp), %rsi      ; Pass &x1 as argument 2
      movl    $1, %edi            ; Pass 1 as argument 1
      ; Call proc
      call    proc
      ; Retrieve changes to memory
      movslq  20(%rsp), %rdx      ; Get x2 and convert to long
      addq    24(%rsp), %rdx      ; Compute x1+x2
      movswl  18(%rsp), %eax      ; Get x3 and convert to int
      movsbl  17(%rsp), %ecx      ; Get x4 and convert to int
      subl    %ecx, %eax          ; Compute x3-x4
      cltq                        ; Convert to long
      imulq   %rdx, %rax          ; Compute (x1+x2) * (x3-x4)
      addq    $32, %rsp           ; Deallocate stack frame
      ret                         ; Return

    call_proc():
      subq    $24, %rsp
      movq    $1, 8(%rsp)
      movl    $2, 4(%rsp)
      movw    $3, 2(%rsp)
      movb    $4, 1(%rsp)
      leaq    1(%rsp), %rax
      pushq   %rax
      pushq   $4
      leaq    18(%rsp), %r9
      movl    $3, %r8d
      leaq    20(%rsp), %rcx
      movl    $2, %edx
      leaq    24(%rsp), %rsi
      movl    $1, %edi
      call    proc(long, long*, int, int*, short, short*, char, char*)
      movslq  20(%rsp), %rax
      addq    24(%rsp), %rax
      movswl  18(%rsp), %edx
      movsbl  17(%rsp), %ecx
      subl    %ecx, %edx
      movslq  %edx, %rdx
      imulq   %rdx, %rax
      addq    $40, %rsp
      ret

(This answer is a summary of comments posted above by Antti Haapala, klutt and Peter Cordes.)

GCC allocates more space than "necessary" in order to ensure that the stack is properly aligned for the call to proc. On entry to call_proc, the return address pushed by the caller's call instruction leaves %rsp at 8 modulo 16, so before calling proc the stack pointer must be adjusted by a multiple of 16, plus 8 (i.e. by an odd multiple of 8). See also: Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?

What's strange is that the code in the book doesn't do that; the code as shown would violate the ABI and, if proc actually relies on proper stack alignment (e.g. using aligned SSE2 instructions), it may crash.

So it appears that either the code in the book was incorrectly copied from compiler output, or else the authors of the book are using some unusual compiler flags which alter the ABI.

GCC 11.2 emits nearly identical asm (on Godbolt) using -Og -mpreferred-stack-boundary=3 -maccumulate-outgoing-args. Of these, -mpreferred-stack-boundary=3 changes the ABI to maintain only 2^3-byte stack alignment, down from the default 2^4. (Code compiled this way can't safely call anything compiled normally, even standard library functions.) -maccumulate-outgoing-args used to be the default in older GCC, but modern CPUs have a "stack engine" that makes push/pop single-uop, so that option isn't the default anymore; using push for stack args saves a bit of code size.

One difference from the book's asm is a movl $0, %eax before the call: with no prototype for proc in scope, the caller has to assume proc might be variadic and pass in AL the number of FP args in XMM registers. (A prototype that matches the args passed would prevent that.) The other instructions are all the same, and in the same order as whatever older GCC version the book used, except for the choice of registers after call proc returns: it ends up using movslq %edx, %rdx instead of cltq (which sign-extends EAX into RAX).

CS:APP 3e global edition is notorious for errors in practice problems introduced by the publisher (not the authors), but apparently this code is present in the North American edition, too. So this may be the authors' mistake, or their choice to use actual compiler output produced with unusual options. Unlike some of the bad global edition practice problems, this code could have come unmodified from some GCC version, but only with non-standard options.

Related: Why does GCC allocate more space than necessary on the stack, beyond what's needed for alignment? - GCC has a missed-optimization bug where it sometimes reserves an additional 16 bytes that it truly didn't need to. That's not what's happening here, though.
