Consider the following C code:
    #include <stdint.h>

    void func(void) {
        uint32_t var = 0;
        return;
    }

The unoptimized (i.e., with the -O0 option) assembly code generated by GCC 4.7.2 for the code above is:
    func:
        pushl %ebp
        movl  %esp, %ebp
        subl  $16, %esp
        movl  $0, -4(%ebp)
        nop
        leave
        ret

According to the stack alignment requirements of the System V ABI, the stack must be aligned to 16 bytes before every call instruction (the stack boundary is 16 bytes by default when not changed with the -mpreferred-stack-boundary option). Therefore, the result of ESP modulo 16 has to be zero prior to a function call.
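
The requirement above boils down to a modulo check on the stack pointer. A minimal C sketch of that check follows; is_aligned is a hypothetical helper name, and the addresses mentioned afterwards are made-up values, not taken from a real run:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of boundary.
 * boundary is assumed to be a power of two, e.g. 16 for the
 * default -mpreferred-stack-boundary=4 (2^4 bytes). */
static bool is_aligned(uintptr_t addr, uintptr_t boundary)
{
    /* For a power-of-two boundary, addr % boundary equals
     * addr & (boundary - 1). */
    return (addr & (boundary - 1)) == 0;
}
```

Under these assumptions, is_aligned(0xffffd010, 16) is true, while is_aligned(0xffffd008, 16) is false because the address is 8 modulo 16.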
Bearing these stack alignment requirements in mind, I assume the following representation of the stack's state just before executing the leave instruction to be correct:
    Size (bytes)        Stack          ESP mod 16    Description
    -----------------------------------------------------------------------------------
                  |     . . .      |
                  ------------------ ........0      at func call
                4 | return address |
                  ------------------ .......12      at func entry
                4 |   saved EBP    |
         ---->    ------------------ ........8      EBP is pointing at this address
         |      4 |      var       |
         |        ------------------ ........4
      16 |        |                |
         |     12 |                |
         |        |                |
         ---->    ------------------ ........8      after allocating 16 bytes

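
The ESP mod 16 column in the diagram can be reproduced with plain arithmetic. The following is only a sketch of that bookkeeping, not something the compiler emits: trace_prologue and struct prologue_trace are hypothetical names, and the starting ESP value is an arbitrary 16-byte-aligned address chosen for illustration.

```c
#include <stdint.h>

/* ESP modulo 16 at each step of func's prologue (hypothetical type). */
struct prologue_trace {
    uint32_t at_call;     /* just before the call instruction        */
    uint32_t at_entry;    /* after call pushed the return address    */
    uint32_t after_push;  /* after pushl %ebp                        */
    uint32_t after_sub;   /* after subl $16, %esp                    */
    uint32_t var_addr;    /* address of var, -4(%ebp), modulo 16     */
};

/* esp is the (assumed) stack pointer value just before the call. */
static struct prologue_trace trace_prologue(uint32_t esp)
{
    struct prologue_trace t;
    uint32_t ebp;

    t.at_call = esp % 16u;
    esp -= 4u;                 /* call pushes the return address */
    t.at_entry = esp % 16u;
    esp -= 4u;                 /* pushl %ebp */
    t.after_push = esp % 16u;
    ebp = esp;                 /* movl %esp, %ebp */
    esp -= 16u;                /* subl $16, %esp */
    t.after_sub = esp % 16u;
    t.var_addr = (ebp - 4u) % 16u;
    return t;
}
```

Starting from an assumed ESP of 0xffffd010, the fields come out as 0, 12, 8, 8 and 4, which matches the values in the diagram.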
With this representation of the stack in mind, there are two points that puzzle me:
1. var is obviously not aligned on the stack to 16 bytes. This seems to contradict what I have read in this answer to this question (emphasis mine):

   "-mpreferred-stack-boundary=n where the compiler tries to keep items on the stack aligned to 2^n."

   In my case, -mpreferred-stack-boundary wasn't provided, so it defaults to 4 (i.e., a 2^4 = 16-byte boundary) according to this section of GCC's documentation (indeed, I got the same results with -mpreferred-stack-boundary=4).

2. The purpose of allocating 16 bytes on the stack (i.e., the subl $16, %esp instruction) instead of just 8 bytes: after allocating 16 bytes, the stack is not aligned to 16 bytes, and no memory is saved either. Allocating just 8 bytes instead would leave the stack aligned to 16 bytes without wasting the additional 8 bytes.

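
The arithmetic behind the second point can be written down as a short C sketch. It assumes ESP is 16-byte aligned just before the call, as described earlier; esp_mod16_after_prologue is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* ESP modulo 16 once the prologue has finished, assuming ESP was a
 * multiple of 16 just before the call. The prologue consumes
 * 4 bytes (return address) + 4 bytes (saved EBP) + alloc bytes
 * (the operand of subl). */
static uint32_t esp_mod16_after_prologue(uint32_t alloc)
{
    uint32_t used = 4u + 4u + alloc;

    /* The stack grows downwards, so ESP ends up at -used modulo 16. */
    return (16u - used % 16u) % 16u;
}
```

With these assumptions, esp_mod16_after_prologue(16) gives 8 (the situation in the listing), whereas esp_mod16_after_prologue(8) gives 0, which is the 16-byte-aligned outcome I would have expected.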
-mpreferred-stack-boundary doesn't align single variables. See this for your second point. – Interoceptor

esp properly aligned upon next call instruction. From the C language's point of view, even the existence of a stack is not mandatory, nor is any alignment. – Judicative