Understanding stack alignment enforcement

Asked 21/11, 2017 at 10:45 Answered 21/11, 2017 at 14:22

gcc assembly x86 memory-alignment abi

Consider the following C code:

#include <stdint.h>

void func(void) {
   uint32_t var = 0;
   return;
}

The unoptimized (-O0) assembly code generated by GCC 4.7.2 for the code above is:

func:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl $0, -4(%ebp)
    nop
    leave
    ret

According to the stack alignment requirements of the System V ABI, the stack must be aligned to 16 bytes immediately before every call instruction (the stack boundary is 16 bytes by default unless it is changed with the -mpreferred-stack-boundary option). Therefore, ESP modulo 16 has to be zero right before a function call.

Bearing these stack alignment requirements in mind, I believe the following representation of the stack just before the leave instruction executes is correct:

Size (bytes)       Stack          ESP mod 16      Description
-----------------------------------------------------------------------------------

             |     . . .      |             
             ------------------........0          at func call
         4   | return address |
             ------------------.......12          at func entry
         4   |   saved EBP    |
     ---->   ------------------........8          EBP is pointing at this address
     |   4   |      var       |
     |       ------------------........4
 16  |       |                |
     |  12   |                |
     |       |                |
     ---->   ------------------........8          after allocating 16 bytes

With this representation of the stack in mind, there are two points that puzzle me:

1. var is obviously not aligned to 16 bytes on the stack. This seems to contradict what I have read in this answer to this question (emphasis mine):

-mpreferred-stack-boundary=n where the compiler tries to keep items on the stack aligned to 2^n.

In my case, -mpreferred-stack-boundary wasn't provided, so it defaults to 4 (i.e., a 2^4 = 16-byte boundary) according to this section of GCC's documentation (I did indeed get the same results with -mpreferred-stack-boundary=4).
2. The purpose of allocating 16 bytes on the stack (i.e., the subl $16, %esp instruction) instead of just 8 bytes: after allocating 16 bytes, the stack is neither aligned to 16 bytes nor is any memory saved. Allocating just 8 bytes instead would leave the stack aligned to 16 bytes without wasting 8 additional bytes.

Muskogee asked 21/11, 2017 at 10:45 Comment(5)

This has very little to do with C, and very much to do with "the System V ABI" on "x86" post compilation to machine code. – Agent 21/11, 2017 at 11:04

@Sebivor, are you suggesting that I edit the tags and choose abi over c? I am limited to 5 tags. – Muskogee 21/11, 2017 at 11:06

Well, as source code you've provided something so basic it could be ported to virtually any language and generate the same machine code, so... what I'm recommending is that you remove the C tag, or find some citation in n1570 which speaks of things like "stack alignment" and "System V ABI"... – Agent 21/11, 2017 at 11:10

See this: -mpreferred-stack-boundary doesn't align single variables. See this for your second point. – Interoceptor 21/11, 2017 at 11:34

Also keep in mind the C compiler is not obliged to produce optimal code in any kind of metric, including stack space usage. While it will try hard (and from playing around with gcc 4.7.2 on godbolt it looks good, the junk space is only a result of the alignment), there's no language-breaking problem if it failed and allocated 16B more junk than truly needed (especially in unoptimized code). What it does obey (due to the platform-specific option) is keeping esp properly aligned at the next call instruction. From the C language's point of view, neither the existence of a stack nor any particular alignment is mandatory. – Judicative 21/11, 2017 at 11:52

Looking at -O0-generated machine code is usually a futile exercise. The compiler will emit whatever works, in the simplest possible way. This often leads to bizarre artifacts.

Stack alignment only refers to alignment of the stack frame. It is not directly related to the alignment of objects on the stack. GCC will allocate on-stack objects with the required alignment. This is simpler if GCC knows that the stack frame already provides sufficient alignment, but if not, GCC will use a frame pointer and perform explicit alignment.

Burgoyne answered 21/11, 2017 at 14:22 Comment(3)

In your last sentence, do you mean the typical andl $-16, %esp to ensure that the stack is properly aligned to 16 bytes (preserving the original esp in ebp)? – Muskogee 21/11, 2017 at 14:28

Yes, this is one way to do it. But GCC will not do this by default because it assumes the stack is already aligned; you will need to pass an option like -mstackrealign, and GCC will only do it if needed. – Burgoyne 21/11, 2017 at 14:31

Thanks, I got it. The attribute force_align_arg_pointer will also do for individual functions. – Muskogee 21/11, 2017 at 14:33

This answer aims to further develop some of the comments written above.

First, based on Margaret Bloom's comment, consider the following modification of the func() function that was originally posted:

#include <stdint.h>

void bar(void);    

void func(void) {
   uint32_t var = 0;
   bar(); // <--- function call
   return;
}

Unlike the original func() function, the redefined one contains a function call to bar().

This time, the generated assembly code is:

func:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    movl $0, -12(%ebp)
    call bar
    nop
    leave
    ret

Note that the subl $24, %esp instruction does align the stack to 16 bytes (the subl $16, %esp instruction in the original func() function didn't).

Since the redefined func() now contains a function call (call bar), the stack has to be aligned to 16 bytes just before the call instruction executes. The original func() called no function at all, so there was no need to keep the stack aligned to 16 bytes.

At least 4 bytes must clearly be allocated on the stack for var. Since ESP is at 8 modulo 16 once the return address and the saved EBP have been pushed, 4 additional bytes (8 bytes in total) would be enough to align the stack to 16 bytes.

One may ask why 24 bytes are allocated to align the stack when just 8 bytes would do. Part of Ped7g's comment answers this question as well:

Also keep in mind the C compiler is not obliged to produce optimal code in any kind of metric, including stack space usage. While it will try hard (and from playing around with gcc 4.7.2 on godbolt it looks good, the junk space is only a result of the alignment), there's no language-breaking problem if it failed and allocated 16B more junk than truly needed (especially in unoptimized code).

Muskogee answered 21/11, 2017 at 13:38 Comment(4)

Use volatile int var = 1; to get the compiler to still do a store with -O3. Looking at -O0 code is silly; it's not even trying to be optimal. Or, without volatile, forcing the compiler to save something across a function call is another way to make it use stack space. (With register args (like in 64-bit code, or with a regparm calling convention), use a function arg after a call to a function it can't see, like you're doing here with bar()). Actually nvm, that will push/pop ebx or rbx and keep the value there. I was thinking int foo(int a) { bar(); return a+1; }, but NVM. – Adalbert 21/11, 2017 at 14:54

It is aligned to 32. Don't overlook the return address, saved ebp, stack canary. Modern C compilers favor aligning to 16 or 32 so they can generate optimal SIMD code, using SSE2 or AVX. – Oliveolivegreen 21/11, 2017 at 14:58

@HansPassant AFAIK, at the moment of performing the call the stack is aligned to 16. Then, both the return address and the ebp register are pushed on the stack. After that, 24 bytes are allocated on the stack. In total, esp is decreased by 32. However subtracting 32 from a 16-byte-aligned address does not necessarily result in an address that is aligned to 32 bytes. How do you know it is aligned to 32 bytes? – Muskogee 21/11, 2017 at 15:40

@HansPassant Of course, if it is aligned to 32 it will be also aligned to 16, since the latter requirement is weaker. – Muskogee 21/11, 2017 at 15:42
