Is the Microsoft Stack always aligned to 16-bytes?

In Assembly Language, Seventh Edition for x86 Processors by Kip Irvine, page 211, section 5.53 ("The x86 Calling Convention", which covers the Microsoft x64 calling convention) says:

When calling a subroutine, the stack pointer (RSP) must be aligned on a 16-byte boundary (a multiple of 16). The CALL instruction pushes an 8-byte return address on the stack, so the calling program must subtract 8 from the stack pointer, in addition to the 32 it already subtracts for the shadow space.

It goes on to show some assembly with a sub rsp, 8 right before the sub rsp, 20h (for the 32 bytes of shadow space).

Is this a safe convention, though? Is the Microsoft stack guaranteed to be aligned to 16 bytes before the CALL instruction? Or is the book wrong in assuming that the stack

was aligned to 16 bytes prior to the CALL,
had an 8-byte return address pushed onto it by the CALL, and
requires an additional sub rsp, 8 to get back to 16-byte alignment?

I'm asking about meeting the requirements of the x64 ABI. Is it safe to blindly adjust the stack by growing it 8 bytes to regain 16-byte alignment after every call?

Yes, that's the whole point of the ABI requiring / guaranteeing 16-byte alignment before a call.
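To make the arithmetic concrete, here is a minimal sketch in C that models RSP as a plain integer. The function and constant names are made up for illustration and are not part of any ABI or library; the only facts assumed are the ones quoted from the book (8-byte return address, 32 bytes of shadow space, 8 bytes of padding).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: RSP is tracked as an ordinary integer. */
enum { RETURN_ADDR = 8, SHADOW_SPACE = 32, PADDING = 8 };

/* RSP as seen by the callee's first instruction. */
static uint64_t rsp_on_entry(uint64_t rsp_at_call)
{
    assert(rsp_at_call % 16 == 0);      /* ABI guarantee at the call site */
    return rsp_at_call - RETURN_ADDR;   /* CALL pushes the return address */
}

/* RSP after the book's prologue: sub rsp, 8 followed by sub rsp, 20h. */
static uint64_t rsp_after_prologue(uint64_t rsp_entry)
{
    return rsp_entry - PADDING - SHADOW_SPACE;
}
```

Starting from any aligned value at the call site, rsp_on_entry() gives a value that is 8 mod 16, and rsp_after_prologue() brings it back to a multiple of 16, which is what the callee needs before it makes a call of its own.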

You can do whatever you want inside a function, for example 3x 16-bit pushes and then sub rsp, (24 - 3*2) to regain 16-byte stack alignment after entry to a function.
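The three-push example can be checked the same way. This is only a sketch with a hypothetical helper name: it takes RSP at function entry (8 mod 16), the bytes already pushed, and the amount given to sub rsp, and reports whether the result is 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if RSP is 16-byte aligned after pushing
   `pushed` bytes and then executing sub rsp, `sub`. */
static int aligned_after(uint64_t rsp_entry, uint64_t pushed, uint64_t sub)
{
    assert(rsp_entry % 16 == 8);        /* state right after the CALL */
    return (rsp_entry - pushed - sub) % 16 == 0;
}
```

With three 16-bit pushes (6 bytes), sub rsp, 18 makes the total drop 24 bytes, so aligned_after(entry, 6, 18) is true. Any total that is 8 mod 16 works, for example sub rsp, 2 for a total of 8, while sub rsp, 8 would leave RSP misaligned.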

Or movq xmm0, rsp and then use rsp as an extra scratch register to get 16 total integer regs, until you restore it before making another call or ret.¹

There's no requirement that RSP be 16-byte aligned after every instruction, only at function call boundaries. This is why they're called "calling conventions", not "coding standards".

This is a similar concept to rbx being call-preserved. It doesn't matter if you save/restore it on the stack, in xmm0, in static storage, if you negate it and then negate it back again, or if you don't touch it at all. All that matters is that it has the same value when you return to the caller as it did when your function was called.

Footnote 1: Works as long as you don't have any async callbacks / SEH handlers that could possibly run on the user-space stack. This is not really guaranteed to be safe, but may work as a hack.

The question "Is it valid to write below ESP?" is related: as Ped7g points out, if something can asynchronously use space below the stack pointer, it will probably break if RSP isn't pointing to stack memory at all.

I've seen a 32-bit example avisynth video filter (I think) that used this to get 8 tmp regs (when no MMX was available), with big warning comments in the code to debug first before using this trick.
