Allocating an extra 16 bytes of stack space is a GCC missed optimization that pops up occasionally. I don't know why it happens, but it's reproducible with GCC 10.1 at -O3. Clang doesn't do it; it just reserves 8 bytes (with a dummy push). Example on Godbolt, where -fno-stack-protector -fno-pie is the default, unlike GCC in many GNU/Linux distros.
Even a plain int buf; followed by foo(&buf); results in over-allocation.
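A minimal sketch of that smaller test case, assuming a hypothetical caller() wrapper. Here foo() is given a body marked noinline (a GCC/Clang attribute) only so the file links and runs on its own; to reproduce the code generation discussed here, it's safer to leave foo() as a bare declaration so the compiler can't see into it.

```c
/* Stand-in for an external function. In the real test case foo() would only
   be declared (and defined in another file), so the compiler can't inline it
   or reason about what it does. */
__attribute__((noinline)) void foo(int *p) { *p = 42; }

/* Hypothetical wrapper: the only local whose address escapes is 4 bytes, so
   reserving 8 bytes would already leave RSP 16-byte aligned for the call. */
int caller(void)
{
    int buf;
    foo(&buf);
    return buf;
}
```

Compile with something like gcc -O3 -S and compare the sub rsp / add rsp amounts in caller against what clang emits for the same source.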
My wild guess is that there's something GCC doesn't optimize away until after it has already decided it needs more than 8 bytes of space (and thus needs 24). Hopefully this good MCVE will let GCC devs find and fix that bug, if it's easily fixable.
Feel free to report this as a GCC missed-optimization bug (https://gcc.gnu.org/bugzilla/); I looked recently but didn't find an existing one.
You're correct that allocating 8 bytes would be enough for char buf[8] and would re-align RSP by 16 before the call, as required by the x86-64 System V ABI (Why does System V / AMD64 ABI mandate a 16 byte stack alignment?).
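The arithmetic behind "8 is enough, otherwise 24" can be sketched as a small helper. min_stack_adjust is a hypothetical name, and the sketch assumes a non-leaf function that pushes no callee-saved registers, so on entry only the 8-byte return address sits above the 16-byte boundary.

```c
#include <stddef.h>

/* Smallest number of bytes to subtract from RSP so that local_bytes fit and
   RSP is 16-byte aligned at the next call. Assumes RSP % 16 == 8 on entry
   (just the return address was pushed) and no other pushes. */
size_t min_stack_adjust(size_t local_bytes)
{
    const size_t ret_addr = 8;   /* pushed by the caller's call instruction */
    const size_t align = 16;     /* x86-64 System V requirement at a call */

    /* Round locals + return address up to a multiple of 16, then take the
       return address back out: the results are 8, 24, 40, ... */
    size_t total = (local_bytes + ret_addr + align - 1) / align * align;
    return total - ret_addr;
}
```

Under those assumptions min_stack_adjust(8) is 8, which is what clang reserves, and anything from 9 to 24 bytes of locals gives 24, which is what GCC reserves here even though only 8 bytes are needed.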
GCC is not trying to maintain 32-byte stack alignment or anything. The default for -mpreferred-stack-boundary is the minimum allowed by the ABI, 4 (2^4 = 16). If you used alignas(32) char buf[8], you'd see extra code to over-align the stack. – Lou
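For comparison, a minimal sketch of the over-aligned case. The function names are hypothetical, and noinline is a GCC/Clang attribute used only to keep the address computation from being optimized away.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns how far p is from the previous 32-byte boundary (0 if aligned). */
__attribute__((noinline)) unsigned misalignment32(const char *p)
{
    return (unsigned)((uintptr_t)p % 32);
}

/* Requesting 32-byte alignment for a local is more than the ABI guarantees
   for RSP, so the compiler has to align the stack itself in this function. */
unsigned overaligned_caller(void)
{
    alignas(32) char buf[8];
    buf[0] = 0;
    return misalignment32(buf);
}
```

In the asm for overaligned_caller, the extra code is typically something like saving the old stack pointer in RBP and an and rsp, -32, which doesn't appear in the plain char buf[8] version.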