Why Is GCC Using Mov Instead Of Push In Function Calls?

Asked 8/3, 2014 at 9:52 Answered 26/8, 2022 at 22:44

Solved c assembly gcc x86-64 stack-memory

So I've got this example C program.

int worship(long john)
{
    return 0 * john;
}

int main()
{
    return worship(666);
}

The assembly looks (essentially) like this:

worship(long):
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdi, -8(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $666, %edi
    call    worship(long)
    popq    %rbp
    ret

I ran into this while reading about stack smashing. In the assembly worship(long): section where it says movq %rdi, -8(%rbp) I would expect it to be using pushq based on everything I've read so far. Is this the new way that GCC is pushing arguments onto the stack and if so is there a compiler flag I could be using to toggle this?

Ajar answered 8/3, 2014 at 9:52 Comment(11)

oh, maybe that's an optimization. as far as i know movq doesn't change the value of stack pointer – Jenelljenelle 8/3, 2014 at 10:4

What? it is a very common and basic function frame... – Luanaluanda 8/3, 2014 at 10:10

in x86_64 the number of registers has been doubled, so the calling convention uses registers for the first parameters, not the stack – Unshod 8/3, 2014 at 10:11

I think it gives more space for optimizations. If you have a sequence of two function calls differing only by the second argument, you can simply change this argument while keeping the first on the top of the stack. And of course you do not need to adjust SP when returning from functions. – Henig 8/3, 2014 at 10:15

Poss duplicate #4535291 – Couchant 8/3, 2014 at 10:46

It is a basic code optimization. The PUSH instruction is awkward because it makes two modifications, it writes to [ESP] and modifies the ESP register. This prevents out-of-order execution, a problem that MOV doesn't have. – Suber 8/3, 2014 at 11:8

@harmic, yes it would appear so. It even looks like he's reading the same article on buffer overflows as I am. – Ajar 8/3, 2014 at 21:31

@LưuVĩnhPhúc, I thought it might be skipping the stack so I tried a case with a bunch of arguments to worship() and it behaved the same but I didn't realize there were twice as many registers in x86_64. – Ajar 8/3, 2014 at 21:35

why do you think that movq %rdi, -8(%rbp) is pushing a parameter? Parameters are passed by the calling function, which is main here. The callee only reads the parameter it was given – Unshod 9/3, 2014 at 5:35

if you're using Linux then the first 6 integer parameters are passed in RDI, RSI, RDX, RCX, R8, and R9. So that's why there's RDI in the function. Pass 6 parameters and you'll see that it uses R8 and R9, which are two of the new registers. Try passing more than 6 parameters and then the stack will be used – Unshod 9/3, 2014 at 5:39

In my defence of this repost thing, the original answer is very vaguely titled – Ajar 10/3, 2014 at 21:0

GCC manual says,

-mpush-args

Push instructions will be used to pass outgoing arguments when functions are called. Enabled by default.

-mno-push-args

Use PUSH operations to store outgoing parameters. This method is shorter and usually equally fast as method using SUB/MOV operations and is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.

-maccumulate-outgoing-args

If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.

Even though -mpush-args is enabled by default, it is overridden by -maccumulate-outgoing-args, which is also enabled by default. Compiling with -mno-accumulate-outgoing-args passed explicitly could change the instructions to push.

Feuar answered 8/3, 2014 at 10:10 Comment(1)

This is the right answer to a different question, about how GCC would pass stack args (e.g. in 32-bit code). x86-64 System V passes the first 6 integer args in registers RDI, RSI, RDX, RCX, R8, R9. No combination of these options will change the movl $666, %edi in main to pass the arg to the callee. And spilling the register arg to -8(%rbp) (in the red zone below RSP) is also not arg passing, it's just a consequence of a debug build (-O0). Doing that using push wouldn't be a win here, because it can use the red zone. – Lucho 26/8, 2022 at 22:12

x86-64 System V passes the first 6 integer args in registers RDI, RSI, RDX, RCX, R8, R9. So in main we have mov $666, %edi (which zero-extends to the full RDI) to pass the 64-bit arg long john.

push can't write registers; nothing¹ can stop GCC from using mov to set registers, and you wouldn't want to. If you passed 7 or more args, GCC normally would use push in main to pass the 7th on the stack, because -mno-accumulate-outgoing-args is the default in modern GCC. push has been efficient on x86 since Pentium-M or so introduced a "stack engine" to track stack-pointer updates specially.
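To see the register/stack split for yourself, here is a minimal sketch. The names sum7 and call_sum7 are made up for illustration and are not from the question. Under x86-64 System V the first six integer args should arrive in RDI, RSI, RDX, RCX, R8, R9 and the seventh on the stack, so reading the compiler's asm output for call_sum7 should show how that seventh arg is passed.

```c
/* Hypothetical example, not from the question: seven integer args.
   Under x86-64 System V the first six go in registers and the
   seventh is passed in stack memory.
   Inspect the caller with: gcc -O0 -S file.c */
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    return a + b + c + d + e + f + g;
}

long call_sum7(void)
{
    /* Args 1..6 are set with mov into registers; arg 7 needs the stack. */
    return sum7(1, 2, 3, 4, 5, 6, 7);
}
```

Exactly which instruction stores the seventh arg (push or a mov to stack memory) depends on the GCC version and tuning options, so treat the output on your machine as the reference.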

Sunil Bojanapally's answer covers those options, which are more relevant for 32-bit code where all args are passed on the stack. If you got here from searching on the title question, see that answer or Why does gcc use movl instead of push to pass function args? This answer is about the actual question, which is about what the callee does with its incoming arg in a debug build, not about how the arg is passed to it.

You're talking about the code inside the callee that stores that incoming arg to the stack. This isn't passing an arg; it's just a consequence of a debug build. At the default -O0 anti-optimization level, every C variable gets a memory address unless declared register, so compilers emit instructions to store incoming register args to the stack.

In this case movq %rdi, -8(%rbp) is storing to the red zone below RSP, since worship() is a leaf function. The stack space is already effectively reserved (down to -128(%rsp), and at this point RBP=RSP).
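As a rough sketch of the red-zone bounds, the helper below checks whether a store of a given size at a given offset from RSP stays inside the 128 bytes below RSP. The function name fits_in_red_zone is hypothetical; it only models the arithmetic and is not anything GCC exposes.

```c
/* Hypothetical helper: does a store of `size` bytes at `offset`
   (relative to RSP, negative means below RSP) stay inside the
   128-byte red zone that x86-64 System V reserves below RSP? */
int fits_in_red_zone(long offset, long size)
{
    /* Lowest byte must be at or above -128, highest byte below RSP. */
    return offset >= -128 && offset + size <= 0;
}
```

With RBP equal to RSP, the 8-byte store to -8(%rbp) corresponds to fits_in_red_zone(-8, 8), which is within bounds.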

And just to be clear, this is not part of the function call. Spilling incoming args to the stack inside the callee only happens in a debug build; it is not part of the calling convention.

If it had needed to sub $16, %rsp / mov-store / leave, e.g. if you'd compiled with -mno-red-zone, then yes it could have been an optimization to do that spill with push %rdi. But existing compilers don't do that optimization for initializing + creating locals.

push %rdi in worship would have required the compiler to use leave instead of just pop %rbp, which is slightly more expensive. And it would only align the stack to RSP%16 == 8 after push %rbp aligned it to RSP%16 == 0; compilers prefer to keep the stack aligned by 16 even when they're not making further function calls.
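The alignment arithmetic can be checked with a small simulation. This is only a sketch: the starting RSP value is an arbitrary assumption (any 16-byte-aligned address behaves the same), and the function names are made up.

```c
#include <stdint.h>

/* Each push on x86-64 moves RSP down by 8 bytes. */
uint64_t push8(uint64_t rsp)
{
    return rsp - 8;
}

/* RSP % 16 inside the callee after `call` pushed the return address
   and the callee then executed `pushes` push instructions. */
unsigned rsp_mod16_after(unsigned pushes)
{
    /* Assumed value: RSP is 16-byte aligned just before the call. */
    uint64_t rsp = UINT64_C(0x7ffc00001000);

    rsp = push8(rsp);                 /* return address pushed by call */
    for (unsigned i = 0; i < pushes; i++)
        rsp = push8(rsp);             /* push %rbp, then any further push */
    return (unsigned)(rsp % 16);
}
```

Here rsp_mod16_after(1) models the state after push %rbp, and rsp_mod16_after(2) models a hypothetical extra push %rdi, which leaves RSP off the 16-byte boundary again.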

And of course if you'd just enabled optimization, worship would just be xor %eax,%eax / ret, not wasting instructions putting the register arg anywhere.
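A sketch of why that is possible: 0 * john is 0 for every integer input, so the function is equivalent to one that ignores its arg. worship_optimized is a made-up name for the hand-simplified version; comparing the output of gcc -O0 -S and gcc -O2 -S on this file should show the store to the stack disappearing.

```c
/* The original function from the question. */
int worship(long john)
{
    return 0 * john;
}

/* Hypothetical hand-written equivalent. With optimization enabled
   (e.g. gcc -O2 -S), both functions should reduce to zeroing EAX
   and returning, with no store of the arg. */
int worship_optimized(long john)
{
    (void)john;   /* the arg is never needed */
    return 0;
}
```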

Footnote 1: -Oz (favour code-size without caring about speed) might use 3-byte push imm8 / pop rdi instead of 5-byte mov edi, imm32 to materialize a value in a register if it was in the -128..+127 range. But 666 isn't, so mov is also the smallest way to set a register to that value without any pre-existing known register values near that. (Code golf x86-64 machine code tips).
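The range check in the footnote can be written out as a small helper. This is an illustrative sketch only; fits_imm8 is a hypothetical name and says nothing about how GCC actually makes the decision.

```c
#include <stdint.h>

/* Returns 1 if value fits a sign-extended 8-bit immediate (-128..127),
   the range where the 3-byte push imm8 / pop reg trick applies. */
int fits_imm8(int64_t value)
{
    return value >= INT8_MIN && value <= INT8_MAX;
}
```

For the constant in the question, fits_imm8(666) is 0, which matches the footnote's point that mov is the smallest option there.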

Lucho answered 26/8, 2022 at 22:44 Comment(0)

Compilers like GCC are written by people who very carefully consider how to make often-used code snippets (like function call/return) as efficient as possible. Sure, their solutions target the general case, and in special cases there might be better options.

Goren answered 8/3, 2014 at 16:0 Comment(1)

This is un-optimized code. mov to the red-zone is the most efficient way to spill the incoming register arg if you're going to do that at all, but if you tell the compiler to make any effort to make efficient code, worship() will compile to xor %eax,%eax / ret. – Lucho 26/8, 2022 at 22:52
