Why does gcc use movl instead of push to pass function args?

Asked 26/12, 2010 at 17:42 Answered 27/12, 2010 at 0:20

Tags: c, gcc, assembly, callstack, calling-convention

Consider this code:

#include <stdio.h>
void a(int a, int b, int c)
{
    char buffer1[5];
    char buffer2[10];
}

int main()
{
    a(1,2,3); 
}

Then compile it with:

gcc -S a.c

This command translates the source code into assembly and writes it to a.s.

Now we can see that main never uses the "push" instruction to put the arguments of a on the stack. It uses "movl" instead:

main:
 pushl %ebp
 movl %esp, %ebp
 andl $-16, %esp
 subl $16, %esp
 movl $3, 8(%esp)
 movl $2, 4(%esp)
 movl $1, (%esp)
 call a
 leave

Why does this happen? What is the difference between them?

Eskridge asked 26/12, 2010 at 17:42

Here is what the gcc manual has to say about it:

-mpush-args
-mno-push-args
    Use PUSH operations to store outgoing parameters. This method is shorter and usually
    equally fast as method using SUB/MOV operations and is enabled by default. 
    In some cases disabling it may improve performance because of improved scheduling
    and reduced dependencies.

 -maccumulate-outgoing-args
    If enabled, the maximum amount of space required for outgoing arguments will be
    computed in the function prologue. This is faster on most modern CPUs because of
    reduced dependencies, improved scheduling and reduced stack usage when preferred
    stack boundary is not equal to 2. The drawback is a notable increase in code size.
    This switch implies -mno-push-args.

Apparently -maccumulate-outgoing-args is enabled by default, overriding -mpush-args. Explicitly compiling with -mno-accumulate-outgoing-args does revert to the PUSH method, here.

2019 update: modern CPUs have had efficient push/pop since about Pentium M.
-mno-accumulate-outgoing-args (and using push) eventually became the default for -mtune=generic in Jan 2014.
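To see the difference yourself, you can compile a small test file twice and diff the argument setup before the call. The file name `args.c` and the functions `sum3` and `call_sum3` below are hypothetical, chosen only for this experiment. The sketch assumes a gcc with 32-bit multilib support (for `-m32`) and uses `-O0` so the call is not inlined away. With `-maccumulate-outgoing-args` you should expect `movl` stores to offsets from `%esp`; with `-mno-accumulate-outgoing-args` you should expect `pushl` instructions, though the exact output depends on the gcc version and `-mtune` setting.

```c
/* args.c: hypothetical test file for comparing the two argument-passing
 * strategies described in the gcc manual excerpt.
 *
 * Build 32-bit assembly twice and diff the code around "call sum3"
 * (assumes gcc with -m32 multilib support installed):
 *   gcc -m32 -O0 -S -maccumulate-outgoing-args    args.c -o acc.s
 *   gcc -m32 -O0 -S -mno-accumulate-outgoing-args args.c -o push.s
 *   diff acc.s push.s
 *
 * -O0 keeps sum3 from being inlined and constant-folded into call_sum3. */

/* Three arguments, so the movl-vs-pushl difference is easy to spot. */
int sum3(int a, int b, int c)
{
    return a + b + c;
}

/* The call site whose argument setup we want to inspect. */
int call_sum3(void)
{
    return sum3(1, 2, 3);
}
```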

Jeep answered 27/12, 2010 at 0:20

A much better question would be why this bloat-generating option -maccumulate-outgoing-args is not automatically disabled by -Os. – Salian 28/12, 2010 at 4:32

@R.. So do you know why? – Hypha 25/3, 2015 at 12:49

@Tony: obviously, because when deciding which of the many (~200) optimization flags to enable/disable for each specific -O option, sometimes things slip through the cracks. – Mutual 27/7, 2015 at 21:06

Update: -maccumulate-outgoing-args was disabled for the default -mtune=generic in January 2014, now that CPUs without stack-engines are very uncommon. (It probably should have been done sooner). – Wadai 30/8, 2016 at 23:44

That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.

"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
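The claim that both sequences leave the stack in the same state can be checked with a toy model of a downward-growing 32-bit stack. Everything here (`toy_stack`, `pass_with_push`, `pass_with_mov`, the 16-word size) is hypothetical and simplified: it ignores the 16-byte alignment that the real gcc output adds, and it only counts how many times the stack pointer is modified.

```c
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16   /* hypothetical stack size, in 32-bit words */

/* Toy stack: mem[] is stack memory, sp is an index standing in for %esp. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int sp;          /* index of the current top-of-stack word */
    int sp_updates;  /* how many times sp has been modified */
} toy_stack;

static void toy_init(toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_WORDS;   /* empty stack: sp is one past the last word */
    s->sp_updates = 0;
}

/* Models pushl: decrement sp, then store. One sp update per push. */
static void toy_push(toy_stack *s, uint32_t value)
{
    s->sp -= 1;
    s->sp_updates += 1;
    s->mem[s->sp] = value;
}

/* Push-based passing: arguments pushed right to left, as cdecl requires. */
static void pass_with_push(toy_stack *s, uint32_t a, uint32_t b, uint32_t c)
{
    toy_push(s, c);
    toy_push(s, b);
    toy_push(s, a);
}

/* mov-based passing: reserve space with a single sp update (like subl),
 * then store each argument at a fixed offset from sp. */
static void pass_with_mov(toy_stack *s, uint32_t a, uint32_t b, uint32_t c)
{
    s->sp -= 3;
    s->sp_updates += 1;
    s->mem[s->sp + 2] = c;   /* like movl $3, 8(%esp) */
    s->mem[s->sp + 1] = b;   /* like movl $2, 4(%esp) */
    s->mem[s->sp + 0] = a;   /* like movl $1, (%esp)  */
}
```

Calling both on fresh stacks with `(1, 2, 3)` gives identical memory and the same final `sp`, but `sp_updates` is 3 for the push version and 1 for the mov version.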

Guillory answered 26/12, 2010 at 17:45

Why would you need to put the constant into a register first? x86 supports pushing of immediate constants – Brianabriand 26/12, 2010 at 19:50

gcc does all sorts of optimizations, including selecting instructions based on the execution speed of the particular CPU being optimized for. You will notice that expressions like x *= n are often replaced by a mix of SHL, ADD and/or SUB, especially when n is a constant. MUL is only used when the average runtime (and cache/etc. footprint) of the SHL-ADD-SUB combination would exceed that of MUL, or when n is not a constant (so a shift-add-sub loop would be more costly).
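As a concrete instance of the rewrite described above, here is hand-written strength reduction for two constant multipliers. The helpers `mul10` and `mul7` are hypothetical; gcc performs such rewrites on its own when it judges them cheaper than a multiply. Unsigned types are used so that shifts and wraparound are well defined.

```c
#include <stdint.h>

/* x * 10 == x * 8 + x * 2, using two shifts and an add */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* x * 7 == x * 8 - x, using one shift and a subtract */
static uint32_t mul7(uint32_t x)
{
    return (x << 3) - x;
}
```

Because unsigned arithmetic wraps modulo 2^32, these match `x * 10` and `x * 7` for every input, including values that overflow.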

For function arguments, the MOVs can be executed in parallel by the hardware, while PUSHes cannot: the second PUSH has to wait for the first to finish, because each PUSH updates the esp register.
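The dependency argument can be made concrete by measuring the longest chain of instructions linked through `%esp`. This is a toy model under a stated assumption: each instruction is described only by whether it reads and/or writes `%esp`, and the stack engine that CPUs since roughly Pentium M use to break exactly this dependency is ignored. The type `insn` and the function `esp_chain_depth` are hypothetical names for illustration, and the push sequence is simply one that stores the same three values, not gcc output.

```c
#include <stddef.h>

/* An instruction, reduced to its interaction with %esp. */
typedef struct {
    const char *text;
    int reads_esp;
    int writes_esp;
} insn;

/* Longest chain of instructions linked through %esp: an instruction that
 * reads %esp must wait for the most recent instruction that wrote it. */
static int esp_chain_depth(const insn *code, size_t n)
{
    int last_writer_depth = 0;   /* chain depth of the latest %esp write */
    int max_depth = 0;
    for (size_t i = 0; i < n; i++) {
        int depth = 1 + (code[i].reads_esp ? last_writer_depth : 0);
        if (code[i].writes_esp)
            last_writer_depth = depth;
        if (depth > max_depth)
            max_depth = depth;
    }
    return max_depth;
}

/* Argument setup from the question: one %esp update, then independent stores. */
static const insn mov_version[] = {
    { "subl $16, %esp",   1, 1 },
    { "movl $3, 8(%esp)", 1, 0 },
    { "movl $2, 4(%esp)", 1, 0 },
    { "movl $1, (%esp)",  1, 0 },
};

/* A push sequence storing the same values: every push reads and writes %esp. */
static const insn push_version[] = {
    { "pushl $3", 1, 1 },
    { "pushl $2", 1, 1 },
    { "pushl $1", 1, 1 },
};
```

In this model the mov version has a chain depth of 2 no matter how many arguments there are, while the push version's depth grows with the argument count (3 here).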

Wilow answered 26/12, 2010 at 19:05

Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.

I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607

Radiograph answered 26/12, 2010 at 19:05

Just to be clear, the OS X ABI only requires the stack pointer be 16-byte aligned at the point of external function calls. – Shivers 26/12, 2010 at 22:19

I see, thanks for pointing that out. Reading the other answers I now understand the movl code generation is related to improved scheduling. The andl instruction does seem to only be there for stack alignment though. – Radiograph 27/12, 2010 at 9:58
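For reference, `andl $-16, %esp` rounds the stack pointer down to a multiple of 16: in two's complement, -16 is a mask with the low four bits clear. Rounding down is safe on a downward-growing stack because it only reserves extra space below the old stack pointer (which `main` saved in `%ebp` and `leave` restores). The helper name `align_down_16` is hypothetical.

```c
#include <stdint.h>

/* Equivalent of "andl $-16, %esp": clear the low four bits of the address.
 * ~(uintptr_t)15 has the same bit pattern as -16 at this width. */
static uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}
```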
