Why do function parameters occupy at least 4 bytes of stack on x86?

Asked 6/6, 2015 at 6:25 Answered 6/6, 2015 at 10:43

On x86, each function parameter passed on the stack is allocated at least 4 bytes via push/pop. This wastes memory on every call when a function has many parameters smaller than 4 bytes. One reason might be that push and pop operate on at least 4 bytes, but why not adjust esp directly to save stack space, packing four 1-byte parameters into a single 4-byte slot as below?

sub esp, 4
mov byte ptr [esp], para1
mov byte ptr [esp+1], para2
mov byte ptr [esp+2], para3
mov byte ptr [esp+3], para4
call func

Atlantis answered 6/6, 2015 at 6:25 Comment(3)

In assembly, no one prevents you from doing it in that way. – Jukebox 6/6, 2015 at 6:32

You can do a 2-byte push ax in any mode (16, 32, or 64-bit), it's just normally not useful outside of 16-bit mode. As you say, normal calling conventions pad stack args to fill a whole arg-passing "slot" (a register, or register-width chunk of stack memory). – Copperas 9/3, 2021 at 5:8

A recent duplicate has some other similar answers: Why argument's size of function is increased to word size? – Copperas 9/3, 2021 at 5:10

Such behaviour is normally governed by the Application Binary Interface (ABI), and the most widely used x86 ABIs (Win32 and Sys V) simply require that each parameter occupy at least 4 bytes. This is mainly because most x86 implementations suffer performance penalties if data is not properly aligned. While your example would not "de-align" the stack, a subroutine taking only three byte-sized parameters would. Of course, one could define special rules in the ABI to overcome this, but that complicates things for little gain.
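To make the alignment point concrete, here is a minimal C sketch that compares the stack bytes used when each argument is padded to a full slot with the bytes used when arguments are packed back to back. The function names and the 4-byte slot passed in by the caller are illustrative assumptions, not part of any ABI document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round one argument's size up to a whole
 * number of slots (the caller passes slot = 4 for 32-bit x86). */
size_t slot_padded_size(size_t arg_size, size_t slot)
{
    assert(slot > 0);
    return (arg_size + slot - 1) / slot * slot;
}

/* Total stack bytes when every argument occupies whole slots. */
size_t padded_args_size(const size_t *arg_sizes, size_t count, size_t slot)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += slot_padded_size(arg_sizes[i], slot);
    return total;
}

/* Total stack bytes when arguments are packed with no padding. */
size_t packed_args_size(const size_t *arg_sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += arg_sizes[i];
    return total;
}
```

With three 1-byte arguments, the padded layout uses 12 bytes and keeps esp a multiple of 4, while the packed layout uses 3 bytes and leaves esp misaligned for everything pushed afterwards.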

Also keep in mind that the x86 ABIs were designed around 1990. At that time, the number of instructions was a very good measure of the speed of a piece of code. Your example requires one extra instruction compared with four pushes if para1-para4 are in registers, and five extra instructions in the worst case, where all parameters must first be loaded from memory (x86 supports pushing memory locations directly).

Further, in your example you trade 12 bytes saved on the stack for 14 extra code bytes: your code sequence requires 18 bytes of code when para1-para4 are in registers (e.g. al-dl), while four pushes require 4 bytes. So overall, you reduce the memory footprint only if your code recurses.
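The trade-off can be sketched as simple arithmetic in C. The byte counts are the ones quoted above (18 vs. 4 code bytes, 4 vs. 16 stack bytes for four 1-byte parameters). The function name and the split into "call sites" and "simultaneously live frames" are assumptions made for illustration.

```c
#include <assert.h>

enum {
    PACKED_CODE_BYTES  = 18, /* sub esp,4 plus four byte-sized mov stores */
    PUSH_CODE_BYTES    = 4,  /* four one-byte register pushes */
    PACKED_STACK_BYTES = 4,  /* four 1-byte parameters in one slot */
    PUSH_STACK_BYTES   = 16  /* four parameters, one 4-byte slot each */
};

/* Hypothetical helper: net bytes saved by the packed sequence.
 * Code bytes are paid once per call site; stack bytes are saved
 * once per frame that is live at the same time (e.g. recursion depth).
 * A negative result means the packed version costs more overall. */
long net_bytes_saved(long call_sites, long live_frames)
{
    assert(call_sites >= 0 && live_frames >= 0);
    long extra_code  = call_sites  * (PACKED_CODE_BYTES - PUSH_CODE_BYTES);
    long saved_stack = live_frames * (PUSH_STACK_BYTES - PACKED_STACK_BYTES);
    return saved_stack - extra_code;
}
```

For one call site with a single live frame the result is -2 bytes, so packing loses. With the same call site and two or more live frames, as in recursion, the result turns positive.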

Octet answered 6/6, 2015 at 9:2 Comment(0)

A more general answer is that the stack word size for a platform is typically the width of a pointer to a memory location. Since you're dealing with a 32-bit application, a 32-bit word size is to be expected, and that is also the stack alignment.

Famished answered 6/6, 2015 at 10:43 Comment(0)
