Why use EBP in the function prologue/epilogue?

Asked 27/3, 2013 at 9:34 Answered 27/3, 2013 at 9:58

Some time ago I was experimenting with writing assembly routines and linking them with C programs, and I found that I can simply skip the standard C-call prologue and epilogue:

    push ebp
    mov ebp, esp
    (sub esp, 4
    ...
    mov esp, ebp)
    pop ebp

I can skip all of it and address everything through esp alone, like this:

    mov eax, [esp+4]          ;; take argument
    mov [esp-4], eax          ;; use some local variable storage

It seems to work quite well. Why is ebp used at all? Is addressing through ebp perhaps faster, or is there another reason?

Surtax answered 27/3, 2013 at 9:34 Comment(2)

This technique has a name: frame pointer omission (FPO). – Maltreat 13/9, 2014 at 11:30

possible duplicate of What is the purpose of the EBP frame pointer register? – Hansen 1/6, 2015 at 8:7

There's no requirement to use a stack frame, but there are certainly some advantages:

Firstly, if every function uses this same process, we can use this knowledge to easily determine the sequence of calls (the call stack) by reversing the process. We know that after a call instruction, ESP points to the return address, and that the first thing the called function will do is push the current EBP and then copy ESP into EBP. So, at any point, the data pointed to by EBP is the previous EBP, and the value at EBP+4 is the return address of the last function call. We can therefore print the call stack (assuming 32-bit) using something like (excuse the rusty C++):

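    #include <windows.h>
    #include <stdio.h>

    // Walks the saved-EBP chain (32-bit x86 only) and prints each return address.
    void LogStack(DWORD ebp)
    {
        DWORD prevEBP = *((DWORD*)ebp);        // [EBP]   = caller's saved EBP
        DWORD retAddr = *((DWORD*)(ebp + 4));  // [EBP+4] = return address

        if (retAddr == 0) return;

        // Look up the module containing the return address, without
        // bumping its reference count on every frame.
        HMODULE module = NULL;
        char fileName[MAX_PATH] = "";
        if (GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                               GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                               (LPCSTR)retAddr, &module))
            GetModuleFileNameA(module, fileName, MAX_PATH - 1);
        printf("0x%08lx: %s\n", retAddr, fileName);
        if (prevEBP != 0) LogStack(prevEBP);
    }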

This will then print out the entire sequence of calls (well, their return addresses) up until that point.

Furthermore, since EBP doesn't change unless you explicitly update it (unlike ESP, which changes whenever you push or pop), it's usually easier to reference data on the stack relative to EBP rather than relative to ESP. With the latter, you have to account for every push/pop instruction executed between the start of the function and the reference.

As others have mentioned, you should avoid using stack addresses below ESP as any calls you make to other functions are likely to overwrite the data at these addresses. You should instead reserve space on the stack for use by your function by the usual:

    sub esp, [number of bytes to reserve]

After this, the region of the stack between the initial ESP and ESP - [number of bytes reserved] is safe to use. Before exiting your function you must release the reserved stack space using a matching:

    add esp, [number of bytes reserved]

Lennyleno answered 27/3, 2013 at 9:58 Comment(4)

Haha :) this is good (I mean especially the LogStack example). I am sorry I can't accept all three answers, because they are all very good :) As for the topic itself, I like to be an optimization maniac (I want to write the fastest possible asm routines), so the key question is whether these esp-4 overwrites really occur or not (is there some text on it, anybody?). If they do, I can still skip the whole prologue as I wrote, but instead of using esp-4 use a static buffer for locals. That should be top speed, right? – Surtax 27/3, 2013 at 10:22

@user2214913: you're getting into premature optimization. Any meaningful work your code does will, in a non-trivial program, dwarf the cycles you shave off that way. And people who need to debug your code will not appreciate this. – Nineteenth 27/3, 2013 at 10:24

@Surtax As DCoder says, you're probably not producing faster code by writing in assembly, and any speed gains you get from leaving out stack frames and not reserving stack space before using it will be dwarfed by the inefficiencies you're introducing. The fact that you're asking this very basic question suggests your code probably isn't written to avoid slow-downs from cache misses, or to take advantage of CPU-specific instruction ordering to achieve parallel execution. A decent compiler will be aware of these things and will generally produce much faster code than hand-rolled assembly. – Lennyleno 27/3, 2013 at 10:36

It would be easy to experiment with a C/C++ codebase by having the compiler not use a frame pointer (-fomit-frame-pointer and the like with gcc-style compilers) and seeing what kind of real-world performance difference it makes. With i386, there can be performance gains of up to a high single-digit percentage with the right code base, primarily because an extra general-purpose register becomes available (not so much because of the extra instructions in the prologue/epilogue); i386 is register-poor. With the x86-64 ISA and its larger number of registers, there is little benefit in messing with this stuff. – Roose 28/3, 2013 at 9:40

The use of EBP is of great help when debugging code, as it allows debuggers to traverse the stack frames in a call chain.

It [creates] a singly linked list that linked the frame pointer for each of the callers to a function. From the EBP for a routine, you could recover the entire call stack for a function.

See http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
And in particular the page it links to which covers your question: http://blogs.msdn.com/b/larryosterman/archive/2007/03/12/fpo.aspx

Entrepreneur answered 27/3, 2013 at 9:47 Comment(3)

Many thanks, this page about FPO is indeed the answer to that, but what mellowcandla said is also very interesting – Surtax 27/3, 2013 at 10:1

Though I could add that I do not understand how this linked list works, and it seems that this information can be retrieved without ebp as well: if a function can return (and return, and return again), then you can trace this path too, without ebp – Surtax 27/3, 2013 at 10:6

@user2214913: the function can return back to its caller, because the return address is stored on the stack next to the pushed parameters. However, a debugger cannot distinguish the return address from any other value on the stack, so it cannot scan the stack to find the next return address after that. The debugger cannot reliably determine how much stack space that function used locally and skip over it, either. Seriously, read the linked pages. – Nineteenth 27/3, 2013 at 10:12

It works. However, once you get an interrupt, the processor will push its flags and return address onto the stack, overwriting your value. The stack is there for a reason, so use it...

Porcia answered 27/3, 2013 at 9:35 Comment(7)

Hmm, are you sure about that? I used this and did not experience an error; is this just a matter of luck? If so, I understand that I can use this approach but just shouldn't use [esp-X] values? Or alternatively, can I sub esp, 20 and then add 20 back to it? – Surtax 27/3, 2013 at 9:40

Are you sure about this overwriting by the system? Is there some text on it? – Surtax 27/3, 2013 at 9:55

@Lennyleno It may be an asynchronous signal and not a real interrupt, it doesn't matter which. Everything below esp is unprotected and can be corrupted by some events. – Defrock 27/3, 2013 at 9:56

'mov [esp-4], eax' - awesomely dangerous, yes. – Selfimmolating 27/3, 2013 at 13:19

Even if a stack switch occurs via an interrupt gate, I would not be at all confident about accessing locations below the stack pointer. I'm not sure what would happen if the access generated a page fault, for example. – Selfimmolating 27/3, 2013 at 13:28

Still not sure whether this overwrite occurs; some people say that system interrupts have their own stack – Surtax 31/3, 2013 at 9:38

@Surtax this can be done - an interrupt can cause a stack switch. Nevertheless, I'm not convinced that such an approach is page-fault safe, and besides, why write code where you cannot put in a call to anything else, a debug dump, say, as development progresses? – Selfimmolating 1/4, 2013 at 9:12
