Assembly: Purpose of loading the effective address before a call to a function?

Asked 22/1, 2020 at 13:40 Answered 22/1, 2020 at 14:8

Source C Code:

    #include <stdio.h>

    int main()
    {
      int i;
      for(i=0; i < 10; i++)
      {
        printf("Hello World!\n");
      }
    }

Dump of Intel syntax x86 assembler code for function main:

  1.  0x000055555555463a <+0>:     push   rbp
  2.  0x000055555555463b <+1>:     mov    rbp,rsp 
  3.  0x000055555555463e <+4>:     sub    rsp,0x10
  4.  0x0000555555554642 <+8>:     mov    DWORD PTR [rbp-0x4],0x0
  5.  0x0000555555554649 <+15>:    jmp    0x55555555465b <main+33>
  6.  0x000055555555464b <+17>:    lea    rdi,[rip+0xa2]    # 0x5555555546f4
  7.  0x0000555555554652 <+24>:    call   0x555555554510 <puts@plt>
  8.  0x0000555555554657 <+29>:    add    DWORD PTR [rbp-0x4],0x1
  9.  0x000055555555465b <+33>:    cmp    DWORD PTR [rbp-0x4],0x9
  10. 0x000055555555465f <+37>:    jle    0x55555555464b <main+17>
  11. 0x0000555555554661 <+39>:    mov    eax,0x0
  12. 0x0000555555554666 <+44>:    leave  
  13. 0x0000555555554667 <+45>:    ret

I'm currently working through "Hacking: The Art of Exploitation, 2nd Edition" by Jon Erickson, and I'm just starting to tackle assembly.

I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.

1st Question: What is the purpose of line 6? (lea rdi,[rip+0xa2]).

My current working theory is that this saves the address the next instructions will jump to, in order to keep track of what is going on. I believe this line corresponds to the printf call in the C source.

So essentially, it's loading the effective address rip+0xa2 (0x5555555546f4) into the register rdi, simply to track where it will jump to for the printf function?

2nd Question: What is the purpose of line 11 (mov eax,0x0)? I do not see a prior use of the EAX register and am not sure why it needs to be set to 0.

Shantae answered 22/1, 2020 at 13:40 Comment(0)

The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search terms you're looking for are "calling convention" and/or "ABI" (and also "RIP-relative addressing"). See also: Why is the address of static variables relative to the Instruction Pointer?

The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)

The compiler can't use a shorter, more efficient mov edi, address in position-independent code, which is what your Linux PIE executable is. It would do that with gcc -fno-pie -no-pie.

mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the register for integer return values in all the standard x86 calling conventions.

If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).

Confabulate answered 22/1, 2020 at 13:46 Comment(0)

This:

lea    rdi,[rip+0xa2]

is a typical position-independent LEA, putting the string address into a register (instead of loading from that memory address).

Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call since there is no need to format anything.

In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.

The purpose of:

mov eax,0x0

is to set the return value of main(). On x86-64, the calling convention is to return integer values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.

Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.

Viewing answered 22/1, 2020 at 14:1 Comment(2)

Thanks for fixing the same problem in your answer on How does this program know the exact location where this string is stored?. At the time I didn't manage to convince you to change it, but I wasn't going to let the same error happen again :P – Confabulate 22/1, 2020 at 14:27

@PeterCordes yeah, I only realized what you meant now :') sorry about that. – Viewing 22/1, 2020 at 14:29

If you add some debug data and symbols to the assembly, everything will be easier. The code is also easier to read if you enable some optimizations.

A very useful tool for this is godbolt; here is your example: https://godbolt.org/z/9sRFmU

In the asm listing there you can clearly see that this line loads the address of the string literal, which is then printed by the function.

EAX is considered volatile, and main returns zero by default; that's the reason why it is zeroed.

The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions

Here are some more interesting cases: https://godbolt.org/z/M4MeGk

Disrespectful answered 22/1, 2020 at 14:8 Comment(0)
