How does this program know the exact location where this string is stored?

I have disassembled a C program with Radare2. Inside this program there are many calls to scanf like the following:

0x000011fe      488d4594       lea rax, [var_6ch]
0x00001202      4889c6         mov rsi, rax
0x00001205      488d3df35603.  lea rdi, [0x000368ff]       ; "%d" ; const char *format
0x0000120c      b800000000     mov eax, 0
0x00001211      e86afeffff     call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x00001216      8b4594         mov eax, dword [var_6ch]
0x00001219      83f801         cmp eax, 1                  ; rsi ; "ELF\x02\x01\x01"
0x0000121c      740a           je 0x1228

Here scanf receives the address of the string "%d" through the line lea rdi, [0x000368ff]. I assume 0x000368ff is the location of "%d" in the executable file, because if I restart Radare2 in debugging mode (r2 -d ./exec), lea rdi, [0x000368ff] is replaced by lea rdi, [someMemoryAddress].

If lea rdi, [0x000368ff] is what's hard-coded in the file, how does the instruction change to the actual memory address when the program runs?

Beacham answered 1/9, 2019 at 19:49 Comment(1)
Relocation, maybe? The disassembler may be showing a file-relative address, whereas the debugger shows the real, relocated one. - Oneupmanship

Radare is tricking you: what you see is not the real instruction but a simplified rendering of it.

The real instruction is:

0x00001205    488d3df3560300    lea rdi, qword [rip + 0x356f3]
0x0000120c    b800000000        mov eax, 0

This is a typical position-independent lea. The string is stored in your binary at offset 0x000368ff, but since the executable is position independent, the real address has to be calculated at runtime. While an instruction executes, rip holds the address of the next instruction. Since the next instruction is at offset 0x0000120c, you know that, no matter where the binary is loaded in memory, the address you want is rip + (0x000368ff - 0x0000120c) = rip + 0x356f3, which is what you see above.

When doing static analysis, since Radare does not know the base address of the binary in memory, it simply calculates 0x0000120c + 0x356f3 = 0x000368ff. This makes reverse engineering easier, but can be confusing since the real instruction is different.
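
The arithmetic above can be written as a small C sketch. The function names and the load base mentioned afterwards are hypothetical, chosen only for illustration; only the offsets come from the disassembly.

```c
#include <stdint.h>

/* Offset (relative to the start of the image) that a rip-relative operand
   refers to. rip holds the address of the NEXT instruction, so the
   instruction's own length is added before the displacement. */
uint64_t rip_relative_target(uint64_t insn_offset, uint64_t insn_len, int32_t disp)
{
    uint64_t next_insn = insn_offset + insn_len;
    /* Sign-extend the 32-bit displacement before adding it. */
    return next_insn + (uint64_t)(int64_t)disp;
}

/* Address seen at run time once the loader has picked a base address.
   The base is an assumption here; under ASLR it changes from run to run. */
uint64_t runtime_address(uint64_t load_base, uint64_t image_offset)
{
    return load_base + image_offset;
}
```

With the values from the question, rip_relative_target(0x1205, 7, 0x356f3) gives 0x368ff, the value Radare prints during static analysis. For a made-up base of 0x555555554000, runtime_address(0x555555554000, 0x368ff) gives 0x55555558a8ff, which is the kind of value a debugger would show instead.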


As an example, the following program:

#include <stdio.h>

int main(void) {
    puts("Hello world!");
}

When compiled produces:

  6b4:   48 8d 3d 99 00 00 00    lea    rdi,[rip+0x99] 
  6bb:   e8 a0 fe ff ff          call   560 <puts@plt>

So rip + 0x99 = 0x6bb + 0x99 = 0x754, and if we take a look at offset 0x754 in the binary with hd:

$ hd -s 0x754 -n 16 a.out
00000754  48 65 6c 6c 6f 20 77 6f  72 6c 64 21 00 00 00 00  |Hello world!....|
00000764
Lawson answered 1/9, 2019 at 20:15 Comment(3)
wording nitpick: LEA isn't actually a load. It's just an address calculation to put a static address into a register. Loads from the string don't happen until the callee dereferences its arg. - Paisley
I meant "load" literally as in "Load Effective Address [into register]", which is what the instruction does. Also, what do you mean with "static"? The address is dynamically calculated at runtime. - Lawson
I meant the address of statically-allocated storage. i.e. what C calls static storage class, as opposed to the address of a local on the stack (automatic storage), or loading a pointer from somewhere. Re: the word "load" - it's easily misleading, and without the qualifier "effective address" the standard meaning is "load from memory". The fact that we're talking about the x86 ISA doesn't mean "load" stops having its standard meaning when used in a plain English sentence, regardless of the existence of mnemonics like lea and lahf using it for writing a register from a source other than memory. - Paisley

The full instruction is

48 8d 3d f3 56 03 00

This instruction is literally

lea rdi, [rip + 0x000356f3]

with a rip relative addressing mode. The instruction pointer rip has the value 0x0000120c when the instruction is executed, thus rdi receives the desired value 0x000368ff.

If this is not the address you see at run time, your program is probably a position-independent executable (PIE), which the loader places at a base address chosen at load time. Because the address is encoded with a rip-relative addressing mode, the instruction needs no relocation and yields the correct address regardless of where the binary is loaded.
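
To make the byte-level reading concrete, here is a minimal C sketch that pulls the displacement out of those seven bytes. The function name is hypothetical, and the decoder is deliberately narrow: it only recognizes a REX.W-prefixed lea with a rip-relative operand and is not a general x86 disassembler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 and stores the 32-bit displacement if the bytes look like
   "lea r64, [rip + disp32]"; returns 0 otherwise. */
int decode_lea_rip_disp32(const uint8_t *insn, size_t len, int32_t *disp)
{
    if (len < 7)
        return 0;
    /* REX prefix with W set; the R bit (0x04) may vary with the destination. */
    if ((insn[0] & 0xfb) != 0x48)
        return 0;
    /* 0x8d is the LEA opcode. */
    if (insn[1] != 0x8d)
        return 0;
    /* ModRM with mod = 00 and rm = 101 selects rip + disp32 in 64-bit mode. */
    if ((insn[2] & 0xc7) != 0x05)
        return 0;

    /* The displacement is stored little-endian in the last four bytes. */
    uint32_t raw = (uint32_t)insn[3]
                 | ((uint32_t)insn[4] << 8)
                 | ((uint32_t)insn[5] << 16)
                 | ((uint32_t)insn[6] << 24);
    memcpy(disp, &raw, sizeof raw); /* reinterpret as signed without a cast */
    return 1;
}
```

Fed the bytes 48 8d 3d f3 56 03 00, this yields a displacement of 0x356f3; adding it to the address of the following instruction, 0x120c, gives 0x368ff.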

Dullard answered 1/9, 2019 at 20:10 Comment(0)
