Assembly - x86 call instruction and memory address?

I've been reading some assembly code and noticed that call instructions are actually program-counter (PC) relative.

However, whenever I debug with Visual Studio or WinDbg, the disassembly always shows call 0xFFFFFF ..., which to me means the call is going to jump to that address.

Who is right? Is Visual Studio hiding the complexity of the instruction encoding and just showing what the program means? That is, does the debugger know it's a PC-relative instruction and, since it knows the PC, just do the math for you?

Highly confused.

Araucanian answered 4/8, 2015 at 20:38 Comment(4)
Yes, when disassembling jump and call instructions, disassemblers show you the absolute target address because that's how you would write the instruction in assembly. – Wallinga
But that's the part I'm confused about: it seems like that is NOT how you would write assembly. The compiler would emit E8 cd -XXX or something, so the compiler writer actually wrote a PC-relative thing. I guess a follow-on question is: how do you, as a user, know whether the compiler said call 0xFFFF or call pcRelative? – Araucanian
I'm saying there's a language called assembly in which people write assembly programs. In this language, jump and call instructions use the absolute address (e.g. call _foo or call 0x12345) and the assembler generates the appropriate machine-language encoding. Disassemblers reverse this process. The fact that the code wasn't actually generated by an assembler doesn't change how disassemblers work. – Wallinga
See also: https://mcmap.net/q/14442/-what-do-linkers-do – Airspeed
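To make the "assembler generates the encoding, disassembler reverses it" point concrete, here is a minimal Python sketch for the 5-byte E8 rel32 near call. It assumes the assembler already knows the address where the call will sit; the address 0x10000 and the helper names are hypothetical, chosen only for illustration.

```python
import struct

CALL_REL32_OPCODE = 0xE8  # near call, 32-bit relative displacement
CALL_REL32_LENGTH = 5     # 1 opcode byte + 4 displacement bytes


def encode_call_rel32(call_address: int, target: int) -> bytes:
    """Encode `call target` as E8 rel32, the way an assembler would.

    The displacement is measured from the end of the call instruction,
    i.e. from the address of the next instruction.
    """
    displacement = target - (call_address + CALL_REL32_LENGTH)
    if not -2**31 <= displacement < 2**31:
        raise ValueError("target is out of rel32 range")
    # "<i" packs a little-endian signed 32-bit integer
    return bytes([CALL_REL32_OPCODE]) + struct.pack("<i", displacement)


def decode_call_rel32(call_address: int, encoding: bytes) -> int:
    """Reverse the process, as a disassembler does: recover the absolute target."""
    if len(encoding) != CALL_REL32_LENGTH or encoding[0] != CALL_REL32_OPCODE:
        raise ValueError("not an E8 rel32 call")
    (displacement,) = struct.unpack("<i", encoding[1:])
    return call_address + CALL_REL32_LENGTH + displacement


# Hypothetical example: `call 0x12345` assembled at address 0x10000
code = encode_call_rel32(0x10000, 0x12345)
print(code.hex(" "))                          # e8 40 23 00 00
print(hex(decode_call_rel32(0x10000, code)))  # 0x12345
```

The machine code only ever stores the relative displacement (0x2340 here); the absolute address 0x12345 exists only in the assembly source and in the disassembler's output.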

If you're disassembling .o object files that haven't been linked yet, the call address will just be a placeholder to be filled in by the linker.

You can use objdump -drwC -Mintel to show the relocation types and symbol names in a .o file. (The -r option is the key one; use -R instead for an already-linked shared library.)


It's more useful to show the user the actual address of the jump target, rather than disassembling it as jcc eip-1234H or something. Object files that have been linked into an executable have a default load address, so the disassembler knows the value of EIP/RIP at every instruction, and disassembly output usually shows that address at the start of each line.
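As a rough illustration of why the absolute form is friendlier, this Python sketch renders the same branch both ways. It assumes the disassembler already knows the instruction's address, its length, and its decoded displacement; the numbers come from the 6-byte jne at 0x40216a in the listing in this answer, and `render` is a hypothetical helper, not part of any real disassembler.

```python
from __future__ import annotations


def branch_target(insn_address: int, insn_length: int, displacement: int) -> int:
    """A relative jump/call adds its displacement to the address of the NEXT instruction."""
    return insn_address + insn_length + displacement


def render(mnemonic: str, insn_address: int, insn_length: int, displacement: int) -> tuple[str, str]:
    """Return the same branch in relative form and in the absolute form disassemblers print."""
    relative = f"{mnemonic} rip{displacement:+#x}"  # RIP here means "address of next instruction"
    absolute = f"{mnemonic} {branch_target(insn_address, insn_length, displacement):x}"
    return relative, absolute


# The 6-byte jne at 0x40216a, with a displacement of -0xD0
print(render("jne", 0x40216a, 6, -0xD0))  # ('jne rip-0xd0', 'jne 4020a0')
```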

For example, in some asm code I wrote (where I used symbol names that made it into the object file, so the loop branch target is visible to the disassembler):

objdump -M intel  -d rs-asmbench:
...
00000000004020a0 <.loop>:
  4020a0:       0f b6 c2                movzx  eax,dl
  4020a3:       0f b6 de                movzx  ebx,dh
   ...
  402166:       49 83 c3 10             add    r11,0x10
  40216a:       0f 85 30 ff ff ff       jne    4020a0 <.loop>

0000000000402170 <.last8>:
  402170:       0f b6 c2                movzx  eax,dl

Note that the jne instruction encodes a signed, little-endian 32-bit displacement of -0xD0 bytes. (Jumps add their displacement to the value of EIP/RIP after the jump, i.e. the address of the next instruction. The jump instruction itself is 6 bytes long, so the displacement has to be -0xD0, not just -0xCA.) 0x100 - 0xD0 = 0x30, which is the value of the least-significant byte of the two's complement displacement.
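The arithmetic in that paragraph can be checked with a few lines of Python. This is a sketch that only handles the 6-byte `0F 8x rel32` conditional-jump form; `decode_jcc_rel32` is a hypothetical helper name.

```python
def decode_jcc_rel32(insn_address: int, insn: bytes) -> tuple:
    """Decode a 6-byte `0F 8x rel32` conditional jump; return (displacement, target)."""
    if len(insn) != 6 or insn[0] != 0x0F or not 0x80 <= insn[1] <= 0x8F:
        raise ValueError("not a 0F 8x jcc rel32 instruction")
    # The last 4 bytes are a signed, little-endian 32-bit displacement
    displacement = int.from_bytes(insn[2:6], "little", signed=True)
    # It is added to the address of the next instruction, not of the jump itself
    target = insn_address + len(insn) + displacement
    return displacement, target


jne = bytes.fromhex("0f 85 30 ff ff ff")  # jne from the listing, at 0x40216a
disp, target = decode_jcc_rel32(0x40216a, jne)
print(hex(disp), hex(target))  # -0xd0 0x4020a0
print(hex(disp & 0xFF))        # 0x30, the low byte of the two's complement displacement
```

Measuring from the start of the jump instead (0x4020a0 - 0x40216a) would give -0xCA, which is exactly the off-by-instruction-length mistake described above.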

In your question, you're talking about the call addresses being 0xFFFF..., which makes little sense unless that's just a placeholder, or you thought the non-0xFF bytes in the displacement were part of the opcode.

Before linking, references to external symbols look like this:

objdump -M intel -d main.o
  ...
  a5:   31 f6                   xor    esi,esi
  a7:   e8 00 00 00 00          call   ac <main+0xac>
  ac:   4c 63 e0                movsxd r12,eax
  af:   ba 00 00 00 00          mov    edx,0x0
  b4:   48 89 de                mov    rsi,rbx
  b7:   44 89 f7                mov    edi,r14d
  ba:   e8 00 00 00 00          call   bf <main+0xbf>
  bf:   83 f8 ff                cmp    eax,0xffffffff
  c2:   75 cc                   jne    90 <main+0x90>
  ...

Notice that the call instructions have a relative displacement of 0. So before the linker has slotted in the actual relative value, they encode a call whose target is the instruction right after the call (i.e. RIP = RIP + 0). The call bf is immediately followed by an instruction that starts at offset 0xbf from the start of the section. The other call shows a different target address only because it's at a different place in the section. (gcc puts main in its own section: .text.startup.)
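A short Python sketch shows why an all-zero displacement makes each placeholder call appear to target the next instruction. It assumes the 5-byte `E8 rel32` form seen in the main.o listing; `call_target` is a hypothetical helper.

```python
def call_target(call_address: int, encoding: bytes) -> int:
    """Target of a 5-byte `E8 rel32` call: next-instruction address + displacement."""
    displacement = int.from_bytes(encoding[1:5], "little", signed=True)
    return call_address + 5 + displacement


placeholder = bytes.fromhex("e8 00 00 00 00")  # unrelocated call in main.o
for address in (0xA7, 0xBA):
    print(f"{address:x}: call {call_target(address, placeholder):x}")
# a7: call ac
# ba: call bf
```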

So, if you want to make sense of what's actually being called, look at a linked executable, or use a disassembler that looks at the object file's relocations and symbols to slot in symbolic names for call targets, instead of showing them as calls with zero displacement.
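For intuition about what the linker does to those zeros, here is a simplified Python sketch of applying an x86-64 PC-relative relocation (R_X86_64_PC32, or R_X86_64_PLT32 when the callee ends up in the same executable), whose value is computed as S + A - P. The section load address 0x401000 and callee address 0x401500 are hypothetical, and this ignores PLTs and dynamic linking; `apply_pc32` is an illustrative name, not a real linker API.

```python
import struct


def apply_pc32(section: bytearray, section_address: int, offset: int,
               symbol_address: int, addend: int = -4) -> None:
    """Patch a 32-bit PC-relative field in place: value = S + A - P.

    S = symbol_address, A = addend, P = runtime address of the field itself.
    The usual addend of -4 makes the result relative to the end of the
    4-byte field, which for `E8 rel32` is the address of the next instruction.
    """
    place = section_address + offset
    value = symbol_address + addend - place
    if not -2**31 <= value < 2**31:
        # ld reports this situation as "relocation truncated to fit"
        raise OverflowError("relocation value does not fit in 32 bits")
    struct.pack_into("<i", section, offset, value)


# Hypothetical layout: the section lands at 0x401000, the callee at 0x401500
section = bytearray(0xC0)
section[0xA7:0xAC] = bytes.fromhex("e8 00 00 00 00")  # the call at a7 in main.o
apply_pc32(section, 0x401000, 0xA8, 0x401500)         # rel32 field starts at a8
print(section[0xA7:0xAC].hex(" "))                    # e8 54 04 00 00
```

After patching, the call at 0x4010a7 decodes to 0x4010ac + 0x454 = 0x401500, the hypothetical callee.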

Relative jumps to local symbols already get resolved before linking:

objdump -Mintel  -d asm-pinsrw.o:
0000000000000040 <.loop>:
  40:   0f b6 c2                movzx  eax,dl
  43:   0f b6 de                movzx  ebx,dh
  ...
 106:   49 83 c3 10             add    r11,0x10
 10a:   0f 85 30 ff ff ff       jne    40 <.loop>
0000000000000110 <.last8>:
 110:   0f b6 c2                movzx  eax,dl

Note that the relative jump to a symbol in the same file has exactly the same encoding, even though the file has no base address; the disassembler just treats the base as zero.
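The following Python sketch decodes the identical bytes at both addresses from the two listings. It shows that a relative jump needs no fix-up when its whole section moves, because the target moves by the same amount as the jump. `jcc_target` is a hypothetical helper for the `0F 8x rel32` form only.

```python
def jcc_target(insn_address: int, insn: bytes) -> int:
    """Target of a `0F 8x rel32` jump: next-instruction address + displacement."""
    displacement = int.from_bytes(insn[2:6], "little", signed=True)
    return insn_address + len(insn) + displacement


JNE = bytes.fromhex("0f 85 30 ff ff ff")  # identical bytes in both listings

in_object_file = jcc_target(0x10A, JNE)    # .o: section base treated as 0
in_executable = jcc_target(0x40216A, JNE)  # linked: real load address
print(hex(in_object_file), hex(in_executable))  # 0x40 0x4020a0
# Both targets shift by the same amount as the jump itself
print(hex(in_executable - in_object_file), hex(0x40216A - 0x10A))  # 0x402060 0x402060
```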

See Intel's reference manual for instruction encoding; there are links at https://stackoverflow.com/tags/x86/info. Even in 64-bit mode, call only supports 32-bit sign-extended relative offsets. 64-bit target addresses are only supported as absolute indirect calls, through a register or memory operand. (In 32-bit mode, 16-bit relative offsets are supported with an operand-size prefix, saving one instruction byte, but the resulting EIP is truncated to 16 bits, so this is rarely useful.)
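Because the displacement is limited to a signed 32 bits, a direct call can only reach targets within roughly ±2 GiB of the next instruction. This Python sketch checks that limit; the addresses are hypothetical, and the fallback label just names the indirect FF /2 form rather than choosing a specific code sequence.

```python
REL32_MIN, REL32_MAX = -2**31, 2**31 - 1
CALL_REL32_LENGTH = 5  # E8 + 4-byte displacement


def can_use_rel32_call(call_address: int, target: int) -> bool:
    """True if `call target` can be encoded as a direct E8 rel32 call."""
    displacement = target - (call_address + CALL_REL32_LENGTH)
    return REL32_MIN <= displacement <= REL32_MAX


def choose_call_encoding(call_address: int, target: int) -> str:
    if can_use_rel32_call(call_address, target):
        return "E8 rel32"
    # No direct call takes a 64-bit target, so go through a register or memory,
    # e.g. mov rax, imm64 ; call rax
    return "FF /2 (indirect)"


print(choose_call_encoding(0x401000, 0x402000))        # E8 rel32
print(choose_call_encoding(0x401000, 0x7F0000001000))  # FF /2 (indirect)
```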

Coltin answered 4/8, 2015 at 21:11 Comment(4)
Your statement about object files having a default load address may have unblocked my learning here. I'm now concluding that because the object file has a single starting address, and all functions are relative to that (because they just are), when the object file was loaded all functions in it were relocated by some amount? – Araucanian
@halivingston: yup. That 4020a0 address didn't come out of nowhere. That's where that instruction will be in the process's virtual memory when running that file. That's why the disassembler picked that address. (ASLR changes things, which means you have to disable address-space layout randomization for code that actually uses absolute addresses, instead of RIP-relative.) Before linking, though, .o / .obj object files don't have an address, just a list of symbols and places to write those relocations into the machine code. – Coltin
Thanks, Peter. One last question that your final statement prompted, about "code that actually uses absolute addresses": I'm starting to realize that even people who write assembly just say call FooFunction; they don't give an address. In fact, they don't even choose between opcode E8 (PC-relative) and opcode FF (absolute); they just say call FooFunction. My question is: who decides to use opcode E8, or FF? Is it, say, the platform assembler that says "oh, I'm on Intel, let me make it E8"? I was under the impression that if I'm writing my own compiler, I have to literally decide which opcode to use. – Araucanian
@halivingston: I updated my answer some. Re: opcodes, the assembler decides. The default syntax call foo always generates a relative call. You'd get FF from an indirect call like call *foo_funcptr or call *%eax in AT&T syntax (call [foo_funcptr] or call eax in Intel syntax). Note that FF /2 is an indirect call, i.e. a call through a function pointer: RIP = [foo_funcptr], not RIP = foo_funcptr. In 64-bit mode, [foo_funcptr] can be encoded with a RIP-relative address for the pointer. A jump table like call [rdi + rax*8] (Intel syntax) is also possible. – Coltin
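To illustrate "the assembler decides", this Python sketch produces the bytes an x86-64 assembler would emit for three spellings of call: direct E8 rel32, and FF /2 with a register operand or a RIP-relative memory operand. It only handles the eight legacy 64-bit registers (r8-r15 would also need a REX prefix), and the addresses and function names are hypothetical.

```python
import struct

# Register numbers used in the ModRM r/m field (r8-r15 would also need a REX prefix)
REG_NUMBERS = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
               "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def call_direct(call_address: int, target: int) -> bytes:
    """call foo  ->  E8 rel32 (relative to the end of the 5-byte instruction)."""
    return b"\xE8" + struct.pack("<i", target - (call_address + 5))


def call_register(reg: str) -> bytes:
    """call rax (AT&T: call *%rax)  ->  FF /2 with a register operand."""
    modrm = (0b11 << 6) | (2 << 3) | REG_NUMBERS[reg]  # mod=11: register, reg field = /2
    return bytes([0xFF, modrm])


def call_rip_memory(call_address: int, pointer_address: int) -> bytes:
    """call [rip + foo_funcptr]  ->  FF /2 loading the target from memory."""
    modrm = (0b00 << 6) | (2 << 3) | 0b101  # mod=00, r/m=101: [rip + disp32] in 64-bit mode
    disp = pointer_address - (call_address + 6)  # relative to the end of the 6-byte instruction
    return bytes([0xFF, modrm]) + struct.pack("<i", disp)


print(call_direct(0x401000, 0x401100).hex(" "))     # e8 fb 00 00 00
print(call_register("rax").hex(" "))                 # ff d0
print(call_rip_memory(0x401000, 0x404018).hex(" "))  # ff 15 12 30 00 00
```

The programmer's spelling of the operand (a label, a register, or a memory reference) is what drives the opcode choice; the programmer never names E8 or FF directly.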
