How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work?

Consider the following variable reference in x64 Intel assembly, where the variable a is declared in the .data section:

mov eax, dword ptr [rip + _a]

I have trouble understanding how this variable reference works. Since _a is a symbol that stands for the runtime address of the variable (after relocation), how can [rip + _a] refer to the correct memory location? rip holds the address of the current instruction, which is a large positive integer, so wouldn't the addition produce the wrong address?

Conversely, if I use x86 syntax (which is very intuitive):

mov eax, dword ptr [_a]

I get the following error: 32-bit absolute addressing is not supported in 64-bit mode.

Any explanation?

int a = 5;

int main() {
    int b = a;
    return b;
}

Compilation: gcc -S -masm=intel abs_ref.c -o abs_ref:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14
    .intel_syntax noprefix
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## %bb.0:
    push    rbp
    .cfi_def_cfa_offset 16
    .cfi_offset rbp, -16
    mov rbp, rsp
    .cfi_def_cfa_register rbp
    mov dword ptr [rbp - 4], 0
    mov eax, dword ptr [rip + _a]
    mov dword ptr [rbp - 8], eax
    mov eax, dword ptr [rbp - 8]
    pop rbp
    ret
    .cfi_endproc
                                        ## -- End function
    .section    __DATA,__data
    .globl  _a                      ## @a
    .p2align    2
_a:
    .long   5                       ## 0x5

.subsections_via_symbols

Flavine answered 18/2, 2019 at 11:7 Comment(4)
Which assembler accepts mov eax, dword ptr [rip + _a]? MASM? If it does, it will probably use the right offset to make rip + _a point to _a (i.e. it will not use the address of _a). In NASM you use mov eax, DWORD [REL _a] (or you set it to be the default). When writing assembly, RIP-relative addressing is used as in "compute this address relative to RIP", not as in "add this specific offset to RIP", since you almost never know where your code will be loaded. - Quadruplet
@MargaretBloom - thanks for your reply. Please see my updated question with source code. I guessed the addressing would be relative to the rip register; however, the syntax doesn't reflect that very well, does it? So are you saying that the loader replaces [rip + _a] with the absolute address of a at runtime, or will _a be replaced with the relative offset of a (possibly negative) with respect to the address of the instruction (mov eax, dword ptr [rip + _a])? - Flavine
After edit: That's just disassembly notation. It conveys both facts: that RIP-relative addressing is being used and that _a is the final target. Inspect the opcodes and you will see. It is indeed misleading notation. - Quadruplet
@MargaretBloom - thank you very much. - Flavine

GAS syntax for RIP-relative addressing looks like symbol + current_address (RIP), but it actually means symbol with respect to RIP.

There's an inconsistency with numeric literals:

  • [rip + 10] or AT&T 10(%rip) means 10 bytes past the end of this instruction

  • [rip + a] or AT&T a(%rip) means to calculate a rel32 displacement to reach a, not RIP + symbol value. (The GAS manual documents this special interpretation)

  • [a] or AT&T a is an absolute address, using a disp32 addressing mode. This isn't supported on OS X, where the image base address is always outside the low 32 bits. (Or for mov to/from al/ax/eax/rax, a 64-bit absolute moffs encoding is available, but you don't want that).

    Linux position-dependent executables do put static code/data in the low 31 bits (2GiB) of virtual address space, so you can/should use mov edi, OFFSET sym there (mov edi, sym in NASM syntax), but on OS X your best option is lea rdi, [sym+RIP] if you need an address in a register. See also: Unable to move variables in .data to registers with Mac x86 Assembly.

(In OS X, the convention is that C variable/function names are prepended with _ in asm. In hand-written asm you don't have to do this for symbols you don't want to access from C.)
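
To make the second bullet concrete: for [rip + a], what gets stored in the instruction is the distance from the end of that instruction to a, not the address of a. Here is a minimal sketch of that arithmetic in C. The helper name rel32_for and the example addresses are made up for illustration; this is not any toolchain's actual API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the rel32 field for a RIP-relative operand.
 *   insn_addr: address of the first byte of the instruction
 *   insn_len:  total length of the instruction in bytes
 *   target:    address of the symbol being referenced
 * Returns false if the distance does not fit in a signed 32-bit field. */
bool rel32_for(uint64_t insn_addr, unsigned insn_len, uint64_t target,
               int32_t *rel32)
{
    /* When the address is formed, RIP already points at the next instruction. */
    uint64_t next_rip = insn_addr + insn_len;

    /* Unsigned subtraction wraps modulo 2^64; reinterpret as a signed distance. */
    int64_t dist = (int64_t)(target - next_rip);

    if (dist < INT32_MIN || dist > INT32_MAX)
        return false;               /* out of the +-2GiB reach of rel32 */
    *rel32 = (int32_t)dist;
    return true;
}
```

For example, a 6-byte instruction at 0x1016 that references a label at 0x100a gets rel32 = -18 (encoded as the bytes ee ff ff ff). A low absolute address such as 8 is out of range for code loaded above 4GiB, so the helper returns false in that case.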


NASM is much less confusing in this respect:

  • [rel a] means RIP-relative addressing for [a]
  • [abs a] means [disp32].
  • default rel or default abs sets what's used for [a]. The default is (unfortunately) default abs, so you almost always want default rel at the top of your file.

Example with .set symbol values vs. a label

.intel_syntax noprefix
mov  dword ptr [sym + rip], 0x11111111
sym:

.equ x, 8 
inc  byte ptr [x + rip]

.set y, 32 
inc byte ptr [y + rip]

.set z, sym
inc byte ptr [z + rip]

gcc -nostdlib foo.s && objdump -drwC -Mintel a.out (on Linux; I don't have OS X):

0000000000001000 <sym-0xa>:
    1000:       c7 05 00 00 00 00 11 11 11 11   mov    DWORD PTR [rip+0x0],0x11111111        # 100a <sym>    # rel32 = 0; it's from the end of the instruction not the end of the rel32 or anywhere else.

000000000000100a <sym>:
    100a:       fe 05 08 00 00 00       inc    BYTE PTR [rip+0x8]        # 1018 <sym+0xe>
    1010:       fe 05 20 00 00 00       inc    BYTE PTR [rip+0x20]        # 1036 <sym+0x2c>
    1016:       fe 05 ee ff ff ff       inc    BYTE PTR [rip+0xffffffffffffffee]        # 100a <sym>

(Disassembling the .o with objdump -dr will show you that there aren't any relocations for the linker to fill in, they were all done at assemble time.)

Notice that only .set z, sym resulted in a with-respect-to calculation. x and y were defined from plain numeric literals, not labels, so even though the instruction itself used [x + RIP], we still got [RIP + 8].
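
The targets in the objdump output above can be checked by hand: each one is the address of the next instruction plus the sign-extended little-endian rel32. Here is a small C sketch of that check, using the addresses and bytes from the listing. rip_target and read_le32 are hypothetical helper names, not part of objdump or binutils.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value and treat it as signed, the way the
 * CPU sign-extends the disp32 field of a RIP-relative operand. */
static int32_t read_le32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)u;
}

/* Effective address = address of the NEXT instruction + disp32.
 * disp_off is the offset of the disp32 field inside the instruction
 * (2 for the opcode + ModRM forms used in the listing). */
uint64_t rip_target(uint64_t insn_addr, const uint8_t *insn,
                    unsigned insn_len, unsigned disp_off)
{
    int64_t disp = read_le32(insn + disp_off);
    return insn_addr + insn_len + (uint64_t)disp;
}
```

With the last instruction, fe 05 ee ff ff ff at 0x1016, this gives 0x1016 + 6 - 18 = 0x100a, i.e. sym. The 10-byte mov at 0x1000 shows that the displacement counts from the end of the whole instruction, including the trailing imm32.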


(Linux non-PIE only): To address absolute 8 wrt. RIP, you'd need AT&T syntax incb 8-.(%rip). I don't know how to write that in GAS intel_syntax; [8 - . + RIP] is rejected with Error: invalid operands (*ABS* and .text sections) for '-'.

Of course you can't do that anyway on OS X, except maybe for absolute addresses that are in range of the image base. But there's probably no relocation that can hold the 64-bit absolute address to be calculated for a 32-bit rel32.


Saddleback answered 18/2, 2019 at 12:53 Comment(3)
In NASM I would use [rel a] for [rip + a] or AT&T a(%rip), and for [rip + 10] I assume [rel 10]. But "[rip + 10] means 10 bytes past the end of this instruction" is something I didn't understand clearly. Let's assume I define a variable named var in the .data section: how do I access var with [rel ?], and what number do I use? How do I work out how many bytes away it is? Or is this syntax for some other purpose? - Decoction
Or does each line of code take a fixed amount of memory, e.g. mov rax, 100 is one byte and the next line is one byte? - Decoction
@srilakshmikanthanp: In NASM syntax, mov eax, [rel var] accesses var using RIP-relative addressing. Or use default rel somewhere in your file so mov eax, [var] uses RIP-relative addressing. You never want to use numeric offsets manually unless you already have a specific reason for knowing what offset. Just put a label somewhere and reference it. You can of course look at disassembly or a listing from nasm -l/dev/stdout -felf64 foo.asm to see instruction lengths. - Saddleback
