There are two main reasons why RIP-relative addressing is used to access the static variable b
. The first is that it makes the code position independent, meaning if it's used in a shared library or position independent executable the code can be more easily relocated. The second is that it allows the code to be loaded anywhere in the 64-bit address space without requiring huge 8 byte (64-bit) displacements to be encoded in the instruction, which aren't supported by 64-bit x86 CPUs anyways.
You mention that the compiler could instead generate code that referenced the variable relative to the beginning of the section it lives in. While its true doing this would also have the same advantages as given above, it wouldn't make the assembly any less complicated. In fact it will make it more complicated. The generated assembly code would first have to calculate the address of the section the variable lives in, since it would only know its location relative to the instruction pointer. It would then have to store it in a register, so accesses to b
(and any other variables in the section) can be made relative to that address.
Since 32-bit x86 code doesn't support RIP-relative addressing, your alternate solution is fact what the compiler does when generating 32-bit position independent code. It places the variable b
in the global offset table (GOT), and then accesses the variable relative to the base of the GOT. Here's the assembly generated by your code when compiled with gcc -m32 -O3 -fPIC -S test.c
:
natural_generator:
call __x86.get_pc_thunk.cx
addl $_GLOBAL_OFFSET_TABLE_, %ecx
movl b.1392@GOTOFF(%ecx), %eax
leal 1(%eax), %edx
addl $2, %eax
movl %edx, b.1392@GOTOFF(%ecx)
ret
The first function call places the address of the following instruction in ECX. The next instruction calculates the address of the GOT by adding the relative offset of the GOT from the start of the instruction. The variable ECX now contains the address of the GOT and is used as a base when accessing the variable b
in the rest of the code.
Compare that to 64-bit code generated by gcc -m64 -O3 -S test.c
:
natural_generator:
movl b.1745(%rip), %eax
leal 1(%rax), %edx
addl $2, %eax
movl %edx, b.1745(%rip)
ret
(The code is different than the example in your question because optimization is turned on. In general its a good idea to only look at optimized output, as without optimization the compiler often generates terrible code that does a lot of useless things. Also note that the -fPIC
flag doesn't need to be used, as the compiler generates 64-bit position independent code regardless.)
Notice how there's two fewer assembly instructions in the 64-bit version making it the less complicated version. You can also see that the code uses one less register (ECX). While it doesn't make much of a difference in your code, in a more complicated example that's a register that could've been used for something else. That makes the code even more complicated as the compiler needs to do more juggling of registers.
objdump -drwC -Mintel
to get nice output.-r
decodes the symbol table. objdump always does the math for you, and shows the actual target address of RIP-relative instruction as well as the offset from RIP. – Lexicostatisticsb
isn't initialized in the function. – Squirmyint
with a constant initializer. – Lexicostatistics