Understanding the auipc+jalr sequence used for function calls

I was trying to read RISC-V assembly generated by gcc, and I found that gcc creates a sequence of auipc+jalr for some function calls, which I don't understand. Here's a simple example. Consider the following C source file:

unsigned long id(unsigned long x) {
    return x;
}

unsigned long add_one(unsigned long x) {
    return id(x)+1;
}

I compile it with gcc -O2 -fno-inline -c test.c and I get the following assembly code:

$ objdump -d test.o

test.o:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000000 <id>:
   0:   00008067            ret

0000000000000004 <add_one>:
   4:   ff010113            addi    sp,sp,-16
   8:   00113423            sd      ra,8(sp)
   c:   00000317            auipc   t1,0x0
  10:   000300e7            jalr    t1
  14:   00813083            ld      ra,8(sp)
  18:   00150513            addi    a0,a0,1
  1c:   01010113            addi    sp,sp,16
  20:   00008067            ret

What confuses me are the two lines at the offsets 0x0c and 0x10, which is where the function id is supposed to be called. According to the spec, auipc t1,0x0 should write PC + 0x0<<12 (which is equal to PC) to t1 and then jalr t1 (which gets expanded to jalr ra,t1,0) jumps to the address stored in t1 and stores the return address to ra. So we end up jumping to the auipc line (offset 0x0c), not the entry point of id. What's going on here?

Mast answered 13/5, 2017 at 18:29 Comment(3)
You're looking at an object file. Either build an executable and disassemble it, or, if you're sane, just look at the code generated by gcc -S. – Foregut
safsaf32, there are probably some relocations that aren't resolved yet. Build an executable to resolve some of them (then run objdump -d); others will be resolved only at load/run time (use gdb with the start and disassemble function_name commands). Relocations are hidden from objdump's default disassembly view; use objdump -drR (the --reloc and --dynamic-reloc options) to see them. (You may also check the compiler's asm output test.s, produced by gcc -O2 -fno-inline -S test.c, to see how the compiler passes instructions on to the linker and loader.) – Choplogic
Thank you both. I naively thought that internal calls would be resolved in the compilation phase. – Mast

When disassembling an object file, the address information displayed for auipc/jalr is kind of arbitrary, because it gets relocated by the linker anyway.

You can see that when also dumping the relocation information (add -r to your objdump call):

0000000000000000 <id>:
   0:   8082                    ret
0000000000000002 <add_one>:
   2:   1141                    addi    sp,sp,-16
   4:   e406                    sd  ra,8(sp)
   6:   00000097            auipc   ra,0x0
            6: R_RISCV_CALL id
            6: R_RISCV_RELAX    *ABS*
   a:   000080e7            jalr    ra # 6 <add_one+0x4>
   e:   60a2                    ld  ra,8(sp)
  10:   0505                    addi    a0,a0,1
  12:   0141                    addi    sp,sp,16
  14:   8082                    ret

Those relocation entries tell the linker to relocate the call in a relaxed fashion (the default for the RISC-V toolchain). That means it's allowed to replace an auipc+jalr pair with a single jal instruction iff the distance to the target address is short enough (jal reaches about ±1 MiB). Such replacements are advantageous because they save instructions, i.e. the resulting program is smaller. Obviously, this complicates the relocation procedure a bit, because the offsets of subsequent pc-relative jumps and references that span the removed bytes need to be adjusted accordingly.

(This can be disabled with the -mno-relax GCC flag.)

Why can't the assembler directly emit final auipc/jalr/jal instructions for symbols local to the translation unit that don't need to be relocated? After all, those jumps are pc-relative.

In general, it can't, because with only the local view of one translation unit, 1) a relaxed relocation to an external symbol may change all following offsets to internal symbols, and 2) the linker might even apply some advanced rule, e.g. one where an internal symbol is overlaid by an external one, so that the reference really has to be relocated in the linker. Another example is the linker deleting a symbol.

If you want to look at relocated addresses/offsets you have to disassemble the linked binary, e.g.:

000000000001015c <id>:
   1015c:   8082                    ret
000000000001015e <add_one>:
   1015e:   1141                    addi    sp,sp,-16
   10160:   e406                    sd  ra,8(sp)
   10162:   ffbff0ef            jal ra,1015c <id>
   10166:   60a2                    ld  ra,8(sp)
   10168:   0505                    addi    a0,a0,1
   1016a:   0141                    addi    sp,sp,16
   1016c:   8082                    ret

As expected, the linker relaxes auipc+jalr to just jal. Unfortunately, objdump doesn't display the raw jal offset: 1015c is the absolute address obtained by adding the offset to 10162.[1]

You can verify it by decoding the binary instruction in the second column by yourself:

   0xffbff0ef
=  0b11111111101111111111000011101111 | split into the offset parts
=>   1 1111111101 1 11111111          | i.e. off[20], off[10:1], off[11], off[19:12]
                                      | merge them into off[20:1]
=> 0b11111111111111111101             | left-shift by 1
=> 0b111111111111111111010            | sign-extend
=> 0b11111111111111111111111111111010
=  -6
=> 0x10162 - 6
=  0x1015c

Which matches the objdump output.


[1] That is, GNU binutils objdump doesn't display the raw jal offset. In contrast, llvm-objdump (LLVM 9 introduced official RISC-V support) does display the raw offset:

000000000001015e add_one:
   1015e: 41 11                         addi    sp, sp, -16
   10160: 06 e4                         sd  ra, 8(sp)
   10162: ef f0 bf ff                   jal -6
   10166: a2 60                         ld  ra, 8(sp)
   10168: 05 05                         addi    a0, a0, 1
   1016a: 41 01                         addi    sp, sp, 16
   1016c: 82 80                         ret

However, unlike GNU binutils objdump, llvm-objdump doesn't annotate the jump with the resulting absolute address, nor with the corresponding symbol. Thus, the GNU binutils objdump output is arguably more useful in general.

Goodkin answered 16/2, 2020 at 12:59 Comment(2)
objdump -r puts relocation info inline as a comment without adding a separate section of output for the symbol table. I normally use alias disas='objdump -drwC'. Interesting, I didn't know ld could shift code like that. For x86, relaxing an indirect call is typically done by using a redundant prefix to keep the 6-byte instead of a 5-byte instruction length. (Only 1 byte is not a big deal, and x86 is variable-length.) Cool that RISC-V toolchains can do it without using a NOP. – Davisdavison
@PeterCordes, actually I meant to write -r instead of -t. Fixed it now. Thanks for the note. – Goodkin

© 2022 - 2024 — McMap. All rights reserved.