What is the definition of JAL in RISC-V and how does one use it?

Asked 28/10, 2018 at 21:59 Answered 29/10, 2018 at 9:45

assembly cpu-architecture riscv instructions

I don't get how JAL works in RISC-V as I've been seeing multiple conflicting definitions. For example, if I refer to this website: https://rv8.io/isa.html

It says: JAL rd,offset, with the offset as the last argument, but some sources show JAL rd, imm instead. What is the difference?

It seems that JAL is supposed to take a function and return its output in rd (and I don't know why some sources call this register both ra and rd). But if that's the case, what is the subroutine or function? rd seems to be defined as the destination register, and imm seems to be just an integer.

Hemlock asked 28/10, 2018 at 21:59 (2 comments)

The RISC-V spec is quite clear as to how that instruction works; there is a whole paragraph on it. Nothing vague or misleading there. Please post that paragraph in your question for everyone to see, and describe what part you don't understand. – Visby 28/10, 2018 at 22:4

rd is the link register, where jal stores the return address. Function and subroutine are basically interchangeable terms. You could argue that subroutines are the subset of functions that don't have a return value, like void foo(int x). – Orangeman 28/10, 2018 at 22:4

In the jal instruction, imm (or imm20) is a 20-bit binary number.

offset is the interpretation of imm by the jal instruction: the contents of imm are shifted left by 1 position and then sign-extended to the size of an address (32 or 64 bits, currently), giving an integer from -1,048,576 to +1,048,574, i.e. roughly -1 million to +1 million.
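A minimal Python sketch of that interpretation, assuming imm20 is already available as an unsigned 20-bit number with its bits in order (the real instruction word scatters them across several fields); offset_from_imm20 is a hypothetical name. Python integers are unbounded, so "sign-extending" here simply means producing the negative value directly.

```python
def offset_from_imm20(imm20: int) -> int:
    """Turn jal's 20-bit immediate into a signed byte offset (hypothetical helper)."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("imm20 must fit in 20 bits")
    offset = imm20 << 1          # shift left by 1: bit 0 of the offset is always 0
    if offset & (1 << 20):       # bit 20 is the sign bit of the resulting 21-bit value
        offset -= 1 << 21        # sign-extend into a negative integer
    return offset


# The extremes of the reachable range (about +/-1 MiB):
print(offset_from_imm20(0x7FFFF))  # 1048574
print(offset_from_imm20(0x80000))  # -1048576
print(offset_from_imm20(0xFFFFF))  # -2
```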

This offset integer is added to the address of the jal instruction itself to get the address of the function you want to call. This new address is put into the PC and program execution resumes with whatever instruction is located at that address.

At the same time, the address of the instruction following the jal (the jal's own address plus 4) is stored into CPU register rd. The function being called will presumably later use this to return, using a jalr x0, 0(rd) instruction (assembler shorthand: jr rd, or ret when rd is ra).
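To make the call-and-return sequence concrete, here is a toy model, assuming RV32 (XLEN = 32) and the 4-byte jal encoding; Hart, write, jal and jalr are hypothetical names for this sketch, not any simulator's API. It also models x0 as hard-wired to zero, which is why jal x0, offset behaves as a plain jump.

```python
XLEN = 32                 # assumption: RV32; use 64 for RV64
MASK = (1 << XLEN) - 1    # addresses wrap modulo 2**XLEN


class Hart:
    """Toy model of the pc and the 32 integer registers (hypothetical, jal/jalr only)."""

    def __init__(self, pc: int = 0) -> None:
        self.pc = pc & MASK
        self.x = [0] * 32  # x0..x31

    def write(self, rd: int, value: int) -> None:
        if rd != 0:  # x0 is hard-wired to zero, so writes to it are discarded
            self.x[rd] = value & MASK

    def jal(self, rd: int, offset: int) -> None:
        link = self.pc + 4                   # address of the next instruction (jal is 4 bytes)
        self.pc = (self.pc + offset) & MASK  # target = address of the jal + offset
        self.write(rd, link)

    def jalr(self, rd: int, rs1: int, imm: int = 0) -> None:
        target = (self.x[rs1] + imm) & MASK & ~1  # read rs1 first; jalr clears bit 0
        link = self.pc + 4
        self.pc = target
        self.write(rd, link)


h = Hart(pc=0x1000)
h.jal(1, 0x200)                  # call:   jal ra, +0x200
print(hex(h.pc), hex(h.x[1]))    # 0x1200 0x1004
h.jalr(0, 1)                     # return: jalr x0, 0(ra), i.e. ret
print(hex(h.pc))                 # 0x1004
```

Reading rs1 before writing rd matters when the same register is used for both, which the spec allows.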

The RISC-V hardware allows any of the 32 integer registers to be given as rd. If register 0 (x0) is given as rd, then the return address is discarded and you effectively have a ±1 MB goto rather than a function call.

The standard RISC-V ABI (a software convention, nothing to do with hardware) specifies that for normal function calls rd should be register 1 (x1), which is then commonly known as ra (Return Address). Register 5 (x5) is also commonly used as an alternate link register for special runtime library functions, such as those that save and restore registers at the start and end of functions.
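Part of the confusion between forms like jal offset and jal rd, offset comes from assembler pseudo-instructions that fill in x0 or x1 for you. The sketch below hard-codes a few of the expansions listed in the pseudo-instruction table of the RISC-V unprivileged spec; expand is a hypothetical helper, not an assembler API.

```python
def expand(mnemonic: str, *ops: str) -> str:
    """Expand a few jump pseudo-instructions into base jal/jalr (sketch)."""
    if mnemonic == "j":                      # plain jump: link discarded into x0
        return f"jal x0, {ops[0]}"
    if mnemonic == "jal" and len(ops) == 1:  # call: link goes to ra (x1)
        return f"jal x1, {ops[0]}"
    if mnemonic == "jr":                     # indirect jump
        return f"jalr x0, 0({ops[0]})"
    if mnemonic == "jalr" and len(ops) == 1: # indirect call
        return f"jalr x1, 0({ops[0]})"
    if mnemonic == "ret":                    # return through ra
        return "jalr x0, 0(x1)"
    return f"{mnemonic} {', '.join(ops)}"    # already a base instruction


print(expand("jal", "foo"))        # jal x1, foo
print(expand("jal", "x5", "foo"))  # jal x5, foo
print(expand("ret"))               # jalr x0, 0(x1)
```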

The RISC-V instruction set manual suggests that CPU designers might choose to add special hardware (a return-address stack) to make strictly nested pairs of calls (jal x1/x5, offset) and returns (jalr with x1/x5 as the source register) run more quickly than would otherwise be expected, so there can be an advantage to following the standard ABI. However, the program will work correctly even if other registers are used.
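As I read the return-address-stack hint rules in the unprivileged spec, a predictor keys purely off whether rd and rs1 are link registers (x1 or x5). A sketch of that decision table; the function names are hypothetical, and whether any given CPU actually has such a stack is implementation-specific.

```python
LINK = {1, 5}  # x1 (ra) and x5 (t0) count as link registers for the hints


def jal_ras_action(rd: int) -> str:
    """jal pushes a return address only when rd is a link register."""
    return "push" if rd in LINK else "none"


def jalr_ras_action(rd: int, rs1: int) -> str:
    """Hinted RAS action for jalr rd, imm(rs1) (sketch of the spec's table)."""
    rd_link, rs1_link = rd in LINK, rs1 in LINK
    if not rd_link and not rs1_link:
        return "none"
    if not rd_link:       # only rs1 is a link register: looks like a return
        return "pop"
    if not rs1_link:      # only rd is a link register: looks like a call
        return "push"
    return "push" if rd == rs1 else "pop, then push"


print(jal_ras_action(1))      # push  (jal ra, f)
print(jalr_ras_action(0, 1))  # pop   (ret)
```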

Mocambique answered 29/10, 2018 at 9:45

Seems like you're confusing machine code and assembly language.

From the machine code perspective, the whole instruction, and thus all its fields, are simply numbers:

some fields encode things like opcodes,
some fields encode things like register numbers, and
some fields encode things like signed or unsigned integers.

These encodings are defined by the instruction set architecture. The hardware is specifically designed to interpret these numeric bit fields according to its ISA specification.

The assembler, linker, and operating system loader conspire to allow you to use symbolic values to form instructions instead of numbers for various fields (or even one number for the whole instruction):

opcode mnemonics for opcode numbers in opcode fields,
register names for register numbers in register fields, and,
labels — even complex expressions — for various immediate values in numeric operand fields.
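As a rough illustration of the label case, assuming the address of the jal and the address of the label are both known (in practice this is often deferred to the linker via a relocation), the number that ends up in the immediate is just their difference, checked for alignment and range. jal_offset_for_label is a hypothetical name.

```python
def jal_offset_for_label(pc: int, label_addr: int) -> int:
    """Offset an assembler/linker would need for `jal rd, label` (sketch)."""
    offset = label_addr - pc            # pc-relative: target minus address of the jal
    if offset % 2:
        raise ValueError("jal targets must be 2-byte aligned")
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("target out of jal range (about +/-1 MiB)")
    return offset


print(hex(jal_offset_for_label(0x1000, 0x1200)))  # 0x200
print(jal_offset_for_label(0x1000, 0x0FF0))       # -16
```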

I wouldn't read too much into one text referring to ra and another to rd. It could indicate a discrepancy, or it could just be a different way of documenting the machine code fields of an instruction.

The JAL instruction encodes two operands: a register, and an immediate.

The identified register is updated with the return address: the address of the jal instruction plus the length of the jal instruction, so that the register gets the address of the next sequential (by address) instruction after the jal, which is the proper return address for a call.

Like all the bit fields in an instruction, the immediate is an encoding: the decoded value ultimately yields the branch target address. It is computed by converting the immediate bit fields into a signed offset and adding it to the pc of the jal instruction. The encoding stores 19 bits (spread across several bit fields) plus a sign bit, and does not encode the last bit of the offset (branch targets are always 16-bit aligned, so the last bit would always be zero anyway and is not stored). Ultimately, jal can reach about -1 MiB to +1 MiB from the jal instruction itself.
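The "several bit fields" follow the J-type layout in the unprivileged spec, where instruction bits 31..12 hold imm[20|10:1|11|19:12]. A sketch of reassembling them from a 32-bit instruction word; decode_jal is a hypothetical name, and it assumes the word is a plain 32-bit jal (not the compressed c.jal).

```python
JAL_OPCODE = 0b1101111  # 0x6F


def decode_jal(word: int) -> tuple[int, int]:
    """Return (rd, offset) for a 32-bit jal instruction word (sketch)."""
    if word & 0x7F != JAL_OPCODE:
        raise ValueError("not a jal instruction")
    rd = (word >> 7) & 0x1F
    imm = (((word >> 31) & 0x1) << 20      # inst[31]    -> imm[20] (sign)
           | ((word >> 21) & 0x3FF) << 1   # inst[30:21] -> imm[10:1]
           | ((word >> 20) & 0x1) << 11    # inst[20]    -> imm[11]
           | ((word >> 12) & 0xFF) << 12)  # inst[19:12] -> imm[19:12]
    if imm & (1 << 20):
        imm -= 1 << 21                     # sign-extend the 21-bit offset
    return rd, imm


print(decode_jal(0x008000EF))  # (1, 8)   jal ra, +8
print(decode_jal(0xFFDFF06F))  # (0, -4)  jal x0, -4
```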

As previously mentioned, the executing hardware converts the immediate (sub)field(s) into an offset that it adds to the pc to identify the final branch/call target. That we can provide labels and other complex expressions in assembly language is a feature of those languages, whose aim is to reduce labels and other expressions to the immediate bit-field constants the processor needs to go where intended. Relocations in the object code and/or fixups in the loaded code ensure that these immediate bit fields end up holding a useful bit pattern, one the hardware can turn into the intended target with nothing more than simple field extraction and an addition at runtime.
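For a pc-relative jal, what such a fixup ultimately has to do is scatter the offset's bits into the immediate fields while leaving rd and the opcode untouched. A hedged sketch; patch_jal is a hypothetical helper (in ELF object files the corresponding relocation type is named R_RISCV_JAL, but a real linker does much more than this).

```python
def patch_jal(word: int, offset: int) -> int:
    """Overwrite the immediate fields of a jal word, keeping rd and the opcode (sketch)."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset must be even and within about +/-1 MiB")
    imm = offset & 0x1FFFFF                  # 21-bit two's-complement pattern; bit 0 is dropped
    fields = (((imm >> 20) & 0x1) << 31      # imm[20]    -> inst[31]
              | ((imm >> 1) & 0x3FF) << 21   # imm[10:1]  -> inst[30:21]
              | ((imm >> 11) & 0x1) << 20    # imm[11]    -> inst[20]
              | ((imm >> 12) & 0xFF) << 12)  # imm[19:12] -> inst[19:12]
    return (word & 0xFFF) | fields           # keep rd (bits 11:7) and opcode (bits 6:0)


print(hex(patch_jal(0x000000EF, 8)))   # 0x8000ef   (jal ra, +8)
print(hex(patch_jal(0x0000006F, -4)))  # 0xffdff06f (jal x0, -4)
```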

For functions to call each other without stepping on each other's toes, the asm code for callers and callees must agree on all of:

passed parameter(s)
passed return address
returned return value(s)
preserved registers
scratch registers

This is referred to broadly as the calling convention. It dictates how a caller that knows nothing else about the callee (and vice versa) may interact with it. It imposes software-convention requirements on which register or stack location holds the first parameter, the second, etc.; which register or stack location holds the return address; how the return value is conveyed; and which registers of the caller's environment are preserved by the call vs. potentially clobbered by it (scratch).
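In the standard RISC-V integer calling convention (as I understand the psABI; check the document for your target), those choices map onto registers roughly like this: a0-a7 pass arguments, a0-a1 carry return values, ra holds the return address, sp and s0-s11 are preserved by the callee, and ra, t0-t6 and a0-a7 are scratch. A small lookup sketch; abi_name and is_callee_saved are hypothetical names.

```python
def abi_name(n: int) -> str:
    """ABI name of integer register xn under the standard calling convention."""
    fixed = {0: "zero", 1: "ra", 2: "sp", 3: "gp", 4: "tp", 8: "s0", 9: "s1"}  # s0 is also fp
    if n in fixed:
        return fixed[n]
    if 5 <= n <= 7:
        return f"t{n - 5}"     # t0-t2
    if 10 <= n <= 17:
        return f"a{n - 10}"    # a0-a7 (arguments; a0-a1 also return values)
    if 18 <= n <= 27:
        return f"s{n - 16}"    # s2-s11
    if 28 <= n <= 31:
        return f"t{n - 25}"    # t3-t6
    raise ValueError("integer registers are x0-x31")


def is_callee_saved(n: int) -> bool:
    """True for sp and s0-s11, which a callee must restore before returning."""
    return n == 2 or n in (8, 9) or 18 <= n <= 27


print([abi_name(n) for n in range(10, 18)])  # where the first eight integer arguments go
```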

When the conventions are properly observed, a caller (not knowing the implementation of the callee, only the types of parameters and return value, aka the function signature) can

safely keep local variables in machine registers across a call,
pass parameters,
invoke the callee,
get control back when the callee returns,
and then receive return values and continue.

Middlemost answered 28/10, 2018 at 23:10 (1 comment)

ra means return address, which is the ABI name for the x1 register that is usually used as the operand. rd means destination register, which is the generic operand name for the register that an instruction writes to. – Ephram 13/7, 2019 at 21:43
