You are vastly over-estimating the cost in CPU complexity of decoding a relative jump.
- calculating the displacement (i.e. the distance from the jump label to the next instruction after the jump)
- then taking that displacement's 2's complement,
The machine code has to contain the result of step 2 (a signed integer relative displacement), so all of that is done at assemble time. And in the assembler, subtracting two integer addresses already gives you the signed 2's complement displacement you need.
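As a rough illustration of that assemble-time work, here is a minimal Python sketch of how an assembler might encode a short jump. The function names are made up for this example, and it assumes the 2-byte EB rel8 encoding discussed below.

```python
def rel8_displacement(jump_addr: int, target_addr: int, insn_len: int = 2) -> int:
    """Return the displacement byte for a short jump (hypothetical helper)."""
    next_ip = jump_addr + insn_len      # displacement is relative to the END of the jump
    disp = target_addr - next_ip        # plain integer subtraction, may be negative
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for rel8")
    return disp & 0xFF                  # low 8 bits are the 2's complement encoding


def assemble_jmp_short(jump_addr: int, target_addr: int) -> bytes:
    """Encode `jmp short target` as opcode EB followed by the rel8 byte."""
    return bytes([0xEB, rel8_displacement(jump_addr, target_addr)])


# A jump to its own address: 0x100 - (0x100 + 2) = -2, stored as 0xFE.
print(assemble_jmp_short(0x100, 0x100).hex(" "))  # eb fe
```

The subtraction is the whole job; masking with 0xFF just keeps the low byte of the signed result.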
There are real advantages to using relative displacements, so making the ISA worse just to simplify writing an assembler would not have made any sense. You only need to write the assembler once, but everything that runs on the machine benefits from more compact code, and position independence.
Relative branch displacements are completely normal, and used in most other architectures, too (e.g. ARM: https://community.arm.com/processors/b/blog/posts/branch-and-call-sequences-explained, where fixed-width instructions make a direct absolute branch encoding impossible anyway). It would have made 8086 the odd one out to not use relative branch encoding.
update: Maybe not totally the odd one out. MIPS uses rel16 << 2 for beq / bne (MIPS instructions are fixed at 32 bits wide and always aligned). But for unconditional j (jump) instructions, it interestingly uses a pseudo-direct encoding. It keeps the high 4 bits of PC, and directly replaces the PC[27:2] bits with the value encoded in the instruction. (Again, the low 2 bits of the program counter are always 0.) So within the same 1/16th of address space, j instructions are direct jumps, and don't give you position-independent code. This also applies to jal (jump-and-link = call), making function calls from PIC code less efficient :( Linux-MIPS used to require PIC binaries, but apparently now it doesn't (but shared libs still have to be PIC).
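To make the difference concrete, here is a small Python sketch of the two MIPS target calculations. The function names are invented for this example, and it models only the address arithmetic (no delay-slot execution). One detail beyond the text above: the "PC" that MIPS uses for both calculations is the address of the instruction after the branch or jump, i.e. PC + 4.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def mips_branch_target(pc: int, imm16: int) -> int:
    """beq/bne: PC-relative, 16-bit word offset from the next instruction."""
    return (pc + 4 + (sign_extend(imm16, 16) << 2)) & 0xFFFFFFFF


def mips_j_target(pc: int, index26: int) -> int:
    """j/jal: keep the top 4 address bits, replace bits [27:2] with the field."""
    return ((pc + 4) & 0xF0000000) | ((index26 & 0x03FFFFFF) << 2)


# Both instructions sit at 0x00400000 and target themselves.
print(hex(mips_branch_target(0x00400000, 0xFFFF)))    # 0x400000
print(hex(mips_j_target(0x00400000, 0x0100000)))      # 0x400000

# Move the code up by 0x1000: the branch follows, the j does not.
print(hex(mips_branch_target(0x00401000, 0xFFFF)))    # 0x401000
print(hex(mips_j_target(0x00401000, 0x0100000)))      # 0x400000
```

The j target only depends on the load address through its top 4 bits, which is why it behaves like an absolute jump within a 256 MiB region.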
When the CPU runs eb fe, all it has to do is add the displacement to IP instead of replacing IP. Since non-jump instructions already update IP by adding the instruction length, the adder hardware already exists.
Note that sign-extending 8-bit displacements to 16-bit (or 32 or 64-bit) is trivial in hardware: 2's complement sign-extension is just copying the sign bit, which doesn't require any logic gates, just wires to connect one bit to the rest. (e.g. 0xfe becomes 0xfffe, while 0x05 becomes 0x0005.)
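A software model of what the CPU does for eb fe might look like the sketch below. The function names are hypothetical, and the code mimics the "copy the sign bit" wiring with explicit bit operations rather than claiming to describe any real microarchitecture.

```python
def sign_extend_8_to_16(byte: int) -> int:
    """Copy bit 7 into bits 8..15, as the hardware wiring would."""
    sign_bit = (byte >> 7) & 1
    high_byte = 0xFF if sign_bit else 0x00   # eight copies of the sign bit
    return (high_byte << 8) | (byte & 0xFF)


def execute_jmp_rel8(ip: int, rel8: int) -> int:
    """Return the new IP after a 2-byte `jmp rel8` located at `ip`."""
    next_ip = (ip + 2) & 0xFFFF              # same update any 2-byte instruction gets
    return (next_ip + sign_extend_8_to_16(rel8)) & 0xFFFF  # one extra 16-bit add


print(hex(sign_extend_8_to_16(0xFE)))    # 0xfffe
print(hex(sign_extend_8_to_16(0x05)))    # 0x5
print(hex(execute_jmp_rel8(0x100, 0xFE)))  # 0x100 (the jump targets itself)
```

Adding 0xfffe with 16-bit wraparound is the same as subtracting 2, so no separate subtractor or negation step is needed.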
8086 put a big emphasis on code density, providing short forms of many common instructions. This makes sense, because code-fetch was one of the most important bottlenecks on 8086, so smaller code usually was faster code.
For example, two forms of relative jmp existed, one with rel8 (short) and one with rel16 (near). (In 32 and 64-bit mode introduced in later CPUs, the E9 opcode is a jmp rel32 instead of rel16, but EB is still jmp rel8 because jumps within a function are often within -128/+127).
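As a sketch of how an assembler could pick between the two 16-bit-mode encodings, the hypothetical function below tries the 2-byte EB rel8 form first and falls back to the 3-byte E9 rel16 form. It assumes both addresses are already known, which sidesteps the multi-pass sizing problem real assemblers have to solve for forward references.

```python
def encode_jmp_16bit(jump_addr: int, target_addr: int) -> bytes:
    """Pick the shortest relative jmp encoding (16-bit mode, illustrative only)."""
    disp8 = target_addr - (jump_addr + 2)        # relative to end of the 2-byte form
    if -128 <= disp8 <= 127:
        return bytes([0xEB, disp8 & 0xFF])       # jmp rel8 (short)
    disp16 = (target_addr - (jump_addr + 3)) & 0xFFFF  # relative to end of 3-byte form
    return bytes([0xE9]) + disp16.to_bytes(2, "little")  # jmp rel16 (near)


print(encode_jmp_16bit(0x100, 0x100).hex(" "))  # eb fe
print(encode_jmp_16bit(0x100, 0x300).hex(" "))  # e9 fd 01
```

Note that the displacement changes with the instruction length, because it is always measured from the address of the next instruction.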
But there's no special short form for call, because it wouldn't be much use most of the time. So why does it still bother with a relative displacement instead of absolute?
Well, x86 does have absolute jumps, but only for indirect jumps or far jumps (to a different code segment). For example, the EA opcode is jmp ptr16:16: "Jump far, absolute, address given in operand".
To do an absolute near jump, simply mov ax, target_label / jmp ax. (Or in MASM syntax, mov ax, OFFSET target_label).
Relative displacements are position-independent
Comments on the question brought this up.
Consider a block of machine code (already assembled), with some jumps inside the block. If you copy that whole block to a different start address (or change the CS base address so the same block is accessible at a different offset within the segment), then only relative jumps will keep working.
For labels + absolute addresses to solve the same problem, the code would have to be re-assembled with a different ORG directive. Obviously that can't happen on the fly when you change CS with a far jmp!
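The effect can be shown with a tiny Python model. The three-byte block (a nop followed by a jmp back to it) and the helper name are made up for this illustration, and the "absolute" variant is purely hypothetical, since x86 has no direct near absolute jump encoding.

```python
def rel8_target(block: bytes, base: int, offset: int) -> int:
    """Target of the EB rel8 at block[offset] when the block is loaded at `base`."""
    assert block[offset] == 0xEB, "expected a jmp rel8 opcode"
    disp = block[offset + 1]
    if disp >= 0x80:                     # undo the 2's complement encoding
        disp -= 0x100
    return (base + offset + 2 + disp) & 0xFFFF


block = bytes([0x90, 0xEB, 0xFD])        # nop ; jmp back to the nop (disp = -3)

for base in (0x0100, 0x0500):
    target = rel8_target(block, base, 1)
    # The same bytes still land on the nop at the start of the block.
    print(hex(base), hex(target), target == base)

# Hypothetical absolute encoding: the target assembled for ORG 0x100 is baked in,
# so after copying the block to 0x0500 it would still jump to 0x0100.
absolute_target = 0x0100
print(absolute_target == 0x0500)         # False
```

The relative form never mentions the load address at all, which is exactly why the copied bytes need no patching.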
The CPU sees infiniteLoop: jmp infiniteLoop only as the machine code eb fe; it doesn't search for the label infiniteLoop and compute that it is -2 bytes away during each execution. That's the work of the assembler, which produces the machine code. So the CPU just does ip = ip + sign-extended(immediate), almost the same amount of work as ip = absolute_address. Even back in 197x addition was a reasonably cheap operation; fetching the new opcodes from memory took longer than that. With modern x86 the addition is almost free, but keeping all the cache machinery up to date makes jmp complex. – Footton
0xFE is always -2, wherever you relocate that piece of code, while an absolute address encoded in the instruction would need patching with each relocation of the code to point to the correct absolute address. Modern executables don't know the address where they will be loaded by the OS, so they have a relocation table: the OS loads the binary from disk into memory, then goes through the relocation table and patches the affected instructions to have correct absolute addresses. A PIC variant of an executable uses only relative addressing, so the OS can just load it at a random address and execute it. – Footton