Quick summary: in x86-64 mode, are far jumps as slow as in x86-32 mode?
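On x86 processors, jumps fall into three types (instruction sizes are for direct jumps):
- short, with a signed 8-bit offset of -128 to +127 bytes from the next instruction (2-byte instruction)
- near, with a +/-32K offset in 16-bit code that wraps around within the current segment (3-byte instruction; 5 bytes with a 32-bit offset in 32-bit code)
- far, which loads a new CS along with the offset and so can jump anywhere (5-byte instruction in 16-bit code, 7 bytes in 32-bit code)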
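To make the distinction concrete, here is a minimal sketch in C of how I understand the choice between the three forms in 16-bit code. The names `jmp_kind` and `pick_jump` are hypothetical and made up for illustration; a real assembler does more work than this.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { JMP_SHORT, JMP_NEAR, JMP_FAR } jmp_kind;

/* Pick the smallest 16-bit-mode jump form that reaches the target.
 * next_ip is the offset of the instruction that follows the jump,
 * assuming the 2-byte short encoding. */
jmp_kind pick_jump(uint16_t cur_cs, uint16_t next_ip,
                   uint16_t tgt_cs, uint16_t tgt_ip)
{
    /* A different segment value means CS must be reloaded: far jump. */
    if (cur_cs != tgt_cs)
        return JMP_FAR;

    /* Displacement modulo 64K, because IP wraps within the segment. */
    uint16_t disp = (uint16_t)(tgt_ip - next_ip);

    /* 0x0000..0x007F is +0..+127, 0xFF80..0xFFFF is -128..-1. */
    if (disp <= 0x007F || disp >= 0xFF80)
        return JMP_SHORT;

    /* Any other offset in the same segment is reachable with rel16. */
    return JMP_NEAR;
}
```

For example, `pick_jump(0x1000, 0x0102, 0x1000, 0x0110)` gives `JMP_SHORT`, while the same call with a target segment of `0x2000` gives `JMP_FAR` regardless of the offsets.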
Short and near jumps take 1-2 clock cycles, while far jumps take 50-80 clock cycles, depending on the processor. From my reading of the documentation, this is because they "go outside CS, the current code segment."
In x86-64 mode, segmentation is effectively disabled for code: the CS base is treated as 0 and the limit isn't checked, so the segment always covers the entire address space. Ergo, there shouldn't be a penalty for going outside a segment.
Thus, the question: Does the number of clock cycles change for a far jump if the processor is in x86-64 mode?
Related bonus question: Most *nix-like operating systems running in 32-bit protected mode explicitly set up flat segments (base 0, limit 4 GiB) and manage the linear -> physical translation entirely through the page tables. Do they get a benefit from this in terms of the time for far jumps and calls (fewer clock cycles), or is the penalty really an internal CPU legacy of how segment registers have been handled since the 8086?
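For reference, this is how I understand the "flat" segments mentioned above, sketched in C as a decoder for an 8-byte GDT code-segment descriptor. The names `seg_info` and `decode_descriptor` are hypothetical, and the two descriptor values in the usage note are typical examples I am assuming, not values taken from any particular kernel.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef struct {
    uint32_t base;      /* segment base address                   */
    uint32_t limit;     /* highest valid offset, after scaling    */
    int      long_mode; /* L bit: 1 for a 64-bit code segment     */
} seg_info;

/* Decode the base, limit and L bit of a segment descriptor. */
seg_info decode_descriptor(uint64_t d)
{
    seg_info s;

    /* Limit: bits 0-15 hold limit[15:0], bits 48-51 hold limit[19:16]. */
    uint32_t raw_limit = (uint32_t)(d & 0xFFFF)
                       | (uint32_t)((d >> 32) & 0x000F0000);

    /* Base: bits 16-39 hold base[23:0], bits 56-63 hold base[31:24]. */
    s.base = (uint32_t)((d >> 16) & 0x00FFFFFF)
           | (uint32_t)((d >> 32) & 0xFF000000);

    /* G bit (bit 55): the limit counts 4 KiB pages instead of bytes. */
    int granular = (int)((d >> 55) & 1);
    s.limit = granular ? ((raw_limit << 12) | 0xFFF) : raw_limit;

    /* L bit (bit 53): marks a 64-bit code segment. */
    s.long_mode = (int)((d >> 53) & 1);

    return s;
}
```

With the assumed flat 32-bit code descriptor `0x00CF9A000000FFFF`, this yields base 0 and limit `0xFFFFFFFF`, i.e. the whole 4 GiB linear address space. With the assumed 64-bit code descriptor `0x00AF9A000000FFFF`, the L bit is set, and as far as I can tell the base and limit fields are then ignored by the processor.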