x86-64 and far calls / jumps

Quick summary: in x86-64 mode, are far jumps as slow as in x86-32 mode?

On the x86 processor, jumps fall into three types:

short, with an IP-relative offset of -128 to +127 bytes (2-byte instruction)
near, with a +/- 32K offset that "rolls around" the current segment (3-byte instruction in 16-bit code)
far, which can jump anywhere by loading a new CS along with a new offset (5-byte instruction in 16-bit code)
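
To make the three encodings concrete, here is a minimal sketch that decodes the direct JMP forms, assuming 16-bit code (opcodes EB, E9 and EA with 16-bit operands). The function name `decode_jump` is made up for illustration, and the sketch ignores prefixes and the indirect forms of JMP.

```python
def decode_jump(code: bytes, ip: int):
    """Decode one direct JMP located at offset `ip` in 16-bit code.

    Returns (kind, length, target). For short and near jumps the target is
    the new IP; for far jumps it is a (segment, offset) pair.
    """
    opcode = code[0]
    if opcode == 0xEB:  # JMP rel8: signed 8-bit displacement
        rel = int.from_bytes(code[1:2], "little", signed=True)
        length = 2
        # The displacement is relative to the end of the instruction.
        return "short", length, (ip + length + rel) & 0xFFFF
    if opcode == 0xE9:  # JMP rel16: signed 16-bit displacement
        rel = int.from_bytes(code[1:3], "little", signed=True)
        length = 3
        # Masking to 16 bits models the "roll around" within the segment.
        return "near", length, (ip + length + rel) & 0xFFFF
    if opcode == 0xEA:  # JMP ptr16:16: offset first, then segment
        offset = int.from_bytes(code[1:3], "little")
        segment = int.from_bytes(code[3:5], "little")
        return "far", 5, (segment, offset)
    raise ValueError(f"not a direct JMP opcode: {opcode:#04x}")


if __name__ == "__main__":
    print(decode_jump(bytes([0xEB, 0xFE]), 0x0100))              # jump to self
    print(decode_jump(bytes([0xE9, 0xFF, 0x7F]), 0xFFF0))        # wraps past 64K
    print(decode_jump(bytes([0xEA, 0x00, 0x00, 0xFF, 0xFF]), 0)) # new CS:IP
```

Only the far form carries a segment value, which is why it is the only one that forces the processor to reload CS.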

Short and near jumps take 1-2 clock cycles, while far jumps take 50-80 clock cycles, depending on the processor. From my reading of the documentation, this is because they "go outside CS, the current code segment."

In x86-64 mode, code segments aren't used: the segment is effectively always 0..infinity. Ergo, there shouldn't be a penalty for going outside a segment.

Thus, the question: Does the number of clock cycles change for a far jump if the processor is in x86-64 mode?

Related bonus question: Most *nix-like operating systems running in 32-bit protected mode explicitly set the segment sizes to 0..infinity and manage the linear -> physical translation entirely through the page tables. Do they get a benefit from this in terms of the time for far calls (fewer clock cycles), or is the penalty really an internal CPU legacy of how segment registers have worked since the 8086?

CS is used not only for base and limit, but also for permissions. The CPL is encoded there, as well as other fields such as:

D-bit - 32-bit or 16-bit default segment size
L-bit - selects compatibility or 64-bit mode for the segment (in compatibility mode the base and limit are still significant)
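
As a rough illustration of what the processor has to pick apart on every CS load, here is a sketch that decodes a selector and an 8-byte code-segment descriptor using the standard x86 bit layout. The function names are invented for this example, and the descriptor values in the usage lines are sample inputs chosen to show a 64-bit and a flat 32-bit code segment, not values read from any particular system.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": selector >> 3,      # entry number in the GDT or LDT
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # privilege level (the CPL when held in CS)
    }


def decode_code_descriptor(raw: int) -> dict:
    """Extract the fields of an 8-byte code-segment descriptor."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    return {
        "base": base,
        "limit": limit,
        "dpl": (raw >> 45) & 3,             # descriptor privilege level
        "present": bool((raw >> 47) & 1),
        "l": (raw >> 53) & 1,               # L-bit: 64-bit code segment
        "d": (raw >> 54) & 1,               # D-bit: default operand size
        "g": (raw >> 55) & 1,               # granularity of the limit
    }


def segment_mode(desc: dict, long_mode_active: bool) -> str:
    """Name the execution mode the descriptor selects."""
    if long_mode_active and desc["l"]:
        return "64-bit mode (base and limit ignored)"
    width = "32-bit" if desc["d"] else "16-bit"
    if long_mode_active:
        return f"compatibility mode, {width} default"
    return f"protected mode, {width} default"


if __name__ == "__main__":
    print(decode_selector(0x33))
    for raw in (0x00AF9B000000FFFF, 0x00CF9B000000FFFF):
        desc = decode_code_descriptor(raw)
        print(hex(raw), desc, segment_mode(desc, long_mode_active=True))
```

Even when the base and limit end up ignored, the privilege and mode bits still have to be read and checked.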

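Far jumps and far calls can also go through call gates, and in legacy protected mode through task gates. Even in 64-bit mode, where hardware task switching is not supported, the processor still has to load the descriptor, check its type and validate permissions on every far transfer.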
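
The type check mentioned above can be sketched roughly as follows. This is a simplified model, not the full set of checks the processor performs: `classify_far_target` is a hypothetical helper that looks only at the S bit and the 4-bit type field of the descriptor's access byte, and it ignores privilege checks, busy TSS descriptors and the 16-bit gate types.

```python
def classify_far_target(access_byte: int, long_mode: bool) -> str:
    """Classify the descriptor a far JMP/CALL selector points at.

    `access_byte` is byte 5 of the descriptor (P, DPL, S, type).
    `long_mode` selects the 64-bit interpretation of system types.
    """
    s_bit = (access_byte >> 4) & 1   # 1 = code/data, 0 = system descriptor
    seg_type = access_byte & 0xF
    if s_bit:
        # Bit 3 of the type distinguishes code (1) from data (0).
        return "code segment" if seg_type & 0x8 else "data segment (fault)"
    if seg_type == 0xC:
        return "call gate"
    # Task gates and direct TSS targets only make sense when hardware
    # task switching is available, i.e. outside long mode.
    if not long_mode and seg_type == 0x5:
        return "task gate"
    if not long_mode and seg_type == 0x9:
        return "available TSS"
    return "invalid for far transfer (fault)"


if __name__ == "__main__":
    for access, long_mode in ((0x9B, True), (0x8C, True), (0x85, False), (0x85, True)):
        print(hex(access), long_mode, classify_far_target(access, long_mode))
```

The point of the sketch is that the branch on descriptor type exists in both modes, so the flat segment layout does not remove the work.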

To sum up, a far jump in 64-bit mode is no faster than in 32-bit mode. In fact, when 64-bit mode is enabled, system descriptors such as call gates and TSS descriptors are twice as large (16 bytes instead of 8; ordinary code and data segment descriptors stay at 8 bytes), so a far transfer through a gate has to read more descriptor-table data, which may lengthen the time of the jump.
