x86-64 and far calls / jumps

Quick summary: in x86-64 mode, are far jumps as slow as in x86-32 mode?

On the x86 processor, jumps fall into three types:

  • short, with a PC-relative offset of -128 to +127 bytes (2-byte instruction)
  • near, with a +/- 32k offset that "rolls around" the current segment (3-byte instruction)
  • far, which can jump anywhere (5-byte instruction)
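
To make the instruction sizes in the list above concrete, here is a minimal Python sketch that packs the direct form of each jump by hand. The helper names (`jmp_short`, `jmp_near`, `jmp_far`) are made up for illustration, and the sketch assumes the default operand size of the mode with no size-override prefix. The 3-byte and 5-byte figures match a 16-bit operand size; with a 32-bit operand size the near form is 5 bytes and the direct far form is 7.

```python
import struct

def jmp_short(rel8: int) -> bytes:
    """EB cb: signed 8-bit displacement from the next instruction."""
    return b"\xEB" + struct.pack("<b", rel8)

def jmp_near(rel: int, operand_bits: int = 32) -> bytes:
    """E9 cw / E9 cd: signed 16- or 32-bit displacement."""
    fmt = {16: "<h", 32: "<i"}[operand_bits]
    return b"\xE9" + struct.pack(fmt, rel)

def jmp_far(selector: int, offset: int, operand_bits: int = 32) -> bytes:
    """EA ptr16:16 / EA ptr16:32: offset first, then the 16-bit selector.

    The direct EA form is not encodable in 64-bit mode; far jumps there
    have to use the indirect (memory operand) form.
    """
    fmt = {16: "<HH", 32: "<IH"}[operand_bits]
    return b"\xEA" + struct.pack(fmt, offset, selector)

if __name__ == "__main__":
    for name, code in [
        ("short", jmp_short(-2)),
        ("near (16-bit)", jmp_near(0x100, 16)),
        ("near (32-bit)", jmp_near(0x100, 32)),
        ("far (16-bit)", jmp_far(0x08, 0x1234, 16)),
        ("far (32-bit)", jmp_far(0x08, 0x1234, 32)),
    ]:
        print(f"{name:14} {len(code)} bytes  {code.hex(' ')}")
```

Displacements outside the encodable range raise `struct.error`, which mirrors an assembler having to pick the next larger form.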

Short and near jumps take 1-2 clock cycles, while far jumps take 50-80 clock cycles, depending on processor. From my reading of the documentation, this is because they "go outside CS, the current code segment."

In x86-64 mode, code segments aren't used: the segment is effectively always 0..infinity. Ergo, there shouldn't be a penalty for going outside a segment.

Thus, the question: Does the number of clock cycles change for a far jump if the processor is in x86-64 mode?

Related bonus question: Most *nix-like operating systems running in 32-bit protected mode explicitly set the segment sizes to 0..infinity and manage the linear -> physical translation entirely through the page tables. Do they get a benefit from this in terms of the time for far calls (fewer clock cycles), or is the penalty really an internal CPU legacy of the size the segment registers have had since the 8086?
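
For reference, the flat setup described above usually means a descriptor with base 0 and a 20-bit limit of 0xFFFFF at 4 KiB granularity, so the segment covers 0..4 GiB rather than literally 0..infinity. A minimal sketch of how such an 8-byte descriptor is packed and read back; the helper names are hypothetical, and the access byte 0x9A (present, ring 0, readable code) is just one plausible choice:

```python
def make_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a legacy 8-byte code/data descriptor into one 64-bit integer."""
    d = limit & 0xFFFF                     # limit bits 0..15
    d |= (base & 0xFFFFFF) << 16           # base bits 0..23
    d |= (access & 0xFF) << 40             # P, DPL, S, type
    d |= ((limit >> 16) & 0xF) << 48       # limit bits 16..19
    d |= (flags & 0xF) << 52               # AVL, L, D/B, G
    d |= ((base >> 24) & 0xFF) << 56       # base bits 24..31
    return d

def descriptor_base(desc: int) -> int:
    return ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)

def effective_limit(desc: int) -> int:
    """Highest valid offset, scaling by 4 KiB when the G bit is set."""
    raw = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    granular = (desc >> 55) & 1
    return (raw << 12) | 0xFFF if granular else raw

# Flat 32-bit code segment: base 0, limit 0xFFFFF, G=1, D=1.
FLAT_CODE32 = make_descriptor(base=0, limit=0xFFFFF, access=0x9A, flags=0xC)

if __name__ == "__main__":
    print(hex(FLAT_CODE32))
    print(hex(descriptor_base(FLAT_CODE32)), hex(effective_limit(FLAT_CODE32)))
```

Even with a flat descriptor like this, the CPU still has to load and check it on a far transfer, which is the part the question is really asking about.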

Woodworm answered 2/7, 2010 at 17:49 Comment(2)
Your comment about *nix systems also applies to Windows, AFAIK. I don't think there's any modern operating system that actually uses code segments. - Aquarium
A "near" jump in 32-bit or 64-bit mode has a 32-bit offset, not a 16-bit offset (5-byte instruction). "Far" jumps involving task gates (the only really slow ones) are invalid in 64-bit mode. Far absolute jumps are about the same speed as near jumps (fast if they hit in the BTB and cache, overall dominated by cache effects). - Daybreak

CS is used not only for base and limit, but also for permissions. The CPL is encoded there, as well as other fields such as:

  • D-bit - selects a 32-bit or 16-bit default operand and address size for the segment
  • L-bit - selects compatibility or 64-bit mode for the segment (in compatibility mode the base and limit are still significant)
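
As a rough illustration of the fields listed above, the following sketch pulls the DPL, L and D bits out of a raw 8-byte code-segment descriptor and reports which sub-mode the segment would select while long mode is active. The function names and the three sample descriptor values are assumptions chosen for the example, not taken from any particular OS.

```python
def decode_code_descriptor(desc: int) -> dict:
    """Extract the permission and mode bits from a 64-bit descriptor value."""
    return {
        "dpl": (desc >> 45) & 0x3,          # descriptor privilege level
        "present": bool((desc >> 47) & 1),  # P bit
        "l_bit": bool((desc >> 53) & 1),    # 64-bit code segment
        "d_bit": bool((desc >> 54) & 1),    # default size (32 vs 16 bit)
    }

def submode(desc: int) -> str:
    """Sub-mode selected by this CS descriptor when long mode is active."""
    f = decode_code_descriptor(desc)
    if f["l_bit"]:
        # L=1 with D=1 is a reserved combination.
        return "reserved" if f["d_bit"] else "64-bit mode"
    return "compatibility mode (32-bit)" if f["d_bit"] else "compatibility mode (16-bit)"

if __name__ == "__main__":
    samples = {
        "ring 0, L=1": 0x00AF9A000000FFFF,
        "ring 3, L=1": 0x00AFFA000000FFFF,
        "ring 0, D=1": 0x00CF9A000000FFFF,
    }
    for name, desc in samples.items():
        print(name, decode_code_descriptor(desc)["dpl"], submode(desc))
```

Because these bits can differ between the old and new CS, a far transfer has to re-read and re-check them even when the base and limit are ignored.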

Far jumps can also go through a task gate, and far calls can also go through call gates. All of these have to be handled, regardless of 64-bit mode.
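
Part of the work on any far transfer is decoding the new selector, locating its descriptor, and running the privilege checks. Here is a sketch of just the selector-decoding step, assuming the standard layout (index in bits 3..15, table indicator in bit 2, RPL in bits 0..1). The names `Selector`, `decode_selector` and `descriptor_offset` are hypothetical, and 0x08 and 0x33 are arbitrary example values.

```python
from typing import NamedTuple

class Selector(NamedTuple):
    index: int   # entry number within the descriptor table
    table: str   # "GDT" or "LDT"
    rpl: int     # requested privilege level (for CS, this is the CPL)

def decode_selector(sel: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    return Selector(
        index=(sel >> 3) & 0x1FFF,
        table="LDT" if sel & 0x4 else "GDT",
        rpl=sel & 0x3,
    )

def descriptor_offset(sel: int) -> int:
    """Byte offset of the descriptor inside its table (8-byte slots)."""
    return sel & 0xFFF8

if __name__ == "__main__":
    for sel in (0x08, 0x33):
        print(hex(sel), decode_selector(sel), hex(descriptor_offset(sel)))
```

The descriptor found at that offset then decides whether the target is a plain code segment or a gate, which is where the extra handling comes in.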

To sum up, a far jump in 64-bit mode is no faster than in 32-bit mode. In fact, considering that when 64-bit mode is enabled, segment descriptors are twice as large as when 64-bit mode is disabled, all descriptor-table accesses are doubled, which may lengthen the time of the jump.

Stibine answered 3/7, 2010 at 7:36 Comment(1)
As far as I know, only some types of segment descriptors are extended from 8 bytes to 16 bytes inside the GDT (e.g. the TSS descriptor). - Dereliction
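
A small sketch of the size rule in the comment above, assuming it comes down to the S bit of the access byte: code and data descriptors (S=1) stay at 8 bytes, while system descriptors (S=0, such as TSS, LDT and call-gate entries) take 16 bytes once long mode is active. `descriptor_size` is a made-up helper and the access-byte values are illustrative.

```python
def descriptor_size(access: int, long_mode: bool) -> int:
    """Size in bytes of a descriptor-table entry, given its access byte."""
    is_system = ((access >> 4) & 1) == 0   # S bit clear: system descriptor
    return 16 if (long_mode and is_system) else 8

if __name__ == "__main__":
    for name, access in [("code", 0x9A), ("data", 0x92),
                         ("TSS", 0x89), ("call gate", 0x8C)]:
        print(f"{name:10} legacy={descriptor_size(access, False)} "
              f"long={descriptor_size(access, True)}")
```

Under this assumption, a far jump to an ordinary code segment reads the same 8 bytes in either mode.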
