Why is the default operand size 32 bits in 64-bit mode?
I am reading the Intel manual, vol. 1, chapter 3.6.1, Operand Size and Address Size in 64-Bit Mode. There are three prefixes: REX.W, the operand-size prefix 66, and the address-size prefix 67. It says operands default to 32 bits in size, and that the only way to make them 64 bits is the REX.W instruction prefix (placed after any other prefixes).
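
For concreteness, here is my understanding of the three prefixes (NASM syntax; the byte encodings are my reading of the tables, and the registers are just examples):

    add ax, cx        ; 66 01 c8  (66 prefix: 16-bit operand size)
    add eax, ecx      ; 01 c8     (no prefix: default 32-bit operand size)
    add rax, rcx      ; 48 01 c8  (REX.W prefix: 64-bit operand size)
    mov eax, [ebx]    ; 67 8b 03  (67 prefix: 32-bit address size)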

I do not know why this is so. Why can't I use the full 64 bits, for example for an int operand? Does it have something to do with the sign? Or why is there this restriction? (So does a C unsigned int operation use a REX.W prefix on every instruction? The manual also says a prefix applies only to a particular instruction, not to the whole segment, whose default sizes, both address and operand, come from the segment descriptor.)

Do I understand it correctly?

Isogloss answered 21/1, 2020 at 21:25 Comment(14)
The default size is 32 to avoid breaking existing 32-bit programs, which are forward-compatible and can run on 64-bit processors without any modification.Caswell
So it does not have any meaning, for example, for the sign? And what are the other consequences of this particular design for other CPU structures and addressing, apart from compatibility?Isogloss
It means nothing else. They could of course have changed everything and made the default size 64, but that would have broken compatibility. That's the whole point.Caswell
signedness is an interpretation of data, not a consequence of how it is stored.Stromboli
And you mentioned 32-bit programs can still be executed, but they use 32-bit addressing as well, so wouldn't that mean defaulting to 32-bit ADDRESSING as well (for compatibility)? Or why can 32-bit programs use 64-bit addresses (which are the default in 64-bit mode)?Isogloss
32-bit programs will not use the additional address space available from executing on a 64-bit architecture.Stromboli
so they are zeroed at the beginning address?Isogloss
Not sure what you mean by zeroed at the beginning address, but they are certainly relative to "address zero" in your virtual address space.Stromboli
I mean the additional space going from 32-bit to 64-bit addresses: is the upper half zeroed? All ones? In 32-bit programs.Isogloss
@Isogloss it's somewhat arbitrary and irrelevant what the unavailable addresses are because they are not accessible from operating in 32-bit mode. You will most certainly receive a segmentation fault if you attempt to access them. Also do keep in mind everything is technically in a virtual address space, so no physical data was ever initialized for the additional address space in the first place.Stromboli
No matter whether it's virtual or physical space, the instructions have to somehow handle the size of addresses. So in 32-bit mode, does an instruction access the right-hand part of a 64-bit address (or of a register holding one), or the left-hand part? It may be irrelevant, but since you said a segmentation fault is possible, I am interested in how an instruction in 32-bit mode handles a 64-bit address (or register) when the address is wider. So how does it handle it?Isogloss
@MarcoBonelli: 64-bit mode is a separate mode; 32-bit programs can't decode correctly in that mode. e.g. 0x40 is inc eax in compat mode but a REX prefix in 64-bit mode. See x86-32 / x86-64 polyglot machine-code fragment that detects 64bit mode at run-time? for an example. Also, the default operand-size for push/pop/call is 8 bytes in 64-bit mode. 64-bit kernels run 32-bit binaries in compat mode. That 64-bit mode decoding is mostly similar is a matter of sharing transistors in the decoders, not binary compatibility.Kindig
Whether int is a 32-bit or 64-bit integer is up to the compiler. It's perfectly legal for a compiler to choose that on a 64-bit system, int is a 64-bit integer. Indeed some compilers have this as an option (ILP64). It's not clear whether the question is "Why don't compilers generally use a 64-bit integer for int?" or "Why did Intel design the CPU so 32-bit integers are more convenient?" (These sort of "Why" questions sort of require getting into the mind of the designers, which is speculative in the absence of documents.)Cumbrance
@RaymondChen: Intel had nothing to do with this; they were sailing on the good ship Itanic while AMD was designing AMD64 in ~2000 :P AMD's design decisions seem to have been focused on sharing decoder transistors as much as possible, perhaps in case AMD64 didn't catch on and they were stuck supporting it without people using it. They could have done lots of subtle things that removed annoying CISC quirks of x86 like flags unchanged after zero-count shifts, or for example made setcc a 32-bit operand-size instruction in 64-bit mode. Maybe they thought that could hurt asm source porting?Kindig

TL:DR: you have two separate questions: one about C type sizes, and another about how x86-64 machine code encodes 32 vs. 64-bit operand-size. The encoding choice is fairly arbitrary and could have been made differently. But int is 32-bit because that's what compiler devs chose; it has nothing to do with machine code.


int is 32-bit because that's still a useful size. It uses half the memory bandwidth / cache footprint of int64_t. Most C implementations for 64-bit ISAs use a 32-bit int, including both mainstream ABIs for x86-64 (x86-64 System V and Windows). On Windows, even long is a 32-bit type, presumably for source compatibility with code written for 32-bit that made assumptions about type sizes.
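
As a quick sketch of the footprint difference (NASM syntax; the base/index registers here are arbitrary illustrative choices):

    mov eax, [rdi + rcx*4]   ; load one 4-byte int element (zero-extends into RAX)
    mov rax, [rdi + rcx*8]   ; load one 8-byte int64_t element: twice the bytes per element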

Also, AMD's integer multiplier at the time was somewhat faster for 32-bit than 64-bit, and this was the case until Ryzen. (First-gen AMD64 silicon was AMD's K8 microarchitecture; see https://agner.org/optimize/ for instruction tables.)

The advantages of using 32-bit registers/instructions in x86-64

x86-64 was designed by AMD in ~2000, as AMD64. Intel was committed to Itanium and not involved; all the design decisions for x86-64 were made by AMD architects.

AMD64 is designed with implicit zero-extension when writing a 32-bit register, so 32-bit operand-size can be used efficiently, with none of the partial-register shenanigans you get with 8 and 16-bit operand-size.
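
For example (NASM syntax; the values are purely illustrative):

    mov rax, 0xFFFFFFFFFFFFFFFF  ; RAX = all-ones
    mov eax, 1                   ; 32-bit write zero-extends: RAX = 0x0000000000000001
    mov rax, 0xFFFFFFFFFFFFFFFF  ; RAX = all-ones again
    mov ax, 1                    ; 16-bit write merges: RAX = 0xFFFFFFFFFFFF0001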

TL:DR: There's good reason for CPUs to want to make 32-bit operand-size available somehow, and for C type systems to have an easily accessible 32-bit type. Using int for that is natural.

If you want 64-bit operand-size, use it. (And then describe it to a C compiler as long long or [u]int64_t, if you're writing C declarations for your asm globals or function prototypes). Nothing's stopping you (except for somewhat larger code size from needing REX prefixes where you might not have before).
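
The size cost is typically one REX byte per instruction. For example (a sketch; bytes per the standard encoding, and any r/m32 instruction behaves the same way):

    inc edx    ; ff c2     (2 bytes: default 32-bit operand-size)
    inc rdx    ; 48 ff c2  (3 bytes: one extra REX.W byte for 64-bit)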


All of that is a totally separate question from how x86-64 machine code encodes 32-bit operand-size.

AMD chose to make 32-bit the default and 64-bit operand-size require a REX prefix.

They could have gone the other way and made 64-bit operand-size the default, requiring REX.W=0 to set it to 32, or 0x66 operand-size to set it to 16. That might have led to smaller machine code for code that mostly manipulates things that have to be 64-bit anyway (usually pointers), if it didn't need r8..r15.

A REX prefix is also required to use r8..r15 at all (even as part of an addressing mode), so code that needs lots of registers often finds itself using a REX prefix on most instructions anyway, even when using the default operand-size.
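
A sketch of that cost (NASM syntax; bytes per the standard encoding):

    mov eax, ecx      ; 89 c8     (legacy registers, default size: no REX)
    mov r8d, ecx      ; 41 89 c8  (REX.B needed just to reach r8d, still 32-bit)
    mov r8, rcx       ; 49 89 c8  (REX.W + REX.B: no longer than the r8d version)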

A lot of code does use int for a lot of stuff, so 32-bit operand-size is not rare. And as noted above, it's sometimes faster. So it kind of makes sense to make the fastest instructions the most compact (if you avoid r8d..r15d).

It also maybe lets the decoder hardware be simpler if the same opcode decodes the same way with no prefixes in 32 and 64-bit mode. I think this was AMD's real motivation for this design choice. They certainly could have cleaned up a lot of x86 warts but chose not to, probably also to keep decoding more similar to 32-bit mode.

It might be interesting to see if you'd save overall code size for a version of x86-64 with a default operand-size of 64-bit. e.g. tweak a compiler and compile some existing codebases. You'd want to teach its optimizer to favour the legacy registers RAX..RDI for 64-bit operands instead of 32-bit, though, to try to minimize the number of instructions that need REX prefixes.

(Many instructions like add or imul reg,reg can safely be used at 64-bit operand-size even if you only care about the low 32, although the high garbage will affect the FLAGS result.)
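
A sketch of why that works: carries propagate only from low bits to high, so the low 32 bits of a 64-bit add or multiply match the 32-bit result. Right shifts are the classic counter-example:

    ; assume only the low 32 bits of RAX and RCX are meaningful:
    add  rax, rcx   ; low 32 bits match add eax, ecx; high bits are garbage
    imul rax, rcx   ; low 32 bits match imul eax, ecx
    shr  rax, 5     ; NOT safe: high garbage shifts down into the low 32 bits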


Re: misinformation in comments: compat with 32-bit machine code has nothing to do with this. 64-bit mode is not binary compatible with existing 32-bit machine code; that's why x86-64 introduced a new mode. 64-bit kernels run 32-bit binaries in compat mode, where decoding works exactly like 32-bit protected mode.

https://en.wikipedia.org/wiki/X86-64#OPMODES has a useful table of modes, including long mode (and 64-bit vs. 32 and 16-bit compat modes) vs. legacy mode (if you boot a kernel that's not x86-64 aware).

In 64-bit mode some opcodes are different, and the operand-size defaults to 64-bit for push/pop and other stack-instruction opcodes.

32-bit machine code would decode incorrectly in that mode. e.g. 0x40 is inc eax in compat mode but a REX prefix in 64-bit mode. See x86-32 / x86-64 polyglot machine-code fragment that detects 64bit mode at run-time? for an example.
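
For instance, the same byte pair decodes differently depending on the mode (a sketch; emit the raw bytes to see it):

    db 0x40, 0x90   ; 32-bit / compat mode: decodes as inc eax, then nop
                    ; 64-bit mode: 40 is a REX prefix, so this is one REX-prefixed nop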

Also

That 64-bit mode decoding is mostly similar to 32-bit is a matter of sharing transistors in the decoders, not binary compatibility. Presumably it's easier for the decoders to have only 2 mode-dependent default operand sizes (16 or 32-bit) for opcodes like 03 add r, r/m, not 3, with special-casing only for opcodes like push/pop that warrant it. (Also note that REX.W=0 does not let you encode push r32; the operand-size stays at 64-bit.)
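
A sketch with push (NASM syntax; bytes per the standard encoding):

    push rax          ; 50     (operand-size defaults to 64-bit, no REX needed)
    push r8           ; 41 50  (REX.B only selects r8; the size is still 64-bit)
    ; there is no encoding for "push eax" in 64-bit mode;
    ; a 66 prefix (66 50) gives the 16-bit push ax, not a 32-bit push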

AMD's design decisions seem to have been focused on sharing decoder transistors as much as possible, perhaps in case AMD64 didn't catch on and they were stuck supporting it without people using it.

They could have done lots of subtle things to remove annoying legacy quirks of x86, for example making setcc a 32-bit operand-size instruction in 64-bit mode to avoid needing xor-zeroing first. Or CISC annoyances like flags staying unchanged after zero-count shifts (although AMD CPUs handle that more efficiently than Intel, so maybe they intentionally left that in).
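
e.g. the usual workaround for setcc's 8-bit-only operand-size (a sketch; the register choices are arbitrary):

    xor  eax, eax   ; zero the full register first (writing EAX zero-extends to RAX)
    cmp  edi, esi
    setl al         ; AL = 1 if EDI < ESI, else 0; upper bits are already zero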

Or maybe they thought that subtle tweaks could hurt asm source porting, or in the short term make it harder to get compiler back-ends to support 64-bit code-gen.

Kindig answered 21/1, 2020 at 23:26 Comment(5)
I totally agree about the misinformation in the commentsShuntwound
Aside from that: why, in the code-segment descriptor for IA-32e, is the default for 64-bit mode 32-bit? That doesn't make sense to me. If you look at compatibility mode and legacy mode, they have a 16-bit option and a 32-bit option, which makes sense, but in 64-bit mode there is only 32-bit and no 64-bit default operand size.Shuntwound
@zerocool: That would increase the complexity of the decoders for little benefit. AMD made their choice for what the CPU can do efficiently, instead of ever having to support REX.W=0 overriding the operand size down to 32. Not supporting a larger matrix of default vs. possible operand sizes in various modes probably simplifies things.Kindig
Just go easy on me, as I have no idea what REX.W means; the only thing I know is that every instruction has an operand. Can you give me some articles to read about this REX.W==0 thing, or at least explain what it means? I'm sorry for such stupid questions, but learning on your own is hard.Shuntwound
@zerocool: If you're going to ask technical questions about x86-64 machine code encoding rules / defaults, you should probably understand the basics of the current design: wiki.osdev.org/X86-64_Instruction_Encoding is pretty good with diagrams.Kindig
