The term "word size" or "machine word" usually refers to the size of a register and of a native load/store. The Wikipedia article mentions some of the same things I wrote in this answer.
For a 64-bit system, a word could mean 8 bytes, but yes, it's common for 64-bit RISC machines to use word = 32 bits. Most of them evolved out of 32-bit RISC ISAs, so it's natural to keep the same terminology and call 64 bits a double-word.
(Note that GDB uses its own notion of what a "word" is, separate from the ISA.)
But x86 evolved out of the 16-bit 8086, where word = 16 bits. When x86 was extended to have a 32-bit mode (i386), the simplest choice for everyone was to keep the same names for everything. An x86 dword is still 32 bits, and an x86 word is still 16 bits. Even the original 8086 + 8087 could load and store dword and qword integers, floats, and doubles, and instructions like cwd (sign-extend the word in AX to a dword in DX:AX) existed in 8086 to set up for idiv, so these terms were already in full use before 386 extended the register width to dword.
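To make the cwd / idiv pairing concrete, here is a small C model of the two 16-bit instructions. It's only a sketch: the names cwd16 and idiv16 are hypothetical, registers are modeled as plain uint16_t values, and the hardware divide error (#DE) is replaced by an assert.

```c
#include <assert.h>
#include <stdint.h>

/* Model of CWD: sign-extend the word in AX into the dword DX:AX.
   Only DX changes: all-ones if AX's sign bit is set, else zero. */
static uint16_t cwd16(uint16_t ax) {
    return (ax & 0x8000u) ? 0xFFFFu : 0x0000u;
}

/* Model of 16-bit IDIV: divide the signed dword DX:AX by a signed word.
   Quotient goes to AX, remainder to DX. Real hardware raises #DE for a
   zero divisor or a quotient outside int16_t; this sketch asserts instead. */
static void idiv16(uint16_t *ax, uint16_t *dx, int16_t divisor) {
    /* Rebuild DX:AX as a signed value without relying on
       implementation-defined unsigned-to-signed conversions. */
    int32_t hi = (*dx & 0x8000u) ? (int32_t)*dx - 65536 : (int32_t)*dx;
    int64_t dividend = (int64_t)hi * 65536 + *ax;

    assert(divisor != 0);
    int64_t q = dividend / divisor;  /* C truncates toward zero, as IDIV does */
    int64_t r = dividend % divisor;  /* remainder takes the dividend's sign */
    assert(q >= INT16_MIN && q <= INT16_MAX);

    *ax = (uint16_t)q;  /* well-defined modulo-2^16 conversion */
    *dx = (uint16_t)r;
}
```

With AX = -7 (0xFFF9), cwd16 gives DX = 0xFFFF, and dividing by 2 leaves AX = -3 and DX = -1. Leaving DX = 0 instead would divide the positive value 65529, which is why a dedicated word-to-dword sign extension was needed.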
Also note that renaming everything would have been really confusing, because when the 386 was new, most 386 machines were still used in 16-bit mode to run DOS programs. Even modern x86-64 CPUs fully support running in 16-bit real mode, so it would have been very confusing to have word mean different things in different parts of Intel's manuals.
A byte is always an octet of 8 bits, except in some historical computer architectures, some of which had 9-bit bytes. The C standard still doesn't require CHAR_BIT == 8, so to write fully portable code you can't assume that, nor (before C23, which finally mandates it) 2's complement signed integers.
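Code that does depend on 8-bit bytes can at least state that assumption explicitly. A minimal sketch assuming a C11 compiler; bits_in is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Fail at compile time, rather than misbehave at run time, if this code
   is ever built for a target whose bytes aren't octets. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Width in bits of an object of nbytes bytes, computed from CHAR_BIT
   instead of a hard-coded 8. */
static size_t bits_in(size_t nbytes) {
    return nbytes * CHAR_BIT;
}
```

The exact-width types like uint8_t are optional in C: they exist only if the implementation has a type of exactly that width, so a machine with CHAR_BIT = 9 couldn't provide uint8_t at all.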
So in x86 documentation and asm mnemonics / syntax:
- B = byte = 8 bits (PADDB: add packed 8-bit ints in a vector)
- W = word = 16 bits (PADDW: add packed 16-bit ints in a vector)
- D = long or dword (double-word) = 32 bits (PADDD: add packed 32-bit ints in a vector)
- Q = quad-word = 64 bits (PADDQ: add packed 64-bit ints in a vector)
- DQ = double-quad (also sometimes oct-word) = 128 bits (movdqa: copy aligned 128 bits. PUNPCKLQDQ: interleave the low two 64-bit qwords of the 128-bit src and dest into the DQ dest.)
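The size letter is what changes the meaning of the same 128 bits. Below is a plain-C model of PADDB vs PADDW (deliberately not the real SSE2 intrinsics, so it runs on any host); xmm_t, paddb and paddw are hypothetical names, and lanes are laid out little-endian as on x86.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit (DQ) XMM register modeled as 16 bytes, byte 0 least significant. */
typedef struct { uint8_t b[16]; } xmm_t;

/* PADDB: sixteen independent 8-bit adds; each lane wraps mod 2^8,
   so no carry ever crosses into the neighboring byte. */
static xmm_t paddb(xmm_t x, xmm_t y) {
    xmm_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(x.b[i] + y.b[i]);
    return r;
}

/* PADDW: eight independent 16-bit adds over the same bytes. A carry out
   of a word's low byte reaches its high byte, but never the next word. */
static xmm_t paddw(xmm_t x, xmm_t y) {
    xmm_t r;
    for (int i = 0; i < 16; i += 2) {
        uint16_t xw = (uint16_t)(x.b[i] | (x.b[i + 1] << 8));
        uint16_t yw = (uint16_t)(y.b[i] | (y.b[i + 1] << 8));
        uint16_t sum = (uint16_t)(xw + yw);
        r.b[i]     = (uint8_t)(sum & 0xFF);
        r.b[i + 1] = (uint8_t)(sum >> 8);
    }
    return r;
}
```

Adding 1 to a register whose byte 0 is 0xFF leaves byte 1 = 0 with paddb but sets byte 1 = 1 with paddw. On real hardware the SSE2 intrinsics _mm_add_epi8 and _mm_add_epi16 map to these two instructions.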
AVX vmovdqa ymm0, [rdi] is a 32-byte load, even though the mnemonic still says dq (double-quad). AVX is more like multiple 128-bit lanes than real native 256-bit vectors, so this kind of justifies it.
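One visible consequence of the lane design: most AVX2 shuffles, such as the 256-bit form of vpunpcklqdq, apply the 128-bit operation to each lane separately. A plain-C model, with hypothetical names ymm_t and vpunpcklqdq_256:

```c
#include <assert.h>
#include <stdint.h>

/* A 256-bit YMM register modeled as four qwords; q[0..1] is the low
   128-bit lane, q[2..3] the high lane. */
typedef struct { uint64_t q[4]; } ymm_t;

/* VPUNPCKLQDQ ymm (AVX2): within each 128-bit lane, take the low qword
   of src1, then the low qword of src2. Nothing moves between lanes. */
static ymm_t vpunpcklqdq_256(ymm_t src1, ymm_t src2) {
    ymm_t r;
    for (int lane = 0; lane < 2; lane++) {
        r.q[2 * lane]     = src1.q[2 * lane];
        r.q[2 * lane + 1] = src2.q[2 * lane];
    }
    return r;
}
```

With src1 = {0, 1, 2, 3} and src2 = {10, 11, 12, 13} (qwords, lowest first), the result is {0, 10, 2, 12}, not the {0, 10, 1, 11} that a true 256-bit interleave of the low halves would give.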
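Intel-syntax assemblers let you write an explicit operand size instead of having it inferred from a register operand: mov ax, word ptr [rdi] in MASM or GAS .intel_syntax, or mov ax, word [rdi] in NASM (which has no ptr keyword). It's only actually required when no register operand implies the size, as in NASM's mov word [rdi], 1. AT&T syntax uses suffixes on mnemonics to specify operand size, if you don't want to leave it implicit and inferred from the choice of register: movw (%rdi), %ax.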
The B/W/D suffixes in mnemonics predate vector extensions; the string instructions are one example. STOS does *rdi = al/ax/eax/rax, then rdi += size (or rdi -= size if the direction flag DF is set). It can be written with an operand, like STOS byte ptr [RDI], to tell the assembler which operand-size version to encode. But even in Intel / MASM / NASM syntax, you can also write STOSB / STOSW / STOSD / STOSQ.
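The STOS semantics above can be modeled in C as follows. This sketch assumes DF = 0 (the usual direction); stos and rep_stos are hypothetical names, the buffer stands in for memory at RDI, and bytes are written least-significant first, as x86 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One STOSB/STOSW/STOSD/STOSQ (size = 1, 2, 4, 8): store the low `size`
   bytes of rax at rdi, least-significant byte first, then advance rdi. */
static unsigned char *stos(unsigned char *rdi, uint64_t rax, size_t size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (size_t i = 0; i < size; i++)
        rdi[i] = (unsigned char)(rax >> (8 * i));
    return rdi + size;  /* the updated RDI */
}

/* REP STOS: repeat the store RCX times and return the final RDI. */
static unsigned char *rep_stos(unsigned char *rdi, uint64_t rax,
                               size_t size, uint64_t rcx) {
    while (rcx-- > 0)
        rdi = stos(rdi, rax, size);
    return rdi;
}
```

For example, rep_stos(buf, 0x1234, 2, 3) behaves like rep stosw with AX = 0x1234 and RCX = 3: it writes 34 12 34 12 34 12 and leaves RDI 6 bytes further on.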
x86 is very much not a word-oriented architecture.
The whole concept of a "machine word" doesn't fit x86 well. 32-bit-only P5 Pentium CPUs guarantee atomic loads/stores of aligned data up to 64 bits (e.g. with x87 or MMX), even though the integer register width is only 32 bits. (A 64-bit CAS requires lock cmpxchg8b in 32-bit mode.)
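In C11 terms, a 64-bit CAS is just atomic_compare_exchange on an _Atomic uint64_t. When targeting 32-bit x86 (i586 or later), compilers typically implement it with lock cmpxchg8b; on x86-64 a plain lock cmpxchg with 64-bit operand size does the job. A minimal sketch, where add64 is a hypothetical helper:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add delta to *p with a compare-and-swap retry loop.
   Returns the new value. */
static uint64_t add64(_Atomic uint64_t *p, uint64_t delta) {
    uint64_t old = atomic_load(p);
    /* On failure, atomic_compare_exchange_weak stores the current value
       of *p into old, so the loop simply retries with fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta))
        ;
    return old + delta;
}
```

For a plain add, atomic_fetch_add would do; the loop is only there to show the CAS itself. Adding 1 to 0xFFFFFFFF must produce 0x100000000 in one atomic step, carrying across the two 32-bit halves that separate 32-bit integer registers would hold.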
With x86-64, support for SSE2 is guaranteed, so we have 16-byte vector registers, and efficient support for basically every integer instruction with 8, 16, 32, or 64-bit operand size. (32-bit operand size is the default in x86-64 machine code and needs no extra prefix, while 64-bit needs a REX.W prefix and 16-bit a 66h operand-size prefix. So 32-bit is the most efficient for code size, and on some CPUs also for performance, e.g. for div or imul.)
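Part of what makes 32-bit a natural default: in x86-64, writing a 32-bit register zero-extends the result into the full 64-bit register, while 8- and 16-bit writes merge into the old contents (which can cost a dependency on the old value). A small C model of that rule; write_reg is a hypothetical name, and high-byte registers like AH are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a 64-bit register (e.g. RAX) after an instruction writes
   `value` to it with the given operand size in bits. */
static uint64_t write_reg(uint64_t old, uint64_t value, unsigned bits) {
    switch (bits) {
    case 64: return value;                                            /* RAX */
    case 32: return (uint32_t)value;               /* EAX: zero-extends */
    case 16: return (old & ~UINT64_C(0xFFFF)) | (value & 0xFFFF); /* AX: merges */
    case 8:  return (old & ~UINT64_C(0xFF)) | (value & 0xFF);     /* AL: merges */
    default:
        assert(!"operand size must be 8, 16, 32 or 64");
        return old;
    }
}
```

Starting from an all-ones register, a 32-bit write of 0x1234 leaves exactly 0x1234, whereas a 16-bit write leaves 0xFFFFFFFFFFFF1234.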
Also, unaligned loads and stores are fully efficient as long as they don't cross a cache-line boundary; committing an unaligned or byte store to L1d cache doesn't even take an extra cache RMW cycle. And the instruction format is a byte stream, not aligned words.
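In portable C, an unaligned load is usually written with byte accesses (or memcpy), which optimizing compilers for x86 typically combine into a single plain load. The sketch below pairs such a load with a check for the case called out above, an access that crosses a cache-line boundary. load_le32 and crosses_cache_line are hypothetical names, and the 64-byte line size is an assumption (typical of current x86 CPUs, not guaranteed by the ISA).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 64-byte cache lines, as on current mainstream x86 CPUs. */
#define CACHE_LINE 64

/* Little-endian 32-bit load from any address, aligned or not. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* True if a `size`-byte access starting at address `addr` spans two
   cache lines, which is where unaligned accesses stop being free. */
static int crosses_cache_line(uintptr_t addr, size_t size) {
    return addr % CACHE_LINE + size > CACHE_LINE;
}
```

A 4-byte access at offset 60 within a line stays inside it, while one at offset 61 spills into the next line.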
So it's not very meaningful to say that modern x86-64 has any specific "word size". The concept doesn't fit x86-64 as an ISA, and certainly not actual modern microarchitectures with their efficient unaligned loads/stores.