I learned about one-address, two-address, and three-address instructions, but now I'd like to know: what kind of address instructions does x86 use?
x86 is a CISC register machine, where at most 1 operand of any instruction can be an explicit memory address instead of a register, using an addressing mode like [rdi + rax*4]. (There are instructions which can have 2 memory operands with one or both being implicit, though: What x86 instructions take two (or more) memory operands?)
Typical x86 integer instructions have 2 operands, both explicit, like add eax, edx, which does eax += edx.
And there are some truly 1-operand ALU instructions (no implicit other operand) like inc/dec, neg, and not, which are shortcuts for add/sub of an implicit 1, sub from 0, or XOR with -1 (some with different FLAGS semantics). And there's bswap. Also, the shift/rotate instructions with an implicit count of 1 are basically 1-operand, and some assemblers do let you write shr %eax.
Legacy x87 FP code uses 1-operand instructions with the x87 stack, like faddp st1, where the top of the x87 stack (st0) is an implicit operand. And there are some 0-operand instructions like fchs that operate only on st0 implicitly. (SSE2 is baseline for x86-64, so x87 is no longer widely used.)
Modern FP code uses SSE/SSE2 2-operand instructions like addsd xmm0, xmm1, or 3-operand AVX encodings like vaddsd xmm2, xmm0, xmm1.
There are x86 instructions with 0, 1, 2, 3, and even 4 explicit operands.
There are multiple instruction formats, but explicit reg/memory operands are normally encoded in a ModR/M byte that follows the opcode byte(s). (x86-64 instruction encoding on osdev has good details and diagrams.) It has 3 fields:
- 2-bit Mode for the r/m operand (register direct reg, register indirect [reg], [reg+disp8], [reg+disp32]). The modes with a displacement signal that those displacement bytes follow the ModR/M byte (and the SIB byte, if there is one).
- 3-bit r/m field (the register number for register direct or indirect, or an escape code meaning there's a Scale/Index/Base SIB byte after ModR/M, which can encode scaled-index addressing modes for the r/m operand). See rbp not allowed as SIB base? for the details of the special cases / escape codes.
- 3-bit reg field, normally a register number. (But in one-operand or r/m, immediate instructions, it's used as extra opcode bits, e.g. for shifts/rotates it selects which kind.)
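To make the three fields concrete, here is a minimal Python sketch that splits a ModR/M byte and describes the r/m operand. It assumes 64-bit mode with no REX/VEX prefix (so only 8 registers) and 32-bit operand size for register-direct operands; it does not decode the SIB byte itself. The names REG32, REG64, split_modrm, and describe_rm are made up for illustration.

```python
# Sketch: split a ModR/M byte (mod = bits 7-6, reg = bits 5-3, r/m = bits 2-0).
# Assumptions: 64-bit mode, no REX/VEX extension bits, SIB contents not decoded.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_modrm(byte: int) -> tuple[int, int, int]:
    """Return the (mod, reg, rm) fields of one ModR/M byte."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def describe_rm(mod: int, rm: int) -> str:
    """Describe the r/m operand selected by mod and rm."""
    if mod == 0b11:
        return REG32[rm]  # register direct (32-bit operand size assumed)
    disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    if rm == 0b100:
        return f"[SIB{disp}]"  # escape code: a SIB byte follows ModR/M
    if mod == 0b00 and rm == 0b101:
        return "[rip + disp32]"  # special case (absolute [disp32] in 32-bit mode)
    return f"[{REG64[rm]}{disp}]"


# 0xD0 = 11 010 000: register direct, reg = edx, r/m = eax
mod, reg, rm = split_modrm(0xD0)
print(mod, REG32[reg], describe_rm(mod, rm))  # 3 edx eax
```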
Most classic integer ALU instructions (like add, sub, and, or, xor, cmp, and also mov) are available in at least 2 encodings: reg/memory destination or reg/memory source. If the operands you want are both registers, you can use either opcode: add r/m32, r32 or add r32, r/m32. (Some assemblers have syntax to let you select the non-default encoding. In theory, an assembler / compiler could use these choices as a watermark to show which tool produced it.)
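As a sketch of that redundancy, the snippet below encodes add eax, edx both ways (32-bit operand size, no prefixes): opcode 0x01 puts the destination in r/m, opcode 0x03 puts it in reg. The helper names (modrm, add_reg_reg, rm_is_dest) are illustrative only, not an assembler API.

```python
# Sketch: the two valid encodings of "add eax, edx".
# 0x01 = add r/m32, r32   (destination in r/m, source in reg)
# 0x03 = add r32, r/m32   (destination in reg, source in r/m)
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def add_reg_reg(dst: str, src: str, rm_is_dest: bool = True) -> bytes:
    """Encode add dst, src with both operands registers (mod = 11)."""
    if rm_is_dest:
        return bytes([0x01, modrm(0b11, REG32[src], REG32[dst])])
    return bytes([0x03, modrm(0b11, REG32[dst], REG32[src])])


print(add_reg_reg("eax", "edx").hex(" "))         # 01 d0
print(add_reg_reg("eax", "edx", False).hex(" "))  # 03 c2
```

Both byte sequences decode to the same operation; which one a tool emits by default is up to the tool.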
Common instructions also have other opcodes for immediate-source forms, but typically they use the reg field in ModR/M as extra opcode bits, so you still only get 2 operands, like add eax, 123. An exception to this is the immediate form of imul added with the 186, e.g. imul eax, [rdi + rbx*4], 12345. Instead of sharing coding space with other immediate instructions, it has a register destination and an r/m source in ModR/M, plus the immediate operand implied by the opcode.
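A small sketch of the contrast, restricted to the sign-extended imm8 forms with register operands: in 83 /0 ib (add r/m32, imm8) the reg field holds the fixed opcode extension 0, while in 6B /r ib (imul r32, r/m32, imm8) it holds a real destination register. Helper names are hypothetical.

```python
# Sketch: immediate encodings, imm8 forms only, register operands only.
# 83 /0 ib : add r/m32, imm8         -> reg field = opcode extension 0
# 6B /r ib : imul r32, r/m32, imm8   -> reg field = destination register
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    return (mod << 6) | (reg << 3) | rm


def imm8(value: int) -> int:
    """Encode a sign-extended 8-bit immediate."""
    if not -128 <= value <= 127:
        raise ValueError("value needs an imm32 form")
    return value & 0xFF


def add_imm8(dst: str, value: int) -> bytes:
    """add dst, value: only 2 operands, because reg is spent on /0."""
    return bytes([0x83, modrm(0b11, 0, REG32[dst]), imm8(value)])


def imul_imm8(dst: str, src: str, value: int) -> bytes:
    """imul dst, src, value: dst in reg, src in r/m, immediate after ModR/M."""
    return bytes([0x6B, modrm(0b11, REG32[dst], REG32[src]), imm8(value)])


print(add_imm8("eax", 123).hex(" "))          # 83 c0 7b
print(imul_imm8("eax", "ecx", 123).hex(" "))  # 6b c1 7b
```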
Some one-operand instructions use the same trick of using the reg field as extra opcode bits, but without an immediate, e.g. neg r/m32, not r/m32, inc r/m32, or the shl/shr/rotate encodings that shift by an implicit 1 (not by cl or an immediate). Since every legacy shift encoding (by 1, by cl, or by immediate) uses the reg field this way, unfortunately you can't copy-and-shift (until BMI2's shlx/shrx/sarx).
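Here is a minimal decoder sketch for a few of these "group" opcodes, where the ModR/M reg field (the /digit) picks the operation. Only register-direct operands and a handful of well-known /digit slots are covered; undocumented aliases and the other slots are deliberately left out.

```python
# Sketch: the ModR/M reg field as an opcode extension (/digit).
# Partial tables; omitted slots raise KeyError in this sketch.
GROUPS = {
    0xD1: {0: "rol", 1: "ror", 2: "rcl", 3: "rcr", 4: "shl", 5: "shr", 7: "sar"},  # r/m32, 1
    0xF7: {2: "not", 3: "neg", 4: "mul", 5: "imul", 6: "div", 7: "idiv"},           # r/m32
    0xFF: {0: "inc", 1: "dec"},                                                     # r/m32
}
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_group(opcode: int, modrm: int) -> str:
    """Decode a register-direct (mod = 11) group instruction."""
    mod, digit, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise NotImplementedError("memory operands are not handled in this sketch")
    name = GROUPS[opcode][digit]
    count = ", 1" if opcode == 0xD1 else ""  # the implicit shift count
    return f"{name} {REG32[rm]}{count}"


print(decode_group(0xD1, 0xE0))  # shl eax, 1
print(decode_group(0xF7, 0xD8))  # neg eax
```

The only register in each of these is the r/m operand, so the destination is always the source too.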
There are some special-case encodings to improve code density, like single-byte encodings for push rax / push rdx (and the other registers) that pack the register number into the low 3 bits of the opcode byte. And in 16/32-bit mode, there are one-byte encodings for inc/dec of any register. But in 64-bit mode those 0x40-0x4F codes are used as REX prefixes to extend the reg and r/m fields (and the SIB index and base) to provide 16 architectural registers.
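A short sketch of both ideas: push r64 encodes the register as 0x50 + (reg & 7), and registers r8-r15 need a REX prefix whose B bit supplies the fourth register-number bit. The REX layout is 0100WRXB. The function names are illustrative.

```python
# Sketch: register-in-opcode encoding for "push r64", plus REX prefix fields.
# In 16/32-bit mode, byte 0x40 would instead decode as "inc eax".


def push_r64(regnum: int) -> bytes:
    """Encode push for register number 0..15 (rax=0, rcx=1, rdx=2, ..., r15=15)."""
    if not 0 <= regnum <= 15:
        raise ValueError("register number must be 0..15")
    opcode = 0x50 + (regnum & 7)  # low 3 bits of the opcode byte = register
    if regnum >= 8:
        return bytes([0x41, opcode])  # REX with B = 1 (0100 0001)
    return bytes([opcode])


def rex_fields(byte: int) -> dict:
    """Split a REX prefix (0x40-0x4F) into its W, R, X, B bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError("not a REX prefix")
    # W: 64-bit operand size, R: extends ModR/M.reg,
    # X: extends SIB.index, B: extends ModR/M.rm, SIB.base, or opcode register
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1, "X": (byte >> 1) & 1, "B": byte & 1}


print(push_r64(2).hex(" "))   # 52   (push rdx)
print(push_r64(12).hex(" "))  # 41 54   (push r12)
```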
There are also instructions with some or all operands implicit, like movsb, which copies a byte from [rsi] to [rdi] (then increments or decrements both pointers, depending on DF), and can be used with a rep prefix to repeat that rcx times.
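The implicit operands of rep movsb can be modeled in a few lines. This sketch assumes DF = 0 (forward copies) and a flat bytearray as memory, and ignores faults and interrupts; it copies one byte per iteration, which is the architectural semantics even when the source and destination overlap.

```python
# Sketch of "rep movsb" with DF = 0; memory is a flat bytearray (an assumption).


def rep_movsb(mem: bytearray, rsi: int, rdi: int, rcx: int) -> tuple[int, int, int]:
    """Copy rcx bytes from mem[rsi] to mem[rdi]; return the final (rsi, rdi, rcx)."""
    while rcx != 0:
        mem[rdi] = mem[rsi]  # one movsb: byte [rdi] = byte [rsi]
        rsi += 1             # with DF = 1 both pointers would decrement instead
        rdi += 1
        rcx -= 1             # rep counts down in rcx
    return rsi, rdi, rcx


buf = bytearray(b"hello\x00\x00\x00\x00\x00")
print(rep_movsb(buf, 0, 5, 5), buf)  # (5, 10, 0) bytearray(b'hellohello')
```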
Or mul ecx does edx:eax = eax * ecx: one explicit source operand, one implicit source, and 2 implicit destination registers. div/idiv are similar.
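To spell out those implicit operands, here is a sketch of unsigned 32-bit mul and div, with register values passed as plain Python ints assumed to be in 0..2**32-1. The #DE fault cases of div (division by zero, or a quotient too wide for eax) are modeled as Python exceptions.

```python
# Sketch of the implicit operands of 32-bit "mul src" and "div src" (unsigned).
MASK32 = 0xFFFF_FFFF


def mul32(eax: int, src: int) -> tuple[int, int]:
    """mul src: edx:eax = eax * src. Returns (edx, eax)."""
    product = eax * src  # full 64-bit product of two 32-bit values
    return product >> 32, product & MASK32


def div32(edx: int, eax: int, src: int) -> tuple[int, int]:
    """div src: edx:eax / src. Returns (edx = remainder, eax = quotient)."""
    if src == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    quotient, remainder = divmod((edx << 32) | eax, src)
    if quotient > MASK32:
        raise OverflowError("#DE: quotient does not fit in eax")
    return remainder, quotient


edx, eax = mul32(0xFFFF_FFFF, 2)
print(hex(edx), hex(eax))          # 0x1 0xfffffffe
print(div32(edx, eax, 2))          # (0, 4294967295)
```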
Instructions with at least 1 explicit reg/mem operand use a ModR/M encoding for it, but instructions with zero explicit operands (like movsb or cdq) have no ModR/M byte; they just have the opcode. Some instructions have no operands at all, not even implicit ones, like mfence.
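As one example of an opcode-only instruction, cdq is the single byte 0x99: it reads eax and writes edx, both implicitly, sign-extending eax into edx:eax (commonly to set up the dividend for idiv). A minimal sketch of what it computes:

```python
# Sketch of "cdq": opcode byte 0x99, no ModR/M, both operands implicit.
CDQ_OPCODE = bytes([0x99])


def cdq(eax: int) -> int:
    """Return the new edx: all ones if eax's sign bit is set, else 0."""
    return 0xFFFF_FFFF if eax & 0x8000_0000 else 0


print(hex(cdq(5)), hex(cdq(0xFFFF_FFFB)))  # 0x0 0xffffffff
```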
Immediate operands can't be signalled through ModR/M, only by the opcode itself, so push imm32 and push imm8 have their own opcodes. Their destinations are implicit: memory at [rsp], and RSP itself, which is updated with rsp -= 8 before the store.
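A sketch of 64-bit push imm8 (opcode 6A ib): the byte is sign-extended to 64 bits, RSP is decremented by 8, and the value is stored at the new [rsp]. Memory is modeled here as a hypothetical dict from address to 8-byte value.

```python
# Sketch of 64-bit "push imm8" (6A ib); memory modeled as {address: qword}.
MASK64 = (1 << 64) - 1


def push_imm8(mem: dict, rsp: int, imm8: int) -> int:
    """Apply push imm8 and return the new rsp."""
    value = imm8 - 0x100 if imm8 & 0x80 else imm8  # sign-extend the byte
    rsp = (rsp - 8) & MASK64                       # implicit destination 1: RSP
    mem[rsp] = value & MASK64                      # implicit destination 2: [rsp]
    return rsp


stack = {}
rsp = push_imm8(stack, 0x1000, 0xFF)  # push -1
print(hex(rsp), hex(stack[rsp]))      # 0xff8 0xffffffffffffffff
```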
LEA is a workaround that gives x86 3-operand shift-and-add, like lea eax, [rdi + rdi*2 + 123] to do eax = rdi*3 + 123 in one instruction. See Using LEA on values that aren't addresses / pointers? The destination register is encoded in ModR/M's reg field, and the two source registers are encoded in the addressing mode (involving a SIB byte, whose presence is signalled by the ModR/M byte using the r/m encoding that would otherwise mean base = RSP).
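The sketch below shows what lea eax, [rdi + rdi*2 + 123] computes and one way to encode it in 64-bit mode (no prefixes needed: 64-bit address size and 32-bit operand size are the defaults). The helper names lea32 and sib are illustrative.

```python
# Sketch: LEA does the address arithmetic only; it never accesses memory.
MASK32 = 0xFFFF_FFFF
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}  # the only scales SIB can encode


def lea32(base: int, index: int, scale: int, disp: int) -> int:
    """Value a 32-bit-operand-size LEA writes: (base + index*scale + disp) mod 2**32."""
    if scale not in SCALE_BITS:
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK32


def sib(scale: int, index: int, base: int) -> int:
    """Pack a SIB byte: log2(scale) in bits 7-6, index in bits 5-3, base in bits 2-0."""
    return (SCALE_BITS[scale] << 6) | (index << 3) | base


EAX, RDI = 0, 7
# 8D /r is LEA. ModR/M: mod=01 (disp8 follows), reg=eax (destination), r/m=100 (SIB follows).
modrm = (0b01 << 6) | (EAX << 3) | 0b100
encoding = bytes([0x8D, modrm, sib(2, RDI, RDI), 123])  # 123 fits in a signed disp8
print(encoding.hex(" "))                            # 8d 44 7f 7b
print(lea32(base=10, index=10, scale=2, disp=123))  # 153, i.e. rdi*3 + 123 with rdi = 10
```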
VEX prefixes (introduced with AVX) provide 3-operand instructions like bzhi eax, [rsi], edx or vaddps ymm0, ymm1, [rsi]. (For many instructions, the 2nd source is the one that's optionally memory, but for some it's the first source.) The extra register operand that doesn't fit in ModR/M is encoded in the 2- or 3-byte VEX prefix.
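A sketch of where that extra register lives, using the 2-byte VEX form (C5 xx), which is only usable when VEX.X, VEX.B, and VEX.W are at their defaults and the opcode is in the 0F map. The second byte holds inverted R, the 4-bit register field vvvv (stored inverted), the vector-length bit L, and pp (an implied 66/F3/F2 prefix). The vex2 helper is hypothetical.

```python
# Sketch: second byte of a 2-byte VEX prefix.
# bit 7 = inverted REX.R, bits 6-3 = inverted vvvv, bit 2 = L (1 = 256-bit), bits 1-0 = pp
PP = {None: 0b00, 0x66: 0b01, 0xF3: 0b10, 0xF2: 0b11}  # implied legacy prefix


def vex2(r: int, vvvv: int, l256: bool, implied_prefix=None) -> bytes:
    """r = high bit of the ModR/M.reg register, vvvv = the extra source register number."""
    byte1 = ((~r & 1) << 7) | ((~vvvv & 0xF) << 3) | (int(l256) << 2) | PP[implied_prefix]
    return bytes([0xC5, byte1])


# vaddps ymm0, ymm1, [rsi]: vvvv = ymm1, opcode 58, ModR/M 06 (reg=ymm0, r/m=[rsi])
print((vex2(0, 1, True) + bytes([0x58, 0x06])).hex(" "))  # c5 f4 58 06
# vaddsd xmm2, xmm0, xmm1: implied F2, vvvv = xmm0, ModR/M d1 (reg=xmm2, r/m=xmm1)
print((vex2(0, 0, False, 0xF2) + bytes([0x58, 0xD1])).hex(" "))  # c5 fb 58 d1
```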
There are a few 3-operand non-VEX instructions, such as the SSE4.1 variable blends like pblendvb xmm1, xmm2/m128, <XMM0>, where XMM0 is an implicit operand.
The AVX version makes it non-destructive (the first source moves into the VEX prefix, so the destination can be a separate register) and makes the blend-control operand explicit (encoded in the high 4 bits of a 1-byte immediate). This gives us an instruction with 4 explicit operands, VPBLENDVB xmm1, xmm2, xmm3/m128, xmm4.
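A sketch of what VPBLENDVB xmm1, xmm2, xmm3/m128, xmm4 computes per byte, plus the immediate that names the 4th register. Vectors are modeled as Python bytes objects of any equal length (16 for xmm); blendvb and is4_imm are illustrative names.

```python
# Sketch: per-byte variable blend, and the imm8[7:4] register selector.


def blendvb(src1: bytes, src2: bytes, mask: bytes) -> bytes:
    """For each byte: take src2 (xmm3/m128) if the mask byte's top bit is set, else src1 (xmm2)."""
    return bytes(b if m & 0x80 else a for a, b, m in zip(src1, src2, mask))


def is4_imm(mask_reg: int) -> int:
    """imm8 that names the mask register (0-15) in its high 4 bits."""
    return (mask_reg & 0xF) << 4


print(blendvb(bytes([1, 2, 3, 4]), bytes([9, 9, 9, 9]), bytes([0x80, 0x00, 0xFF, 0x7F])))
# b'\t\x02\t\x04'  (bytes 0 and 2 come from src2)
print(hex(is4_imm(4)))  # 0x40, i.e. xmm4
```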
x86 is pretty wild and has been extended many times, but typical integer code uses mostly 2-operand instructions, with a good amount of LEA thrown in to save instructions.
IMUL was actually introduced with the 186, not 286 as you wrote. Also, you first list that the reg field of the ModR/M byte is "3-bit reg field, always a register number", then eventually you add that it can extend the opcode depending. I'd mention this in the list entry already. –
Halide
0F AF imul r, r/m new in 386, not 186? Your ulukai.org/ecm/insref.htm and current nasm.us/doc/nasmdocb.html both say that. bitsavers.trailing-edge.com/components/intel/80186/… only mentions immediate imul, not the 2-operand form (strangely as a single-operand immediate, unlike another 186 manual). When you said 2-operand, were you counting the imul eax, 123 form where assemblers let you omit mentioning the first source if it's the same as the destination? –
Vociferous
sub. (And 2-operand version of not / neg.) intel.com/content/www/us/en/developer/articles/technical/… –
Vociferous