What kind of address instruction does the x86 cpu have?

I learned about one address, two address, and three address instruction, but now I'd like to know, what kind of address instruction does x86 use?

Satem answered 15/11, 2018 at 17:51 Comment(2)
By "address", do you mean "operand"?Sociability
@Sneftel: yes, in abstract ISA-classification terminology, it means operand. like the 5-bit register fields in a MIPS instruction word are "addresses". (I don't know if geeksforgeeks.org/… is any good, but that's the terminology they use)Vociferous

x86 is a CISC register machine, where at most 1 operand of any instruction can be an explicit memory address instead of a register, using an addressing mode like [rdi + rax*4]. (There are instructions that can have 2 memory operands with one or both being implicit, though: What x86 instructions take two (or more) memory operands?)

Typical x86 integer instructions have 2 operands, both explicit, like add eax, edx which does eax+=edx.

There are also some truly 1-operand ALU instructions (no implicit other operand) like inc/dec, neg, and not, which are shortcuts for add/sub of an implicit 1, sub from 0, and XOR with -1 respectively (some with different FLAGS semantics). And there's bswap. The shift/rotate instructions with an implicit count of 1 are also basically 1-operand, and some assemblers do let you write shr %eax.

Legacy x87 FP code uses 1-operand instructions with the x87 stack, like faddp st1 where the top of the x87 stack (st0) is an implicit operand. And some 0-operand instructions like fchs that operate only on st0 implicitly. (SSE2 is baseline for x86-64, so x87 is no longer widely used.)

Modern FP code uses SSE/SSE2 2-operand instructions like addsd xmm0, xmm1 or 3-operand AVX encodings like vaddsd xmm2, xmm0, xmm1.

There are x86 instructions with 0, 1, 2, 3, and even 4 explicit operands.

There are multiple instruction formats, but explicit reg/memory operands are normally encoded in a ModR/M byte that follows the opcode byte(s). (x86-64 instruction encoding on osdev has good details and diagrams). It has 3 fields:

  • 2-bit Mode for the r/m operand (register direct reg, register indirect [reg], [reg+disp8], [reg+disp32]). The modes with displacement bits signal that those bytes follow the ModR/M byte.
  • 3-bit r/m field (the register number for register direct or indirect, or can be an escape code that means there's a Scale/Index/Base SIB byte after ModRM which can encode scaled-index addressing modes for the r/m operand). See rbp not allowed as SIB base? for the details of the special cases / escape codes.
  • 3-bit reg field, always a register number when it names an operand. (In one-operand instructions and in r/m, immediate forms, it is used as extra opcode bits instead, e.g. for shifts/rotates it selects which kind.)
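To make the three fields concrete, here is a minimal Python sketch that splits a ModR/M byte and describes the r/m operand. The function names (decode_modrm, describe_rm) and the register-name table are hypothetical helpers for illustration only. It assumes 64-bit mode with no REX prefix, so only the low 8 registers appear, and it does not decode the SIB byte itself.

```python
# Register names for numbers 0..7, assuming 64-bit addressing and no REX prefix.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = (byte >> 6) & 0b11   # top 2 bits: mode of the r/m operand
    reg = (byte >> 3) & 0b111  # middle 3 bits: register number or extra opcode bits
    rm = byte & 0b111          # low 3 bits: register number or escape code
    return mod, reg, rm

def describe_rm(mod, rm):
    """Describe the r/m operand selected by the mod and rm fields."""
    if mod == 0b11:
        return "register " + REG64[rm]          # register direct
    if rm == 0b100:
        return "SIB byte follows"               # escape code instead of base = rsp
    if mod == 0b00 and rm == 0b101:
        return "disp32 with no base register"   # escape code instead of [rbp]
    disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    return "[" + REG64[rm] + disp + "]"

print(decode_modrm(0xD0))   # fields of the ModR/M byte in 01 D0 (add eax, edx)
print(describe_rm(1, 5))    # mod=01, rm=101
```

The two escape codes are the special cases covered in the linked rbp/SIB question: rm=100 never means a plain [rsp] base, and mod=00 with rm=101 never means a plain [rbp].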

Most instructions are available in at least 2 encodings, reg/memory destination or reg/memory source. If the operands you want are both registers, you can use either opcode, either the add r/m32, r32 or add r32, r/m32. (Some assemblers have syntax to let you select the non-default encoding. In theory an assembler / compiler could use these choices as a watermark to show which tool produced it.)
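As an illustration of the two equivalent encodings, this Python sketch builds both machine-code forms of add eax, edx by hand. The helper names (modrm_reg_direct, encode_add_rr) are made up for this example. It only covers 32-bit register-to-register add with the low 8 registers.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm_reg_direct(reg, rm):
    """Build a ModR/M byte with mod=11 (both operands are registers)."""
    return 0b11 << 6 | reg << 3 | rm

def encode_add_rr(dst, src, use_load_form=False):
    """Encode add dst, src for two 32-bit registers, in either opcode form."""
    d, s = REG32[dst], REG32[src]
    if use_load_form:
        # 03 /r is add r32, r/m32: destination in reg, source in r/m.
        return bytes([0x03, modrm_reg_direct(d, s)])
    # 01 /r is add r/m32, r32: destination in r/m, source in reg.
    return bytes([0x01, modrm_reg_direct(s, d)])

print(encode_add_rr("eax", "edx").hex(" "))                      # 01 d0
print(encode_add_rr("eax", "edx", use_load_form=True).hex(" "))  # 03 c2
```

Both byte sequences do the same thing when executed. Only the opcode and the roles of the reg and r/m fields differ.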

Common instructions also have other opcodes for immediate source forms, but typically they use the reg field in ModR/M as extra opcode bits, so you still only get 2 operands like add eax, 123. An exception to this is the immediate form of imul added with the 186, e.g. imul eax, [rdi + rbx*4], 12345. Instead of sharing coding space with other immediate instructions, it has a register destination and an r/m source in ModR/M, plus the immediate operand implied by the opcode.
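A small Python sketch of the "reg field as extra opcode bits" idea, using the 83 /digit ib opcode that add, sub, and friends share for a sign-extended 8-bit immediate. The function name encode_group1_imm8 is hypothetical. It assumes a 32-bit register destination from the low 8 registers.

```python
# The /digit that goes in the ModR/M reg field selects the operation for opcode 0x83.
GROUP1_DIGIT = {"add": 0, "or": 1, "adc": 2, "sbb": 3,
                "and": 4, "sub": 5, "xor": 6, "cmp": 7}

def encode_group1_imm8(op, regnum, imm8):
    """Encode op r32, imm8 (83 /digit ib) with a register-direct destination."""
    if not 0 <= regnum <= 7:
        raise ValueError("only registers 0..7 are reachable without a REX prefix")
    if not -128 <= imm8 <= 127:
        raise ValueError("immediate must fit in a signed byte")
    modrm = 0b11 << 6 | GROUP1_DIGIT[op] << 3 | regnum
    return bytes([0x83, modrm, imm8 & 0xFF])

print(encode_group1_imm8("add", 0, 123).hex(" "))  # add eax, 123 -> 83 c0 7b
print(encode_group1_imm8("sub", 1, 1).hex(" "))    # sub ecx, 1   -> 83 e9 01
```

Because the reg field is spent on choosing the operation, there is no room left in ModR/M for a second register, which is why these forms stay 2-operand.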

Some one-operand instructions use the same trick of using the reg field as extra opcode bits, but without an immediate, e.g. neg r/m32, not r/m32, inc r/m32, or the shl/shr/rotate encodings that shift by an implicit 1 (not by cl or an immediate). So unfortunately you can't copy-and-shift in one instruction until BMI2 (shlx/shrx/sarx and rorx).

There are some special-case encodings to improve code density, like single-byte encodings for push rax/push rdx that pack the reg field into the low 3 bits of the opcode byte. And in 16/32-bit mode, one-byte encodings for inc/dec any register. But in 64-bit mode those 0x4? codes are used as REX prefixes to extend the reg and r/m fields to provide 16 architectural registers.
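The short-form push encoding and the layout of a REX prefix can be sketched in a few lines of Python. The names encode_push_r64 and rex_prefix are hypothetical helpers, and the sketch assumes 64-bit mode.

```python
def rex_prefix(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte: 0100WRXB."""
    return 0x40 | w << 3 | r << 2 | x << 1 | b

def encode_push_r64(regnum):
    """Encode push r64 using the one-byte 0x50+reg form (REX.B for r8..r15)."""
    if not 0 <= regnum <= 15:
        raise ValueError("x86-64 has 16 general-purpose registers")
    opcode = 0x50 | (regnum & 0b111)   # low 3 bits of the register number
    if regnum >= 8:
        # The 4th bit of the register number travels in REX.B.
        return bytes([rex_prefix(b=1), opcode])
    return bytes([opcode])

print(encode_push_r64(0).hex(" "))   # push rax -> 50
print(encode_push_r64(2).hex(" "))   # push rdx -> 52
print(encode_push_r64(8).hex(" "))   # push r8  -> 41 50
print(hex(rex_prefix(w=1)))          # REX.W    -> 0x48
```

The last line shows the reuse described above: 0x48 is the one-byte dec eax in 16/32-bit mode, but in 64-bit mode the same byte is the REX.W prefix.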


There are also instructions with some or all implicit operands, like movsb which copies a byte from [rsi] to [rdi], and can be used with a rep prefix to repeat that rcx times.

Or mul ecx does edx:eax = eax * ecx. One explicit source operand, one implicit source, and 2 implicit destination registers. div/idiv are similar.
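The implicit operands of one-operand mul can be modelled as a plain function. This is only a sketch of the arithmetic for the 32-bit form, with a made-up name (mul32). It ignores the FLAGS result.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, src):
    """Model of mul r/m32: returns the new (edx, eax) pair.

    eax is the implicit source, src is the one explicit operand,
    and edx:eax receive the full 64-bit unsigned product.
    """
    product = (eax & MASK32) * (src & MASK32)
    return product >> 32, product & MASK32

edx, eax = mul32(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))   # high half in edx, low half in eax
```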

Instructions with at least 1 explicit reg/mem operand use a ModR/M encoding for it, but instructions with zero explicit operands (like movsb or cdq) have no ModR/M byte. They just have the opcode. Some instructions have no operands at all, not even implicit, like mfence.

Immediate operands can't be signalled through ModR/M, only by the opcode itself, so push imm32 and push imm8 have their own opcodes. Their destinations are implicit: memory at [rsp], and RSP itself, which is updated with rsp -= 8.


LEA is a workaround that gives x86 3-operand shift-and-add, like lea eax, [rdi + rdi*2 + 123] to do eax = rdi*3 + 123 in one instruction. See Using LEA on values that aren't addresses / pointers? The destination register is encoded in ModR/M's reg field, and the two source registers are encoded in the addressing mode. (Involving a SIB byte, the presence of which is signalled by the ModR/M byte using the encoding that would otherwise mean base = RSP).
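The LEA example can be checked by hand-assembling it. The Python sketch below uses hypothetical helper names (lea_value, encode_lea_sib_disp8) and handles only the base + index*scale + disp8 form with a 32-bit destination and the low 8 registers, in 64-bit mode.

```python
def lea_value(base, index, scale, disp, width=32):
    """Value LEA would produce, truncated to the destination width."""
    return (base + index * scale + disp) & ((1 << width) - 1)

def encode_lea_sib_disp8(dst, base, index, scale, disp8):
    """Encode lea r32, [base + index*scale + disp8] from register numbers 0..7."""
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    if index == 0b100:
        raise ValueError("index=100 means 'no index', so rsp can't be an index")
    if not -128 <= disp8 <= 127:
        raise ValueError("displacement must fit in a signed byte")
    modrm = 0b01 << 6 | dst << 3 | 0b100        # mod=01 (disp8), rm=100 (SIB follows)
    sib = scale_bits << 6 | index << 3 | base   # scale, index, base fields
    return bytes([0x8D, modrm, sib, disp8 & 0xFF])

RDI, EAX = 7, 0
print(encode_lea_sib_disp8(EAX, RDI, RDI, 2, 123).hex(" "))  # 8d 44 7f 7b
print(lea_value(10, 10, 2, 123))                             # rdi=10 -> 153
```

The destination sits in the ModR/M reg field, while both source registers live in the SIB byte, which matches the description above.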


VEX prefixes (introduced with AVX) provide 3-operand instructions like bzhi eax, [rsi], edx or vaddps ymm0, ymm1, [rsi]. (For many instructions, the 2nd source is the one that's optionally memory, but for some it's the first source.)

The 3rd operand is encoded in the 2 or 3-byte VEX prefix.


There are a few 3-operand non-VEX instructions, such as SSE4.1 variable blends like pblendvb xmm1, xmm2/m128, <XMM0> where XMM0 is an implicit operand using that register.

The AVX version makes it non-destructive (with a separate destination encoded in the VEX prefix), and makes the blend-control operand explicit (encoded in the high 4 bits of a 1-byte immediate). This gives us an instruction with 4 explicit operands, VPBLENDVB xmm1, xmm2, xmm3/m128, xmm4.
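A tiny Python sketch of how the 4th register operand rides in the immediate byte. The names is4_immediate and is4_register are hypothetical, and it assumes 64-bit mode, where all 4 high bits select one of xmm0..xmm15.

```python
def is4_immediate(xmm_number):
    """Immediate byte that names a register in its high 4 bits."""
    if not 0 <= xmm_number <= 15:
        raise ValueError("expected xmm0..xmm15")
    return xmm_number << 4

def is4_register(imm8):
    """Recover the register number from the high 4 bits of the immediate."""
    return (imm8 >> 4) & 0xF

# The blend-control operand xmm4 in VPBLENDVB xmm1, xmm2, xmm3/m128, xmm4:
print(hex(is4_immediate(4)))   # 0x40
print(is4_register(0x40))      # 4
```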


x86 is pretty wild and has been extended many times, but typical integer code uses mostly 2-operand instructions, with a good amount of LEA thrown in to save instructions.

Vociferous answered 15/11, 2018 at 18:40 Comment(6)
The two and three operand IMUL was actually introduced with the 186, not 286 as you wrote. Also, you first list that the reg field of the ModR/M byte is "3-bit reg field, always a register number", then eventually you add that it can extend the opcode depending. I'd mention this in the list entry already.Halide
@ecm: Good suggestion about /r, thanks. And yeah, I told you last time this came up that I probably had several answers that included the old NASM appendix's wrong info about when imul was new. I'll see if I can search up other cases.Vociferous
@ecm: Wasn't 2-operand 0F AF imul r, r/m new in 386, not 186? Your ulukai.org/ecm/insref.htm and current nasm.us/doc/nasmdocb.html both say that. bitsavers.trailing-edge.com/components/intel/80186/… only mentions immediate imul, not the 2-operand form (strangely as a single-operand immediate, unlike another 186 manual). When you said 2-operand, were you counting the imul eax, 123 form where assemblers let you omit mentioning the first source if it's the same as the destination?Vociferous
Yes, I was referring to the short form of the three-operand instruction with the destination and one source the same register, and the last operand being an immediate. However, I should have been clearer in that; I did actually miss the two-operand form without an immediate operand. You're right that that one is a 386+ instruction.Halide
@ecm: Ok, good. As you know, in machine code imul-immediate always has 3 operands, it's just a source-level shorthand. Since there is a 2-operand form which is a different instruction, IMO it's best to just talk about the number of real machine-code operands, regardless of how you write it in the source, at least when talking about which forms exist and were introduced when. Because that's a machine-code issue. And BTW, I finally got around to searching and editing my answers that mention "imul" and "286". About a dozen of them so far.Vociferous
Update: APX will provide EVEX 3-operand encodings of classic integer instructions like sub. (And 2-operand version of not / neg). intel.com/content/www/us/en/developer/articles/technical/…Vociferous
