How to interpret x86 opcode map?

3

7

In looking at an x86 opcode map such as this:

http://www.mlsite.net/8086/#tbl_map1

It defines mappings, for example:

00: ADD Eb,Gb
01: ADD Ev,Gv
...

That link has basic descriptions of what the letters mean, such as:

  • E: A ModR/M byte follows the opcode and specifies the operand. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a displacement.
  • b: Byte argument.

But it's a bit too vague. How do you actually translate that into "complete opcode" (the whole instruction + args in opcode)? Haven't been able to figure it out from the Intel manuals yet either, maybe I'm looking in the wrong place (and it's a bit overwhelming)? Seeing a snippet showing the output opcode for an input instruction (and how you did that) would be super helpful.

Aaren answered 22/2, 2015 at 23:46 Comment(11)
Did you look at the "Disassembly by hand" link on that page?Loose
By all means, use the intel manuals. For each instruction it gives the machine code and chapter 2 has a very detailed description on the instruction format.Atavism
Not sure what you're asking; the intel reference manual does exactly that. The link you posted seems to over-complicate things by introducing terminology that nobody uses.Mirtamirth
The opcode map you're looking at is for the Intel 8086 processor. It is not accurate for modern x86 processors.Expansion
@duskwuff Are you sure? I thought x86 was totally retrocompatible,Therapsid
@Therapsid A map of opcodes for the 8086 will necessarily not include a large number of x86 instructions that have been added or changed since the 8086 was released in 1978.Expansion
@duskwuff I see, I misunderstood you. I think a better word would be incomplete.Therapsid
".. how you did that" -- I just checked the source of a home-grown disassembler. The piece of C code that disassembles the extra Mod/RM / SIB byte(s) runs to 467 lines. I'm not sure I could condense that into a short, relevant snippet (and without alluding to the several thousand lines of supporting code).Warthog
There is a significant difference between "plain 8086" and the more modern "real mode", which uses 32-bit registers by default. (I've heard rumors of a 64-bit mode as well. The mind boggles -- what will they come up next?)Warthog
@Warthog 'Rumours'? They've been shipping 64-bit processors for years,Schliemann
Also related: How to read the Intel Opcode notation and x86_64 Opcode encoding formats in the intel manualBettyannbettye
A
13

By all means, use the Intel manuals. For each instruction they give the machine code, and chapter 2 has a very detailed description of the instruction format.

But to give you a walkthrough, let's encode ADD EDX, [EBX+ECX*4+15h]. First we read through chapters 2 INSTRUCTION FORMAT and 3.1 INTERPRETING THE INSTRUCTION REFERENCE PAGES to get an idea of what we will see. We are especially interested in the abbreviations listed at 3.1.1.3 Instruction Column in the Opcode Summary Table.

Armed with that information, we turn to the page describing the ADD instruction and try to identify an appropriate version for the one we want to encode. Our first operand is a 32 bit register and the second is a 32 bit memory location, so let's see what matches that. It's going to be the penultimate line: 03 /r ADD r32, r/m32. We go back to chapter 3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX prefix) to see what that magical /r is: Indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand.

Okay, so Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format showed us how the instruction will look. So far we know that we won't have any prefixes and the opcode will be 03 and we will use at least a modr/m byte. So let's go see how to figure that out. Look at Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte. The columns represent the register operand, the rows the memory operand. Since our register is EDX we use the 3rd column.

The memory operand is [EBX+ECX*4+15h], which can be encoded using an 8-bit or a 32-bit displacement. To get shorter code we will use the 8-bit version, so the line [--][--]+disp8 applies. This means our ModR/M byte is going to be 54 (binary 01 010 100: mod=01 for disp8, reg=010 for EDX, r/m=100 for "SIB byte follows").

We will need a SIB byte too. Those are listed in Table 2-3. 32-Bit Addressing Forms with the SIB Byte. Since our base is EBX we use column 4, and the row for [ECX*4], which gives us our SIB byte of 8B (binary 10 001 011: scale=10 for *4, index=001 for ECX, base=011 for EBX).

Finally we add our 8 bit displacement byte, which is 15. The complete instruction is thus 03 54 8B 15. We can verify this with an assembler:

2 00000000 03548B15                add edx, [ebx+ecx*4+15h]
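
The same steps can be reproduced with a few shifts instead of the table lookups. This is only a minimal sketch in Python, not an assembler: the function names and the register dictionary are made up for illustration, and it handles just the one form used above (opcode 03 /r with a SIB byte and an 8-bit displacement).

```python
# Register numbers as they appear in the ModR/M and SIB fields (32-bit registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def pack_233(top, mid, low):
    """Pack a 2-bit field and two 3-bit fields into one byte."""
    return (top << 6) | (mid << 3) | low

def encode_add_r32_sib_disp8(dst, base, index, scale, disp):
    """Encode ADD r32, [base + index*scale + disp8] using opcode 03 /r.

    Hypothetical helper for this walkthrough only.
    """
    if not -128 <= disp <= 127:
        raise ValueError("displacement does not fit in 8 bits")
    if index == "esp":
        raise ValueError("ESP cannot be used as an index register")
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    # mod=01 selects the disp8 form, r/m=100 means a SIB byte follows.
    modrm = pack_233(0b01, REG32[dst], 0b100)
    # SIB layout: scale (bits 7-6), index (bits 5-3), base (bits 2-0).
    sib = pack_233(scale_bits, REG32[index], REG32[base])
    return bytes([0x03, modrm, sib, disp & 0xFF])

print(encode_add_r32_sib_disp8("edx", "ebx", "ecx", 4, 0x15).hex(" "))
# prints: 03 54 8b 15
```

The output matches the assembler listing above.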
Atavism answered 23/2, 2015 at 0:11 Comment(0)
A
2

You're looking at an opcode map that translates the first byte of an opcode into the instruction pattern that that byte matches. If you want to know about the rest of the bytes of the instruction, you need to look elsewhere.

If you look at the page for the ADD instruction, it will show you something like:

00 /r        ADD r/m8, r8

This tells you that the 00 byte is followed by a ModR/M byte. The register field of that byte holds r, an 8-bit register that is the second operand of the ADD instruction (the r8 in the instruction pattern), while the first operand (the r/m8) is encoded by the rest of the ModR/M byte.

Now if you go look at the documentation for ModR/M bytes, it will tell you that a ModR/M byte has 3 fields: a 2-bit 'mod' field, a 3-bit 'register/opcode' field and a 3-bit 'r/m' field. It then gives a table of all 256 ModR/M byte values, noting what the fields mean in each case. This table is (generally) organized as 32 rows of 8 columns: the 32 rows are split into 4 groups of 8, with the groups corresponding to the 'mod' field bits and the rows within the groups to the 'r/m' field bits, while the columns correspond to the 'register/opcode' field bits. It's a little weird, as the 'mod' is the top 2 bits and the 'r/m' is the bottom 3 bits with the 'register/opcode' in the middle, but it makes sense because the 'mod' and 'r/m' bits are closely associated and go together to describe one operand, while the 'register/opcode' bits are pretty much completely independent, describing the other operand or being part of the opcode.
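
To make the field layout concrete, here is a small Python sketch that splits a ModR/M byte into its three fields and applies it to the 00 /r pattern. The function names are invented for this example, and only the register-to-register case (mod = 11) is spelled out; the memory forms need the full table.

```python
# 8-bit register names indexed by their 3-bit field value.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def split_modrm(byte):
    """Return (mod, reg, rm): top 2 bits, middle 3 bits, bottom 3 bits."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

def describe_add_rm8_r8(modrm):
    """Describe the operands of 00 /r (ADD r/m8, r8) for a given ModR/M byte."""
    mod, reg, rm = split_modrm(modrm)
    if mod == 0b11:
        # mod=11: the r/m field names a register instead of a memory form.
        return f"add {REG8[rm]}, {REG8[reg]}"
    return f"add <memory form mod={mod:02b} r/m={rm:03b}>, {REG8[reg]}"

print(split_modrm(0xD8))          # (3, 3, 0)
print(describe_add_rm8_r8(0xD8))  # add al, bl
```

So the byte pair 00 D8 reads as ADD AL, BL: the r/m field (000) gives the first operand and the register field (011) gives the second.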

Angara answered 23/2, 2015 at 0:12 Comment(0)
S
0
Instruction Prefix                0 or 1 Byte
Address-Size Prefix               0 or 1 Byte
Operand-Size Prefix               0 or 1 Byte
Segment Prefix                    0 or 1 Byte
Opcode                            1 or 2 Byte
Mod R/M                           0 or 1 Byte
SIB, Scale Index Base (386+)      0 or 1 Byte
Displacement                      0, 1, 2 or 4 Byte (4 only 386+)
Immediate                         0, 1, 2 or 4 Byte (4 only 386+)

Format of Postbyte (called Mod R/M in the Intel documentation)
------------------------------------------
MM RRR MMM

MM  - Memory addressing mode
RRR - Register operand address
MMM - Memory operand address

RRR Register Names
Field  8bit  16bit  32bit
000    AL     AX     EAX
001    CL     CX     ECX
010    DL     DX     EDX
011    BL     BX     EBX
100    AH     SP     ESP
101    CH     BP     EBP
110    DH     SI     ESI
111    BH     DI     EDI

---

16bit memory (No 32 bit memory address prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [BX+SI]   [BX+SI+o8]  [BX+SI+o16]
001   DS       [BX+DI]   [BX+DI+o8]  [BX+DI+o16]
010   SS       [BP+SI]   [BP+SI+o8]  [BP+SI+o16]
011   SS       [BP+DI]   [BP+DI+o8]  [BP+DI+o16]
100   DS       [SI]      [SI+o8]     [SI+o16]
101   DS       [DI]      [DI+o8]     [DI+o16]
110   SS       [o16]     [BP+o8]     [BP+o16]
111   DS       [BX]      [BX+o8]     [BX+o16]
Note: MMM=110,MM=0 Default Sreg is DS !!!!
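
The 16-bit table is regular enough to be written as a short lookup. The following Python sketch is only an illustration: the function names are made up, o8/o16 are kept as placeholders rather than real displacement values, and the register names for MM=11 assume a 16-bit operand size.

```python
# Base expressions indexed by the MMM field (16-bit addressing).
BASES16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def operand16(mm, mmm):
    """Return the operand text for the given MM and MMM fields."""
    if mm == 0b11:
        return REG16[mmm]                 # MMM names a register
    if mm == 0b00:
        # MMM=110 with MM=00 is the special case: a bare 16-bit offset.
        return "[o16]" if mmm == 0b110 else f"[{BASES16[mmm]}]"
    suffix = "o8" if mm == 0b01 else "o16"
    return f"[{BASES16[mmm]}+{suffix}]"

def default_segment16(mm, mmm):
    """Return the default segment register, or None for a register operand."""
    if mm == 0b11:
        return None
    if mm == 0b00 and mmm == 0b110:
        return "DS"                       # [o16] does not involve BP
    return "SS" if "BP" in BASES16[mmm] else "DS"

print(operand16(0b01, 0b101), default_segment16(0b01, 0b101))  # [DI+o8] DS
print(operand16(0b00, 0b110), default_segment16(0b00, 0b110))  # [o16] DS
```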

32bit memory (Has 67h 32 bit memory address prefix)
MMM   Default MM Field
Field Sreg     00        01          10             11=MMM is reg
000   DS       [EAX]     [EAX+o8]    [EAX+o32]
001   DS       [ECX]     [ECX+o8]    [ECX+o32]
010   DS       [EDX]     [EDX+o8]    [EDX+o32]
011   DS       [EBX]     [EBX+o8]    [EBX+o32]
100   SIB      [SIB]     [SIB+o8]    [SIB+o32]
101   SS       [o32]     [EBP+o8]    [EBP+o32]
110   DS       [ESI]     [ESI+o8]    [ESI+o32]
111   DS       [EDI]     [EDI+o8]    [EDI+o32]
Note: MMM=101,MM=0 Default Sreg is DS !!!!

---

SIB is (Scale/Index/Base)
SS III BBB
Note: SIB address calculated as:
<sib address>=<Base>+<Index>*(2^(Scale))

Field  Default Base
BBB    Sreg    Register   Note
000    DS      EAX
001    DS      ECX
010    DS      EDX
011    DS      EBX
100    SS      ESP
101    DS      o32        if MM=00 (Postbyte)
       SS      EBP        if MM<>00 (Postbyte)
110    DS      ESI
111    DS      EDI

Field Index
III   register   Note
000   EAX
001   ECX
010   EDX
011   EBX
100   none       no index register (ESP cannot be an index)
101   EBP
110   ESI
111   EDI

Field Scale coefficient
SS   =2^(SS)
00   1
01   2
10   4
11   8
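
As a rough illustration of the SIB rules, here is a Python sketch that turns a SIB byte into its address expression. It assumes the layout given in the Intel manual (scale in bits 7-6, index in bits 5-3, base in bits 2-0); the function name is invented, and the displacement selected by MM is left out except for the base=101, MM=00 case, where o32 replaces the base register.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_sib(sib, mm):
    """Return the address expression described by a SIB byte.

    mm is the MM field of the preceding postbyte; it only matters here
    for the base=101 special case.
    """
    scale = 1 << ((sib >> 6) & 0b11)   # 2^SS: 1, 2, 4 or 8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    parts = []
    if base == 0b101 and mm == 0b00:
        parts.append("o32")            # no base register, 32-bit offset instead
    else:
        parts.append(REG32[base])
    if index != 0b100:                 # 100 means "no index register"
        parts.append(f"{REG32[index]}*{scale}")
    return "[" + "+".join(parts) + "]"

print(decode_sib(0x8B, 0b01))  # [EBX+ECX*4]
print(decode_sib(0x24, 0b00))  # [ESP]
```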
Slipcase answered 23/2, 2015 at 8:57 Comment(0)
