How does an instruction decoder tell the difference between a prefix and a primary opcode?

Asked 23/8, 2021 at 20:48 Answered 23/8, 2021 at 21:42

Solved assembly x86 cpu-architecture machine-code instruction-encoding

I'm trying to wrap my head around the x86 instruction encoding format. All the sources that I read still make the subject confusing. I'm starting to understand it a little bit, but one thing that I'm having trouble understanding is how the CPU instruction decoder differentiates an opcode prefix from an opcode.

I'm aware that the whole format of the instruction basically depends on the opcode (with extra bit fields defined in the opcode of course). Sometimes the instruction doesn't have a prefix and the opcode is the first byte. How would the decoder know?

I'm assuming that the instruction decoder would be able to tell the difference because opcode bytes and prefix bytes would not share the same binary values. So the decoder can tell if the unique binary number in the byte is an instruction or a prefix. For example (In this example we will stick to single byte opcodes) a REX or LOCK prefix would not share the same byte value as any opcode in the architecture's instruction set.

Dygall answered 23/8, 2021 at 20:48 Comment(10)

I have to admit I'm really hoping there's something more clever than a list of which byte values are prefixes. – Nucleate 23/8, 2021 at 20:51

Traditionally, prefix bytes are different from opcode bytes, so a state machine can just remember which prefixes it's seen until it gets to an opcode byte. x86 machine code is a byte stream that's not self-synchronizing (e.g. a ModRM or an immediate can be any byte). Multi-byte VEX and EVEX prefixes aren't that simple, overlapping with invalid encodings of LES and LDS (outside of 64-bit mode) for example. – Fauve 23/8, 2021 at 21:35

The decoder is a state machine. It knows where the instruction starts, the prefixes' values, and that prefixes are optional and that they come first. One-byte opcodes don't overlap with prefixes; two- and three-byte opcodes start with 0f, which is not a prefix. So the decoder can tell when the opcode starts. Alas, Intel reused the prefixes to change the meaning of opcodes. The decoder takes that into account. E.g. 0f 58 is the opcode for all add{ps,pd,ss,sd}; specifically, f2 0f 58 is addsd while 66 0f 58 is addpd. Curiously, f2 66 0f 58 is o16 addsd and not repne addpd. – Motel 23/8, 2021 at 21:36

The decoder has an undocumented (but easy to reverse) algorithm for picking up these prefixes. This shows that some opcodes listed in the manual are actually prefix+opcode. N.B. It is syntactically valid to add an unneeded prefix to an opcode, but it's a reserved use. – Motel 23/8, 2021 at 21:36

@MargaretBloom Nowadays it's more useful to think of f2, f3, and 66 as giving additional opcode bits as opposed to being real prefixes. For SIMD instructions, there are two such opcode bits (encoding at most one of f2/f3/66) but for scalar instructions, 66 can be combined with f2 and f3. There are even instructions combining a 66 prefix with REX.W. – Tricia 23/8, 2021 at 22:52

@Tricia Yes, they are used for giving extra bits to the opcode but they are still prefixes. Otherwise f2 66 0f 58 would be an undefined instruction and not addsd (whose opcode is officially f2 0f 58). The fact that the 66 byte could be reordered and ignored means that they are still treated as prefixes. I see what you mean though, and I agree it's better to think of them as encoding additional bits. After all, it's only going to get worse :D – Motel 24/8, 2021 at 8:38

@MargaretBloom That seems like implementation defined behavior as 66 0f 58 /r is addpd and with both prefixes present, it could also reasonably be decoded as that. Yes, of course they are prefixes, but they should be seen as providing additional opcode bits (as opposed to modifying the instruction in a systematic manner as e.g. segment prefixes do). – Tricia 24/8, 2021 at 8:57

@Tricia That's true, I agree :) – Motel 24/8, 2021 at 13:10

Can you guys send me a link to a place where I can understand the x86 instruction format sanely without my head exploding? This topic was way more complex than I first thought for a newbie such as myself. I figured that I'd learn the instruction encoding format before I get into assembly, and I haven't done much assembly. I'm at a roadblock and I just need help finding the right direction. Thank you everyone for taking time out of your day for this question. I will still try to comprehend your answers. – Dygall 24/8, 2021 at 13:32

@DanielCatalano Refer to the Intel Software Development Manuals for the full story. I've given an overview before. The x86 instruction encoding is widely regarded as very complicated, though I don't share that opinion. Consider learning a bit of assembly before coming back to the instruction encoding. – Tricia 24/8, 2021 at 13:38

Traditional (single-byte) prefixes are different from opcode bytes like you said, so a state machine can just remember which prefixes it's seen until it gets to an opcode byte.

The 0f escape byte for 2-byte opcodes is not really a prefix. It has to be contiguous with the 2nd opcode byte. Thus, following a 0f, any byte is an opcode, even if it's something like f2 that would otherwise be a prefix. (This also applies following 0f 3a or 0f 38 2-byte escapes for SSSE3 and later, or VEX/EVEX prefixes that encode one of those escape sequences).

If you look at an opcode map, there are no entries that are ambiguous between single-byte prefix and opcode. (e.g. http://ref.x86asm.net/coder64.html, and notice how the 2-byte 0F .. opcodes are listed separately).
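The rules in the paragraphs above can be sketched as a small scanner. This is only an illustration under stated assumptions, not how any real decoder is built: the function name `split_prefixes` and the `LEGACY_PREFIXES` set are hypothetical, 64-bit mode is assumed (so 40..4f are REX prefixes), and VEX / EVEX / XOP are ignored entirely. It also assumes a REX prefix, if present, sits immediately before the opcode.

```python
# Sketch only: single-byte prefixes in 64-bit mode; VEX/EVEX/XOP not handled.
LEGACY_PREFIXES = {
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # segment overrides
    0x66, 0x67,                          # operand-size / address-size override
    0xF0, 0xF2, 0xF3,                    # lock, repne, rep
}

def split_prefixes(code: bytes):
    """Return (prefix_bytes, opcode_bytes) for the instruction starting at code[0]."""
    i = 0
    # Remember prefixes until we reach a byte that is not a prefix.
    while code[i] in LEGACY_PREFIXES:
        i += 1
    # Assumption: 64-bit mode, where 40..4f is a REX prefix right before the opcode.
    if 0x40 <= code[i] <= 0x4F:
        i += 1
    prefixes = code[:i]
    if code[i] == 0x0F:
        # 0f is an escape byte: whatever follows is an opcode byte, even f2 or 66.
        if code[i + 1] in (0x38, 0x3A):
            opcode = code[i:i + 3]   # 0f 38 xx / 0f 3a xx
        else:
            opcode = code[i:i + 2]   # 0f xx
    else:
        opcode = code[i:i + 1]       # one-byte opcode
    return prefixes, opcode
```

For example, `split_prefixes(bytes.fromhex("f20f58c1"))` treats f2 as a prefix, while `split_prefixes(bytes.fromhex("0ff2c1"))` treats the same byte value as the second opcode byte because it follows the 0f escape.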

The decoders do have to know the current mode for this (and other things); for example x86-64 removed the 1-byte inc/dec reg opcodes for use as REX prefixes. (See: x86 32 bit opcodes that differ in x86-x64 or entirely removed.) We can even use this difference to write polyglot machine code that runs differently when decoded in 32-bit vs. 64-bit mode, or even distinguish all 3 mode sizes.
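As a rough illustration of that mode dependence, the same byte in the 40..4f range can be classified differently depending on the mode. The helper name `classify_40_4f` and its `mode_bits` parameter are made up for this sketch; it assumes the default operand size for each mode (no 66 prefix).

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def classify_40_4f(byte: int, mode_bits: int) -> str:
    """Describe what a byte in 40..4f means in 16-, 32- or 64-bit mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 40..4f are handled in this sketch")
    if mode_bits == 64:
        # REX layout is 0100WRXB: the low four bits are flag bits.
        w, r, x, b = (byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1
        return f"rex w={w} r={r} x={x} b={b}"
    # Outside 64-bit mode: 40+r is inc reg, 48+r is dec reg.
    regs = REG32 if mode_bits == 32 else REG16
    op = "inc" if byte < 0x48 else "dec"
    return f"{op} {regs[byte & 7]}"
```

So byte 48 is `dec eax` in 32-bit mode but a REX.W prefix in 64-bit mode, which is the kind of difference polyglot machine code relies on.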

x86 machine code is a byte stream that's not self-synchronizing (e.g. a ModRM or an immediate can be any byte). The CPU always knows where to start decoding from, either a jump target or the byte after the end of a previous instruction. That's the start of the instruction (including prefixes).

Bytes in memory are just bytes, only becoming instructions when they're decoded by the CPU. (Although in normal programs, simply disassembling from the top of the .text section does give you the program's instructions. Self-modifying and obfuscated code are not normal.)

AVX / AVX-512: multi-byte prefixes that overlap with opcodes

Multi-byte VEX and EVEX prefixes aren't that simple in 32-bit mode. For example VEX prefixes overlap with invalid encodings of LES and LDS in modes other than 64-bit. (The c4 and c5 opcodes for LES and LDS are always invalid in 64-bit mode, except as VEX prefixes.) https://wiki.osdev.org/X86-64_Instruction_Encoding#VEX.2FXOP_opcodes

In legacy / compat modes, there weren't any free bytes left that weren't already opcodes or prefixes when AVX (VEX prefixes) and AVX-512 (EVEX prefix) were introduced, so the only room for extensions was in encodings of opcodes that are only valid with a limited set of ModRM bytes. (e.g. LES / LDS require a memory source, not register - this is why some bits are inverted in VEX prefixes, so the top 2 bits of the byte after c4 or c5 will always be 1 in 32-bit mode instead of 0. That's the "mod" field in ModRM, and 11 means register).
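A minimal sketch of that disambiguation, assuming only 32-bit protected mode and 64-bit mode are of interest (real mode, where VEX is not recognized, is deliberately left out). The function name `is_vex_prefix` is hypothetical, and a real decoder checks more than this one field.

```python
def is_vex_prefix(code: bytes, mode_bits: int) -> bool:
    """Guess whether a leading c4/c5 byte starts a VEX prefix (32- or 64-bit mode only)."""
    if code[0] not in (0xC4, 0xC5):
        return False
    if mode_bits == 64:
        # LES / LDS don't exist in 64-bit mode, so c4 / c5 can only be VEX.
        return True
    # In 32-bit mode, LES / LDS need a memory operand, i.e. ModRM.mod != 11.
    # A following byte whose top two bits are 11 can't be a valid LES / LDS,
    # so that encoding space is where VEX lives.
    return (code[1] >> 6) == 0b11
```

With this check, `c5 f8 77` (vzeroupper) is seen as VEX in both modes, while `c4 01` is `les eax, [ecx]` in 32-bit mode.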

(Fun fact: VEX prefixes are not recognized in 16-bit real mode, apparently because some software used the same invalid encodings of LES / LDS as intentional traps, to be sorted out in the #UD exception handler. VEX prefixes are recognized in 16-bit protected mode, though.)

AMD64 freed up several bytes by removing instructions like AAM, as well as LES/LDS (and the one-byte inc/dec reg encodings for use as REX prefixes), but CPU vendors have continued to care about 32-bit mode and not added any extensions that are only available in 64-bit mode which could simply take advantage of those free opcode bytes. This means finding ways to cram new instruction encodings into increasingly small gaps in 32-bit machine code. (Often via mandatory prefixes, e.g. rep bsr = lzcnt on CPUs with that feature, which gives different results.)

So the decoders in modern CPUs that support AVX / BMI1/2 have to look at multiple bytes to decide whether this is a prefix for a valid AVX or other VEX-encoded instruction, or in 32-bit mode if it should decode as LES or LDS. (And I guess look at the rest of the instruction to decide if it should #UD).

But modern CPUs are looking at 16 or 32 bytes at a time anyway to find instruction boundaries in parallel. (And then later feed those groups of instruction bytes to actual decoders, again in parallel.) https://www.realworldtech.com/sandy-bridge/4/

Same goes for the prefix scheme used by AMD XOP, which is a lot like VEX.

Agner Fog's blog article Stop the instruction set war from 2009 (soon after AVX was announced, before the first hardware supporting it) has a table of remaining unused coding space for future extensions, and some notes about it being "assigned" to AMD, Intel, or Via.

Related / examples

How to tell the length of an x86 instruction? (including my answer) has some more details about x86 machine code.
https://codegolf.stackexchange.com/questions/133486/find-an-illegal-string/133622#133622 (on codegolf.SE - the shortest sequence of bytes that will definitely #UD fault if it's not jumped over. It has to be long enough that it can't be consumed by the CPU as the immediate for a mov r64, imm64 for example.)
Why does x/i on gdb give different results then disassemble? - an example of starting decode in the wrong place and decoding the middle of another instruction as something else.

Machine code tricks: decoding the same byte multiple ways

(This is not really related to prefixes, but in general seeing how the rules apply to weird cases can help you understand exactly how things work.)

A software disassembler does need to know a start point. This can be problematic if obfuscated code mixes code and data, and actual execution jumps to places you wouldn't get if you just assume that you can decode in order without following jumps.

Fortunately compiler-generated code doesn't do that so naive static disassembly (e.g. by objdump -d or ndisasm, as opposed to IDA) finds the same instruction boundaries that actually running the program will.

This is not a problem for running obfuscated machine code; the CPU just does what it's told, and never cares about bytes before the place you tell it to jump to. Disassembling without running / single-stepping the program is the hard thing, especially with the possibility of self-modifying code and jumps to what a naive disassembler would think was the middle of an earlier instruction.

Obfuscated machine code can even have an instruction decode one way, then jump back into what was the middle of that instruction, for a later byte to be the opcode (or prefix + opcode). Modern CPUs with uop caches or that mark instruction boundaries in I-cache run slow (but correctly) if you do this, so it's more of a fun code-golf trick (extreme code-size optimization at the expense of speed) or obfuscation technique.

For an example of this, see my codegolf.SE x86 machine code answer to Golf a Custom Fibonacci Sequence. I'll excerpt the disassembly that lines up with what the CPU sees after looping back to cfib.loop, but note that the first iteration decodes differently. So I'm using just 1 byte outside the loop instead of 2 to effectively jump into the middle for the start of the first iteration. See the linked answer for a full description and the other disassembly.

0000000000401070 <cfib>:
  401070:       eb                      .byte 0xeb      # jmp rel8 consuming the 01 add opcode as a rel8
0000000000401071 <cfib.loop>:
  401071:       01 d0                   add    eax,edx
# loop entry point on first iteration, jumping over the ModRM byte (D0) of the ADD
    (entry on first iteration):
  401073:       92                      xchg   edx,eax
  401074:       e2 fb                   loop   401071 <cfib.loop>
  401076:       c3                      ret

You can do this with opcodes that consume more later bytes, like 3D <dword> cmp eax, imm32. When the CPU sees a 3D opcode byte, it will grab the next 4 bytes as the immediate. If you later jump into those 4 bytes, they'll be considered as prefix/opcodes and everything will work (except for performance problems) the same regardless of how those bytes had previously been decoded as a different part of an instruction. The CPU has to maintain the illusion of decoding and executing 1 instruction at a time, other than performance.
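To make the "start point decides everything" idea concrete, here is a toy decoder over the seven bytes of the cfib listing. The table `TOY_TABLE` and the helpers `decode_linear` and `rel8_target` are invented for this sketch and only cover the opcodes that appear in that listing; the 01 entry assumes a register-direct ModRM byte with no SIB or displacement.

```python
CODE = bytes.fromhex("eb01d092e2fbc3")  # cfib bytes; offset 0 corresponds to 0x401070

# opcode byte -> (mnemonic, total instruction length), just for this listing
TOY_TABLE = {
    0xEB: ("jmp rel8", 2),
    0x01: ("add r/m32, r32", 2),  # assumption: ModRM.mod == 11, so exactly 2 bytes
    0x92: ("xchg edx, eax", 1),
    0xE2: ("loop rel8", 2),
    0xC3: ("ret", 1),
}

def decode_linear(code: bytes, start: int):
    """Decode forward from `start`, returning a list of (offset, mnemonic)."""
    out = []
    off = start
    while off < len(code):
        name, length = TOY_TABLE[code[off]]
        out.append((off, name))
        off += length
    return out

def rel8_target(code: bytes, off: int) -> int:
    """Target offset of a 2-byte rel8 branch (jmp / loop) located at `off`."""
    disp = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
    return off + 2 + disp
```

Entering at offset 0, the 01 byte is the rel8 of the jmp, and `rel8_target(CODE, 0)` lands on offset 3 (the xchg). Entering at offset 1, as the loop does, the same 01 byte is the add opcode, and `decode_linear(CODE, 1)` yields add, xchg, loop, ret.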

I learned of this trick from @Ira Baxter's answer on Can assembled ASM code result in more than a single possible way (except for offset values)?

Fauve answered 23/8, 2021 at 21:42 Comment(15)

I'm just going to try to summarize your answer to make sure that I understand. So you're basically saying in the first paragraph that prefix bytes and opcode bytes all have unique values, so that's how the decoder can tell the difference between them (like I said in the question). – Dygall 24/8, 2021 at 13:38

@DanielCatalano That's correct for legacy prefixes. However, VEX and EVEX prefixes do share their encodings with other instructions and must be told apart by looking at the second byte of the instruction. – Tricia 24/8, 2021 at 13:41

This should have been part of the last comment, but the website was under maintenance and crashed on me! The "0F" byte is part of the opcode and just tells the decoder that the next byte is also an opcode byte. For example, the "00" opcode is "ADD" and "0f 00" makes the opcode "SLDT". So the "0f" byte is basically like an extra most-significant binary digit, in that it doubles the number of available opcodes. There will be more comments due to the character limit. – Dygall 24/8, 2021 at 14:15

@DanielCatalano: yes, my first paragraph mentioned the 0f escape byte. I just finished an edit to add some stuff, including a link to an opcode map: ref.x86asm.net/coder64.html – Fauve 24/8, 2021 at 14:20

In the second half of the third paragraph, in simple terms, it says that you must start execution at the start of the instruction (and can't just start anywhere in the instruction, like the SIB byte for example). Compiled code eliminates this issue. Ok – Dygall 24/8, 2021 at 14:37

@Tricia As in legacy prefixes, do you mean prefixes found in 8086 16 bit? Or do you mean prefixes found on the 80386 32 bit? – Dygall 24/8, 2021 at 14:42

@DanielCatalano: It's not that "compiled code eliminates the issue", it's that anywhere you jump to is assumed to be the start of the instruction. The same applies in hand-written code. The same sequence of bytes can decode differently if you jump to different start points within it. e.g. jump to a+0 might produce one sequence of instructions, jump to a+1 might produce a different sequence, using different bytes as opcodes vs. modrm and immediate. The x86 ISA says that decoding starts from a byte and goes forward. e.g. single-core code might jump over a lock prefix... – Fauve 24/8, 2021 at 14:43

@DanielCatalano: "legacy" prefixes are everything up to and including REX prefixes, i.e. single-byte prefixes that don't share a byte with opcodes. VEX, EVEX (and XOP) are the exceptions to this. – Fauve 24/8, 2021 at 14:44

@DanielCatalano Legacy prefixes are the segment prefixes, the REP prefixes (f2 and f3), the address and operand size override prefixes (66 and 67), the LOCK prefix (f0), as well as the REX family of prefixes (40 to 4f). – Tricia 24/8, 2021 at 14:45

@Tricia I have taken a look at the chart link and searched for the prefix "REPNZ/REPNE" with a hexadecimal value of "f2". Apparently this "f2" is an opcode as well ("PSLLD"). This is the case only when the value is "0f f2". "f2" is just the byte for the prefix and "0f f2" is the byte combo for the opcode. This is similar to what I said earlier about how the "0f" acts as a most significant digit to the opcode. This must be how opcode "00" is completely different from "0f 00"! – Dygall 24/8, 2021 at 15:26

@DanielCatalano Yes, 0f is not a prefix. Rather, it introduces a two-byte opcode. Prefixes are only recognised before opcodes, not after them (for obvious reasons). There are also three-byte opcodes beginning with 0f 38 and 0f 3a. Note that with VEX and EVEX prefixes, the 0f, 0f 38, and 0f 3a “significant digits” are rolled into the VEX/EVEX prefix. This is how AVX instructions are usually not longer than their SSE counterparts despite having a mandatory two-byte VEX prefix. – Tricia 24/8, 2021 at 15:33

@DanielCatalano: Yes, good point, f2 is an opcode when preceded by a 0f escape byte or sequence. Updated my answer to cover that case; it's important to point out that difference between prefix and escape byte. – Fauve 24/8, 2021 at 16:10

@DanielCatalano: Updated again with some interesting corner case example of how instruction decoding works. Not really prefixes specifically, but should drive home the point that wherever you jump to is the start of the instruction, and the CPU decodes from there. – Fauve 24/8, 2021 at 17:40

I have other questions regarding x86 encoding which wouldn't fit this particular question such as what EVEX VEX and SSE are. I will just mark this question resolved and post more questions about those other things. Once again I would like to thank everyone for taking the time to help me understand this complex instruction format! – Dygall 24/8, 2021 at 17:44

@DanielCatalano: wiki.osdev.org/X86-64_Instruction_Encoding covers VEX and EVEX. If you don't know what SSE even is in the first place, see stackoverflow.com/tags/sse/info. (And also the SIMD chapter in Agner Fog's optimizing assembly guide, agner.org/optimize) – Fauve 24/8, 2021 at 17:45
