On x86 specifically, the instruction encoding is such that from each byte, the decoder can learn how many more bytes follow.
For example, let me show you how the decoder could decode this instruction stream:
55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3
55
The decoder sees 55 and knows that this is push ebp, a single-byte instruction. So it decodes push ebp and proceeds to the next instruction.
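The decoder knows 55 means ebp because of its low three bits: the one-byte push and pop forms are 50+r and 58+r, where r is the register number (eax = 0 through edi = 7). The same scheme covers pop ebp (5d), which appears later in the stream. Here is a small Python sketch of just those two forms, assuming 32-bit mode (in 64-bit mode the same bytes mean push rbp / pop rbp). The function name decode_push_pop is made up for illustration:

```python
# Register numbering used by x86 encodings (32-bit names); index = 3-bit field.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_push_pop(opcode: int) -> str:
    """Decode the one-byte forms push r32 (0x50+r) and pop r32 (0x58+r)."""
    if 0x50 <= opcode <= 0x57:
        return "push " + REG32[opcode - 0x50]
    if 0x58 <= opcode <= 0x5F:
        return "pop " + REG32[opcode - 0x58]
    raise ValueError(f"not a one-byte push/pop: {opcode:#04x}")


print(decode_push_pop(0x55))  # push ebp
print(decode_push_pop(0x5D))  # pop ebp
```

Because the register lives inside the opcode byte, there is nothing else to fetch, which is why the decoder can move on immediately.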
89
The decoder sees 89, which is mov r/m32, r32. This opcode is followed by a ModR/M byte specifying the operands.
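89 e5
The ModR/M byte is e5 (binary 11 100 101: mod = 11, reg = 100, r/m = 101). Since mod = 11, the r/m field names a register rather than a memory operand, so e5 indicates ebp as the r/m operand and esp as the r operand, and the instruction is mov ebp, esp.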
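Splitting a ModR/M byte into its mod, reg and r/m fields is just shifting and masking. A minimal sketch; split_modrm is a hypothetical helper name, and REG32 lists the standard 32-bit register numbering:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(modrm: int):
    """Return the (mod, reg, rm) fields of a ModR/M byte: 2 + 3 + 3 bits."""
    mod = modrm >> 6             # bits 7..6
    reg = (modrm >> 3) & 0b111   # bits 5..3
    rm = modrm & 0b111           # bits 2..0
    return mod, reg, rm


mod, reg, rm = split_modrm(0xE5)
print(mod, reg, rm)                      # 3 4 5
# mod == 3: r/m names a register too, and opcode 89 puts r/m first.
print(f"mov {REG32[rm]}, {REG32[reg]}")  # mov ebp, esp
```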
8b
This instruction is mov r32, r/m32, which is likewise followed by a ModR/M byte.
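The only difference between 89 and 8b (and likewise between 01 and 03) is bit 1 of the opcode, the direction bit, which says whether the reg field is the source or the destination. A sketch of how a decoder might use it; order_operands is an illustrative name, and the bit only has this meaning in opcode families like these, not in every x86 opcode:

```python
def order_operands(opcode: int, reg_operand: str, rm_operand: str) -> str:
    """Order reg and r/m operands using bit 1 (the "d" bit) of the opcode.

    d = 0 -> reg is the source      (op r/m32, r32), e.g. 0x89, 0x01
    d = 1 -> reg is the destination (op r32, r/m32), e.g. 0x8B, 0x03
    """
    if opcode & 0b10:
        return f"{reg_operand}, {rm_operand}"
    return f"{rm_operand}, {reg_operand}"


print("mov", order_operands(0x89, "esp", "ebp"))          # mov ebp, esp
print("mov", order_operands(0x8B, "eax", "[ebp + 0xc]"))  # mov eax, [ebp + 0xc]
```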
8b 45
This ModR/M byte (binary 01 000 101) has an r operand of eax and an r/m32 operand of [ebp + disp8]: mod = 01 means an 8-bit displacement follows, in the next byte.
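So the ModR/M byte itself tells the decoder how many more bytes to read, which is exactly the property described at the top. A sketch of that rule for 32-bit addressing; it is a simplification that counts a SIB byte but ignores the SIB case where the base field adds its own disp32, and bytes_after_modrm is a hypothetical name:

```python
def bytes_after_modrm(modrm: int) -> int:
    """Count SIB + displacement bytes after a ModR/M byte (32-bit addressing).

    Simplification: a SIB byte is counted, but the SIB case base == 101 with
    mod == 00 (which adds a disp32) is not handled.
    """
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return 0                        # register operand: nothing follows
    sib = 1 if rm == 0b100 else 0       # r/m == 100 means a SIB byte follows
    if mod == 0b01:
        disp = 1                        # disp8
    elif mod == 0b10:
        disp = 4                        # disp32
    else:
        disp = 4 if rm == 0b101 else 0  # mod == 00: only r/m == 101 has disp32
    return sib + disp


print(bytes_after_modrm(0x45))  # 1
print(bytes_after_modrm(0xE5))  # 0
```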
8b 45 0c
The displacement is 0c, so the instruction is mov eax, [ebp + 0x0c].
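One detail this example does not need, because 0c is small: the 8-bit displacement is sign-extended, so a byte such as f8 means -8, which is how negative offsets like [ebp - 0x8] are encoded. A sketch, with disp8_operand as an illustrative name:

```python
def disp8_operand(base: str, disp_byte: int) -> str:
    """Format a [base + disp8] operand; the disp8 byte is sign-extended."""
    disp = int.from_bytes(bytes([disp_byte]), "little", signed=True)
    sign = "+" if disp >= 0 else "-"
    return f"[{base} {sign} {abs(disp):#x}]"


print(disp8_operand("ebp", 0x0C))  # [ebp + 0xc]
print(disp8_operand("ebp", 0xF8))  # [ebp - 0x8]
```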
03
This instruction is add r32, r/m32, again followed by a ModR/M byte.
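Opcode 03 sits in a regular block of the opcode map: for opcodes 00 to 3f whose low three bits are 0 to 5, bits 5..3 select one of eight ALU operations and bits 2..0 select the operand form. A sketch of that pattern, assuming the default 32-bit operand size and no prefixes; classify_alu_opcode is a hypothetical name:

```python
# Bits 5..3 of opcodes 0x00-0x3F select the operation...
ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
# ...and bits 2..0 select the operand form (only values 0-5 follow this pattern).
ALU_FORMS = ["r/m8, r8", "r/m32, r32", "r8, r/m8",
             "r32, r/m32", "al, imm8", "eax, imm32"]


def classify_alu_opcode(opcode: int) -> str:
    """Name a classic ALU opcode such as 0x01 or 0x03 (32-bit operand size)."""
    if opcode > 0x3F or (opcode & 0b111) > 5:
        raise ValueError(f"{opcode:#04x} is not a classic ALU opcode")
    return f"{ALU_OPS[opcode >> 3]} {ALU_FORMS[opcode & 0b111]}"


print(classify_alu_opcode(0x03))  # add r32, r/m32
print(classify_alu_opcode(0x01))  # add r/m32, r32
```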
03 45
Same as before, the r operand is eax while the r/m operand is [ebp + disp8]. The displacement is 08, so the instruction is add eax, [ebp + 0x08].
01
This instruction is add r/m32, r32, followed by a ModR/M byte.
01 05
This ModR/M byte (binary 00 000 101) indicates an r operand of eax and an r/m operand of [disp32]: in 32-bit code, mod = 00 with r/m = 101 does not mean [ebp] but an absolute 32-bit address. The displacement follows in the next four bytes, which are 00 00 00 00, so the instruction is add [0x00000000], eax.
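The four displacement bytes are stored little-endian, which the all-zero example hides. A sketch that reads them with Python's struct module; absolute_operand is an illustrative name, and its offset argument is assumed to point at the first displacement byte:

```python
import struct


def absolute_operand(code: bytes, offset: int) -> str:
    """Format the [disp32] operand of ModR/M mod=00, r/m=101 in 32-bit code.

    `offset` is the index of the first of the four displacement bytes.
    """
    (disp,) = struct.unpack_from("<I", code, offset)  # little-endian uint32
    return f"[{disp:#010x}]"


print(absolute_operand(bytes.fromhex("01 05 00 00 00 00"), 2))  # [0x00000000]
print(absolute_operand(bytes.fromhex("01 05 78 56 34 12"), 2))  # [0x12345678]
```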
5d
Instruction 5d is pop ebp, a single-byte instruction.
c3
Instruction c3 is ret, a single-byte instruction. This instruction transfers control somewhere else (to the return address popped from the stack), so the decoder stops decoding here.
55                  push ebp
89 e5               mov ebp, esp
8b 45 0c            mov eax, [ebp + 0x0c]
03 45 08            add eax, [ebp + 0x08]
01 05 00 00 00 00   add [0x00000000], eax
5d                  pop ebp
c3                  ret
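Putting the steps together, a length-only decoder for exactly the opcodes in this example fits in a few lines. This is a toy, not a real x86 decoder: it knows only push/pop r32, ret and the four ModR/M opcodes used here, rejects SIB addressing, assumes 32-bit mode, and (like the walkthrough) stops after ret. The names instruction_length and linear_sweep are made up for illustration:

```python
# Toy length decoder for the instruction subset in this example (32-bit mode).
NO_OPERANDS = set(range(0x50, 0x60)) | {0xC3}  # push r32, pop r32, ret
HAS_MODRM = {0x01, 0x03, 0x89, 0x8B}           # add/mov with a ModR/M byte


def instruction_length(code: bytes, pos: int) -> int:
    """Length of the instruction starting at code[pos] (toy subset only)."""
    opcode = code[pos]
    if opcode in NO_OPERANDS:
        return 1
    if opcode not in HAS_MODRM:
        raise NotImplementedError(f"opcode {opcode:#04x} is outside this sketch")
    modrm = code[pos + 1]
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return 2                    # opcode + ModR/M, register operand
    if rm == 0b100:
        raise NotImplementedError("SIB addressing is outside this sketch")
    if mod == 0b01:
        return 3                    # + disp8
    if mod == 0b10 or rm == 0b101:
        return 6                    # + disp32
    return 2                        # [reg] with no displacement


def linear_sweep(code: bytes) -> list:
    """Split code into instructions, stopping after ret as the walkthrough does."""
    pos, insns = 0, []
    while pos < len(code):
        length = instruction_length(code, pos)
        insns.append(code[pos:pos + length])
        if code[pos] == 0xC3:
            break                   # control leaves here; don't decode further
        pos += length
    return insns


stream = bytes.fromhex("55 89e5 8b450c 034508 010500000000 5d c3")
for insn in linear_sweep(stream):
    print(insn.hex(" "))
```

Each call looks only at bytes it has already been told belong to the current instruction, which is the strictly sequential dependency that makes naive decoding slow.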
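Real x86 processors use complicated parallel decoding techniques rather than working through the bytes strictly one after another like this. That is possible because the processor may cheat: it can pre-read instruction bytes and start decoding them before it knows whether they begin an instruction, sit in the middle of one, or are part of any instruction at all.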
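The pre-reading idea can be illustrated in software: compute a candidate length at every byte offset as if an instruction started there (those computations don't depend on each other, so hardware can do them side by side), then follow the lengths from a known start and keep only the real boundaries. This is a simplified model of the idea, not a description of how any particular processor does it, and it reuses the tiny opcode subset from the walkthrough; candidate_length and boundaries are hypothetical names:

```python
def candidate_length(code: bytes, pos: int):
    """Length of an instruction *if* one started at code[pos], else None.

    Only the walkthrough's opcode subset is known (32-bit mode, no SIB);
    a real decoder would recognise every opcode.
    """
    op = code[pos]
    if 0x50 <= op <= 0x5F or op == 0xC3:        # push/pop r32, ret
        return 1
    if op in (0x01, 0x03, 0x89, 0x8B) and pos + 1 < len(code):
        mod, rm = code[pos + 1] >> 6, code[pos + 1] & 0b111
        if mod == 0b11:
            return 2                            # register operand
        if rm == 0b100:
            return None                         # SIB forms left out
        if mod == 0b01:
            return 3                            # disp8
        if mod == 0b10 or rm == 0b101:
            return 6                            # disp32
        return 2                                # plain [reg]
    return None


def boundaries(code: bytes, start: int = 0) -> list:
    # Step 1: a candidate length for EVERY offset. Each entry depends only on
    # the bytes starting at that offset, so they could all be computed at once.
    lengths = [candidate_length(code, i) for i in range(len(code))]
    # Step 2: from a known instruction start, chain through the lengths to keep
    # the real boundaries; every other candidate is simply discarded.
    starts, pos = [], start
    while pos < len(code) and lengths[pos] is not None:
        starts.append(pos)
        pos += lengths[pos]
    return starts


stream = bytes.fromhex("55 89e5 8b450c 034508 010500000000 5d c3")
print(boundaries(stream))  # [0, 1, 3, 6, 9, 15, 16]
```

Most of the step 1 work is wasted (for example, the candidate at offset 2, the e5 byte, is thrown away), which is the price paid for not having to wait on the previous instruction's length.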
55 is push rbp or push ebp (depending on 32- or 64-bit mode)? The CPU must know how to decode and interpret the instruction bytes, and it determines the length of each instruction as part of this process. – Unmeaning
… jmp eax will go. – Hydrokinetic
… jmp eax and you don't know the address for that, disassembling the rest is pointless. Control will not reach those bytes. – Hydrokinetic
… jmp eax. If you don't know where that will go, you can't tell what the CPU will do next. If the following bytes are never reached by execution, the CPU doesn't care about them. It will never do anything with them, hence they might as well be just random bytes. – Hydrokinetic
… retn, which is another dynamic jump: you don't know where it will take you. PS: in extreme cases the same bytes may be executed multiple times with a different instruction boundary, so the CPU will execute them differently and there is no single correct disassembly. – Hydrokinetic
db is not an instruction but rather an assembler directive that says “put this byte here.” The processor always uses the left interpretation when it goes through this code. – Ky