How does the CPU know how many bytes it should read for the next instruction, considering instructions have different lengths?

5

8

So I was reading a paper, and in it they said that statically disassembling the code of a binary is undecidable, because a series of bytes can be interpreted in many possible ways, as shown in the picture (it's x86).

disassembling

So my questions are:

  1. How does the CPU execute this, then? For example, in the picture, when we reach the byte after C3, how does it know how many bytes it should read for the next instruction?

  2. How does the CPU know how much it should increment the PC after executing one instruction? Does it somehow store the size of the current instruction and add that when it wants to increment the PC?

  3. If the CPU can somehow know how many bytes it should read for the next instruction, or basically how to interpret the next instruction, why can't we do it statically?

Enterotomy answered 30/5, 2019 at 21:43 Comment(14)
Many modern processors (especially the x86 family) don't execute the instruction set you see directly; they translate it into an even more basic and simpler instruction set that is then executed. The exact implementation of this translation is too much to go into, but it could use a table and very quick comparisons (perhaps only checking one or a few bits).Chipper
Why not ask how the CPU knows that 55 is push rbp or push ebp (depending on 32- or 64-bit mode)? The CPU must know how to decode and interpret instruction bytes, and determining the length of an instruction is part of that process.Unmeaning
Typically, with 'escape sequences' of bits in the first byte or two. Those billions of transistors in the CPU are there for a reason, as is the complex manner in which they are connected together :)Shroudlaid
Note that this is a hardware question and surely has no connection with C, (removed the C tag).Shroudlaid
Using the same process that a software disassembler does, but in hardware. Given a starting point, (logically) decode one instruction at a time to find the end of that instruction, then decode the next from there.Illtreat
It doesn't know in advance, but it will know whether it has a complete instruction or needs to read more bytes. Yes, you can do that statically too. What you cannot do is simply work out target addresses for dynamic jumps and such. In your example you don't know where jmp eax will go.Hydrokinetic
@Hydrokinetic So how can we do it statically? For example, how can we know which of the three possibilities in the picture is correct? And are you saying the paper is wrong about statically disassembling a binary being undecidable? Link to paper: utdallas.edu/~kxh060100/wartell12ccs.pdf . And yes, I know about "not knowing the destination of the jump", but I'm asking about simply disassembling the code.Enterotomy
Since it starts with a jmp eax and you don't know the address for that, disassembling the rest is pointless. Control will not reach those bytes.Hydrokinetic
@Hydrokinetic The main point here is not the code itself; I'm just asking how the CPU works and how we can resolve this problem when doing it statically.Enterotomy
The cpu works instruction by instruction. First thing it does is jmp eax. If you don't know where that will go, you can't tell what the cpu will do next. If the following bytes are never reached by execution the cpu doesn't care about them. It will never do anything with them, hence they might as well be just random bytes.Hydrokinetic
@Hydrokinetic Okay, then just imagine that there is no jmp eax and it starts from the second instruction. Are you saying we can disassemble it then, and there will be only one possible way, and not three?Enterotomy
You can then disassemble until retn, which is another dynamic jump; you don't know where that will take you. PS: in extreme cases the same bytes may be executed multiple times with different instruction boundaries, so the CPU will execute them differently, and there is no single correct disassembly.Hydrokinetic
I've answered basically the same question a while ago.Ky
Note that db is not an instruction but rather an assembler directive that says “put this byte here.” The processor always uses the left interpretation when it goes through this code.Ky
8

The simple way is to just read one byte, decode it, and then determine whether it's a complete instruction. If not, read another byte, decode it if necessary, and then determine whether a complete instruction has been read. If not, continue reading and decoding bytes until the complete instruction has been read.

This means that if the instruction pointer is pointing at a given sequence of bytes, there's only one possible way to decode the first instruction of that sequence. The ambiguity arises only because the next instruction to be executed might not be located at the bytes that immediately follow the first instruction. That's because the first instruction in the sequence may change the instruction pointer, so some instruction other than the following one gets executed.

The RET (retn) instruction in your example could be the end of a function. Functions often end with a RET instruction, but not necessarily so. It's possible for a function to have multiple RET instructions, none of which are at the end of the function. Instead, the last instruction will be some sort of JMP instruction that jumps back to some location in the function, or to another function entirely.

That means that in your example code, without more context, it's impossible to know whether any of the bytes following the RET instruction will ever be executed, and if so, which of the bytes would be the first instruction of the following function. There could be data in between functions, or this RET instruction could be the end of the last function in the program.
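The "unique decode from a given start, but stop trusting bytes after RET" idea can be sketched as a toy linear-sweep decoder. The length table below is hypothetical and covers only the specific byte encodings used in this illustration (register-register mov has no displacement), not real x86 decoding:

```python
# Toy instruction lengths for the specific encodings in this sketch
# (hypothetical table, not a general x86 decoder).
LENGTHS = {0x55: 1, 0x5d: 1, 0xc3: 1, 0x89: 2, 0x8b: 2}

def linear_sweep(code, start=0):
    """Record (offset, length) instruction boundaries from `start`.
    Stop after RET: the bytes that follow might be data, padding, or
    another function -- we cannot tell statically."""
    boundaries = []
    ip = start
    while ip < len(code):
        op = code[ip]
        size = LENGTHS[op]          # the first byte determines the length here
        boundaries.append((ip, size))
        ip += size
        if op == 0xc3:              # RET: fall-through bytes may not be code
            break
    return boundaries
```

Given `55 89 e5 5d c3 de ad`, the sweep yields boundaries `(0,1) (1,2) (3,1) (4,1)` and deliberately leaves the trailing `de ad` bytes uninterpreted.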


The x86 instruction set in particular has a pretty complex format of optional prefix bytes, one or more opcode bytes, one or two possible addressing form bytes, and then possible displacement and immediate bytes. The prefix bytes can be prepended to just about any instruction. The opcode bytes determine how many opcode bytes there are and whether the instruction can have operand bytes and immediate bytes. The opcode may also indicate that there are displacement bytes. The first operand byte determines whether there's a second operand byte and whether there are displacement bytes.

The Intel 64 and IA-32 Architectures Software Developer's Manual has this figure showing the format of x86 instructions:

X86 Instruction Format

Python-like pseudo-code for decoding x86 instructions would look something like this:

# read possible prefixes

prefixes = []
while is_prefix(memory[IP]):
    prefixes.append(memory[IP])
    IP += 1

# read the opcode 

opcode = [memory[IP]]
IP += 1
while not is_opcode_complete(opcode):
    opcode.append(memory[IP])
    IP += 1

# read addressing form bytes, if any

modrm = None
addressing_form = []    
if opcode_has_modrm_byte(opcode):
    modrm = memory[IP]
    IP += 1
    if modrm_has_sib_byte(modrm):
        addressing_form = [modrm, memory[IP]]
        IP += 1
    else:
        addressing_form = [modrm]

# read displacement bytes, if any

displacement = []
if (opcode_has_displacement_bytes(opcode)
    or modrm_has_displacement_bytes(modrm)):
    length = determine_displacement_length(prefixes, opcode, modrm)
    displacement = memory[IP : IP + length]
    IP += length

# read immediate bytes, if any

immediate = []
if opcode_has_immediate_bytes(opcode):
    length = determine_immediate_length(prefixes, opcode)
    immediate = memory[IP : IP + length]
    IP += length

# the full instruction

instruction = prefixes + opcode + addressing_form + displacement + immediate

One important detail left out of the pseudo-code above is that instructions are limited to 15 bytes in length. It's possible to construct otherwise valid x86 instructions that are 16 bytes or longer, but such instructions will generate an undefined opcode CPU exception if executed. (There are other details I've left out, like how part of the opcode can be encoded inside of the Mod R/M byte, but I don't think this affects the length of instructions.)
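A runnable sketch of the decode loop for a tiny subset of x86 follows. The tables below are stand-ins for Intel's real opcode maps and cover only a handful of encodings; the point is the incremental length computation and the 15-byte limit check:

```python
# Stand-in tables (hypothetical subset, not Intel's real opcode maps).
PREFIXES = {0x66, 0x67, 0xf2, 0xf3}      # a few real prefix bytes
MODRM_OPS = {0x01, 0x03, 0x89, 0x8b}     # add/mov forms that take a modrm byte

def decode_length(memory, ip):
    """Return the length in bytes of the instruction starting at ip."""
    start = ip
    while memory[ip] in PREFIXES:        # read possible prefixes
        ip += 1
    opcode = memory[ip]                  # read the (single-byte) opcode
    ip += 1
    if opcode in MODRM_OPS:              # read the modrm byte, if any
        modrm = memory[ip]
        ip += 1
        mod, rm = modrm >> 6, modrm & 7
        if mod == 0b01:                  # [reg + disp8]: one displacement byte
            ip += 1
        elif mod == 0b00 and rm == 0b101:  # [disp32]: four displacement bytes
            ip += 4
    length = ip - start
    if length > 15:                      # the architectural limit
        raise ValueError("#UD: instruction longer than 15 bytes")
    return length
```

For example, `55` decodes to length 1, `8b 45 0c` to length 3, and `01 05 00 00 00 00` to length 6, while piling 15 prefix bytes onto a one-byte opcode pushes the total past 15 and raises the #UD stand-in.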


However, x86 CPUs don't actually decode instructions the way I've described above; they only behave as if they read each byte one at a time. Instead, modern CPUs will read an entire 15 bytes into a buffer and then decode the bytes in parallel, usually in a single cycle. When the CPU has fully decoded the instruction, determining its length, and is ready to read the next instruction, it shifts over the remaining bytes in the buffer that weren't part of the instruction. It then reads more bytes to fill the buffer to 15 bytes again and starts decoding the next instruction.

Another thing modern CPUs will do that isn't implied by what I've written above is speculatively execute instructions. This means the CPU will decode instructions and tentatively try to execute them even before it has finished executing the previous instructions. This in turn means that the CPU may end up decoding the instructions following the RET instruction, but only if it can't determine where the RET will return to. Because there can be performance penalties from trying to decode and tentatively execute random data that's not intended to be executed, compilers don't normally put data in between functions, though they may pad this space with NOP instructions that will never be executed in order to align functions for performance reasons.

(They used to put read-only data in between functions long ago, but this was before x86 CPUs that could speculatively execute instructions became commonplace.)

Pas answered 30/5, 2019 at 23:29 Comment(4)
But based on another question, there shouldn't be any data in the code section because it doesn't provide any benefit! Link: #55607552 (they are asking about the same paper)Enterotomy
Long ago was probably also before split I/D caches, where sparse data in lines holding mostly instructions wastes space in the data cache. (Also TLB capacity, for split iTLB / dTLB). But anyway, @OneAndOnly: that's right, normally decoding compiler output is easy; the end of one instruction identifies the start of the next, even following unconditional jumps. They typically pad with NOPs or int3. You only run into problems with obfuscated binaries. The paper you linked wants to work reliably on any executable, so can't make assumptions.Illtreat
@Enterotomy As explained in my answer, data doesn't appear in the code section anymore with modern compilers, but long ago compilers used to put read-only data in code sections because that was the only read-only section programs had. Dedicated read-only data sections were only created when CPUs advanced to a state where putting data in code sections caused performance issues. However, old compilers continued to be used long after these CPUs became commonplace (mid to late 90s or so), so you see mixed code and data in a lot of x86 programs compiled before 2000, and a fair bit after.Pas
@Enterotomy For example, I'm looking right now at disassembled code for a game that was compiled in 2007, and it has jump tables for switch statements in between functions. Which is relatively nice. It used to be common to put these jump tables in the middle of functions, right after the indirect jump that used them.Pas
4

On x86 specifically, the instruction encoding is such that from each byte, the decoder can learn how many more bytes follow.


For example, let me show you how the decoder could possibly decode this instruction stream.

55

The decoder sees 55 and knows that this is push ebp, a single-byte instruction. So it decodes push ebp and proceeds to the next instruction.

push ebp
89

The decoder sees 89, which is mov r/m32,r32. This instruction is followed by a modr/m byte specifying the operands.

push ebp
89 e5

The modr/m byte is e5, indicating ebp as the r/m operand and esp as the r operand, so the instruction is mov ebp, esp.

push ebp
mov ebp, esp
8b

This instruction is mov r32,r/m32, which is likewise followed by a modr/m byte.

push ebp
mov ebp, esp
8b 45

This modr/m byte has an r operand of eax and an r/m32 operand of [ebp + disp8] with an 8-bit displacement, which comes in the next byte.

push ebp
mov ebp, esp
8b 45 0c

The displacement is 0c, so the instruction is mov eax, [ebp + 0x0c].

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
03

This instruction is add r32,r/m32, again followed by a modr/m byte.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
03 45

Same as before: the r operand is eax while the r/m operand is [ebp + disp8]. The displacement is 08.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
01

This instruction is add r/m32,r32, followed by a modr/m byte.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
01 05

This modr/m byte indicates an r operand of eax and an r/m operand of [disp32]. The displacement follows in the next four bytes, which are 00 00 00 00.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
add [0x00000000], eax
5d

Instruction 5d is pop ebp, a single-byte instruction.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
add [0x00000000], eax
pop ebp
c3

Instruction c3 is ret, a single-byte instruction. This instruction transfers control to somewhere else, so the decoder stops decoding here.

push ebp
mov ebp, esp
mov eax, [ebp + 0x0c]
add eax, [ebp + 0x08]
add [0x00000000], eax
pop ebp
ret

In real x86 processors, complicated parallel decoding techniques are employed. This is possible because the processor may cheat and pre-read instruction bytes that may or may not be part of any instruction.
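The walkthrough above can be reproduced with a short table-driven decoder. This is only a sketch that handles exactly the opcodes and ModR/M forms appearing in this byte stream, not a general x86 disassembler:

```python
def disassemble(code):
    """Decode the walkthrough's byte stream one instruction at a time.
    Handles only: push/pop ebp, ret, and mov/add with a modrm byte in
    the register, [reg+disp8], and [disp32] forms."""
    REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
    out, ip = [], 0
    while ip < len(code):
        op = code[ip]; ip += 1
        if op == 0x55:
            out.append("push ebp")
        elif op == 0x5d:
            out.append("pop ebp")
        elif op == 0xc3:
            out.append("ret")
            break                              # control leaves; stop decoding
        elif op in (0x89, 0x8b, 0x01, 0x03):   # mov/add with a modrm byte
            modrm = code[ip]; ip += 1
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod == 0b11:                    # register operand
                ea = REGS[rm]
            elif mod == 0b01:                  # [reg + disp8]
                ea = f"[{REGS[rm]} + {code[ip]:#04x}]"; ip += 1
            elif mod == 0b00 and rm == 0b101:  # [disp32]
                disp = int.from_bytes(code[ip:ip + 4], "little"); ip += 4
                ea = f"[{disp:#010x}]"
            if op in (0x89, 0x01):             # r/m is the destination
                out.append(("mov " if op == 0x89 else "add ") + f"{ea}, {REGS[reg]}")
            else:                              # r is the destination
                out.append(("mov " if op == 0x8b else "add ") + f"{REGS[reg]}, {ea}")
    return out
```

Feeding it the bytes from the walkthrough, `55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 5d c3`, produces exactly the seven-instruction listing derived above, ending at the ret.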

Ky answered 22/8, 2018 at 10:7 Comment(0)
3

Part of the CPU is an instruction decoder (see e.g. the Wikipedia article on Central Processing Unit). The task of the instruction decoder is to determine how many bytes, starting from the address pointed to by the Instruction Pointer, are part of the current instruction, and decode it into its constituent parts.

There are some architectures (mostly microcontrollers nowadays) where all instructions are the same size. On 64-bit Intel/AMD architecture (x86-64 a.k.a. AMD64), the instruction size varies between 1 and 15 bytes, and the instruction encoding is quite complex.

Tailwind answered 22/8, 2018 at 8:56 Comment(0)
3

Static disassembly is undecidable because the disassembler cannot discern whether a group of bytes is code or data. The example you provided is a good one: after the RETN instruction, there might be another subroutine, or there might be some data followed by a routine. There is no way to decide which is correct until you actually execute the code.

When an opcode is read during the instruction fetch phase, the opcode itself encodes one kind of instruction, and the sequencer already knows how many bytes to read for it. There is no ambiguity. In your example, after fetching C3 but before executing it, the CPU will adjust its EIP register (Instruction Pointer) to read what it considers to be the next instruction (the one beginning with 0F). BUT during the execution of instruction C3 (which is a RETN, "Return from subroutine"), EIP is changed, so execution won't reach the instruction beginning with 0F 88 52. That instruction will only be reached if some other part of the code jumps to its location. If no code performs such a jump, then it will effectively be data, but determining whether a particular instruction will or will not be executed is not a decidable problem.

Some clever disassemblers (I think IDA Pro does this) start from a location known to contain code and assume the following bytes are instructions as well, UNTIL a jump or ret is found. If a jump is found and the destination of the jump can be determined by reading the binary code, then the scanning continues there. If the jump is conditional, then the scanning branches into two paths: jump not taken and jump taken.

After all branches have been scanned, everything that is left is considered data. (That means that interrupt handlers, exception handlers, and functions called through a function pointer calculated at runtime won't be detected.)
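This scanning strategy (usually called recursive traversal) can be sketched on a toy ISA. The encodings below are hypothetical, chosen only for the example: nop = 00 (1 byte), jmp rel8 = eb, jz rel8 = 74 (2 bytes each), ret = c3 (1 byte); anything else ends the scan of that path:

```python
def recursive_descent(code, entry=0):
    """Mark the byte offsets reachable as code from `entry`.
    Everything left unmarked is presumed data -- which is exactly why
    handlers reached only via runtime-computed pointers are missed."""
    code_bytes, worklist = set(), [entry]
    while worklist:
        ip = worklist.pop()
        while 0 <= ip < len(code) and ip not in code_bytes:
            op = code[ip]
            if op == 0x00:                       # nop: 1 byte, fall through
                code_bytes.add(ip)
                ip += 1
            elif op == 0xc3:                     # ret: this path ends
                code_bytes.add(ip)
                break
            elif op in (0xeb, 0x74):             # jmp rel8 / jz rel8: 2 bytes
                code_bytes.update((ip, ip + 1))
                rel = code[ip + 1]
                rel -= 256 if rel >= 0x80 else 0  # sign-extend disp8
                worklist.append(ip + 2 + rel)     # scan the branch target too
                if op == 0xeb:                    # unconditional: no fallthrough
                    break
                ip += 2                           # conditional: both paths
            else:
                break                             # unknown byte: give up here
    return code_bytes
```

For the stream `eb 02 aa bb 00 c3`, the scan marks offsets {0, 1, 4, 5} as code and leaves the skipped-over `aa bb` bytes classified as data, just as described above.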

Terret answered 30/5, 2019 at 21:58 Comment(6)
So there might be data inside the .text section, instead of in the data sections? Any reference for that? All the binaries I have worked with had no data inside the .text section, and the only reference I found was this paper; I couldn't find anything else. Do you know any other reference that backs this claim?Enterotomy
Sure, you can place data in .text. Nothing magical about that, except it's probably going to be read only. Also .text is just a name. You can name your sections (at least code and data) whatever you like, the important thing is their attributes. Some sections do have defined names that various tools and loaders depend on.Hydrokinetic
And remember that not all binary code comes from a compiled program originally written in some high level language. You can write assembly code and put data between routines.Terret
@Hydrokinetic But I thought the entire section gets the same permissions, for example read-only? When you use readelf on a binary, you can see that, for example, the .text section is executable, so if we have data in there, that would make the data executable as well!!Enterotomy
Yes it does make data placed there executable, but that doesn't mean execution will actually happen. If you don't jump to your data then the cpu doesn't care if it's executable or not.Hydrokinetic
@Enterotomy It's unusual to find data mixed with code in x86. There's no performance benefit (unlike ARM or other ISAs with short-range PC-relative addressing modes). On "normal" compiler output, it does work fine to start at the start of the .text section and assume another instruction starts right after this one, even if it's an unconditional branch. x86 instructions do decode uniquely given a starting point, and compilers never branch into the middle of an instruction that earlier decoded from a different starting point. It's only hard for obfuscated code that tries to break disassemblyIlltreat
1

Your main problem seems to be the following one:

if the CPU can somehow know how many bytes it should read for the next instruction or basically how to interpret the next instruction, why cant we do it statically?

The problem described in the paper is related to "jumping" instructions (which here means not only jmp, but also int, ret, syscall, and similar instructions):

The purpose of such instructions is to continue program execution at a completely different address instead of continuing at the next instruction. (Function calls and while() loops are examples where program execution does not continue at the next instruction.)

Your example begins with the instruction jmp eax which means that the value in the register eax decides which instruction is executed after the jmp eax instruction.

If eax contains the address of the byte 0F, the CPU will execute the jcc instruction (left case in the picture); if it contains the address of 88, it will execute the mov instruction (middle case in the picture); and if it contains the address of 52, it will execute the push instruction (right case in the picture).

Because you do not know which value eax will have when the program is executed, you cannot know which of the three cases will happen.

(I was told that in the 1980s there even were commercial programs where different cases happened during runtime: In your example this would mean that sometimes the jcc instruction and sometimes the mov instruction is executed!)

When we reach after C3, how does it know how many bytes it should read for the next instruction?

How does the CPU know how much it should increment the PC after executing one instruction?

C3 is not a good example because retn is a "jumping" instruction: The "instruction after C3" will never be reached because program execution is continued elsewhere.

However, you might replace C3 by another instruction which is one byte long (such as 52). In this case it is well defined that the next instruction will begin with the byte 0F and not with 88 or 52.

Ranie answered 31/5, 2019 at 5:27 Comment(0)
