Does Program Counter hold current address or the address of the next instruction?

Asked 25/8, 2018 at 16:59 Answered 12/3, 2021 at 19:33

assembly cpu-architecture program-counter

As a beginner and self-learner, I am learning assembly and currently reading chapter 3 of the book The C Companion by Allen Holub. I can't understand the description of the Program Counter (PC) in the imaginary demo machine he describes, which has a two-byte word. Here is the description of the PC on page 57.

"The PC always holds the address of the instruction currently being executed. It is automatically updated as each instruction executed to hold the address of the next instruction to be executed. ... ... The important concept here is that the PC holds the address of the next instruction, not the instruction itself. "

I fail to understand the difference between holding the current address and the address of the next instruction. Does PC hold the two addresses in two consecutive bytes at the same time?

Carbonization answered 25/8, 2018 at 16:59 Comment(5)

It depends a lot on the implementation of the CPU: some increment the internal register that corresponds to the program counter at the start of execution of an instruction, some at the end. However, with most modern CPUs neither is true. They don't have a single internal register you can point to and say it's the program counter; instead it's just a conceptual part of the architectural state. See this answer: #51943023 – Married 25/8, 2018 at 17:26

Before an instruction can be executed it has to be first read from memory. Reading it will increment the instruction counter. This in general only matters for calculating the offset of a call or jump location, the assembler takes care of that detail. – Perkins 25/8, 2018 at 17:39

@HansPassant: This question is not about x86. In x86, IP / EIP / RIP logically holds the address of the next instruction while the current one is being executed. But that's not how the author of the book describes their paper architecture. Having a PC that holds the address of the current instruction is a valid design. For an OoO / pipelined design, it makes no real difference. For a simple in-order with a single physical PC register, it would mean the instruction-fetch logic needs to calculate a next-PC, or else the next instruction can't even be fetched while executing the current. – Laconia 25/8, 2018 at 17:44

@Peter Cordes, this demo machine was loosely based on the 68000 and PDP-11. Thank you. – Carbonization 25/8, 2018 at 20:42

I had an answer half-written when I posted that comment, and Martin posted his answer. I finally got around to finishing my answer, including a section that expands on that comment. – Laconia 26/8, 2018 at 4:18

I can't understand the description of Program Counter or PC he describes in an imaginary demo machine with two byte word.

He is describing a simple CPU which explains how CPUs work in general.

Real CPUs are much more complex:

In many manuals (for any kind of CPU) you'll find sentences like: "The PC register is pushed on the stack."

This typically means that the address of the instruction that is executed after returning from a call instruction is pushed on the stack.

However such sentences are not 100% correct: In the case of a 68k CPU (see below) the address of the next instruction is written, not the address of the current instruction plus 2!

For most CPUs PC-relative jump instructions are relative to the address of the next instruction; however there are counter-examples (such as PowerPC VLE).

32-bit x86 CPUs (as used in most desktop / laptop computers)

On such CPUs, only call directly reads the EIP register, and only jump instructions write EIP. This is enough "insulation" that EIP can be treated as some internal circuit in the CPU (if there is a physical EIP register at all) whose content you don't necessarily know.

(You could count int instructions like int3 or int 0x80 as reading CS:EIP as well, because they have to push an exception frame. But it makes more sense to think of them as triggering the exception-handling machinery.)

It is highly probable that different x86 CPUs work differently internally, so the actual content of the EIP "register" is different in different CPUs. (And modern high-performance implementations won't have just one EIP register, but they do whatever is necessary to preserve the illusion and push the right return address when needed.)

(PC-relative jumps are relative to the address of the next instruction.)

64-bit x86 CPUs

These CPUs have instructions that directly use the RIP register, like mov eax,[rip+symbol_offset] to do a PC-relative load of static data; this makes position-independent code for shared libraries and ASLR significantly more efficient than on 32-bit x86. In this case "RIP" is the address of the next instruction.
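As a rough sketch of that rule (in Python, with made-up addresses; the helper name is hypothetical and not part of any real tool), the effective address of a RIP-relative operand is the address of the next instruction plus the encoded displacement:

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Effective address of a RIP-relative operand (illustrative helper)."""
    next_insn = insn_addr + insn_len          # the value "RIP" has for this instruction
    return (next_insn + disp32) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits


# Hypothetical example: a 6-byte "mov eax, [rip+0x200]" placed at 0x401000.
print(hex(rip_relative_target(0x401000, 6, 0x200)))
```

The assembler has to apply the same rule in reverse when it turns a symbol into a displacement, which is why it needs to know the instruction's length.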

68k

These CPUs can also use the content of the PC register directly. In this case the PC reflects the address of the current instruction plus 2 (I'm not absolutely sure here).

Because such instructions are at least 4 bytes long the value of the PC register will reflect the address of a byte "in the middle" of an instruction.

ARM

When reading the PC on ARM CPUs (it can be read directly!) the value typically reflects the address of the current instruction plus 8, in some situations even plus 12!

(Instructions are 4 bytes long so "current instruction plus 8" means: The address of two instructions ahead!)

Eleneeleni answered 25/8, 2018 at 18:48 Comment(2)

mov eax,[rip] loads 4 bytes of the next instruction. I think you mean lea rax, [rip] which simply reads RIP instead of dereferencing it. 32-bit x86 has call, which pushes the current IP/EIP/RIP as the return address, and is documented that way. So x86 does have PC=next insn. Reading program counter directly. See also Why can't you set the instruction pointer directly? for more about how 32-bit ARM exposes PC as one of the 16 general-purpose registers. – Laconia 25/8, 2018 at 19:30

@PeterCordes Thanks for the comment. I wanted to write "that uses the RIP register". I corrected that. – Eleneeleni 26/8, 2018 at 4:13

Those claims could be talking about two different points in time, during vs. after the execution of an instruction.

What was in those [...] that you omitted? Did it talk about finishing execution of one instruction and starting to fetch the next instruction, after incrementing PC by 2 bytes / 1 instruction-word?

Otherwise it's an error in the book, because those two claims (that PC points to the current instruction vs. the next instruction during execution of the current instruction) are incompatible.

I fail to understand the difference between holding the current address and the address of the next instruction

Consider these (x86) instructions in memory, using 2-byte instructions to match the ISA from your book (x86 instructions are variable length, from 1 to 15 bytes, including optional / mandatory prefix bytes):

 a:  0x66 0x90     nop
 c:  0x66 0x90     nop

Each instruction has its own address. I've indicated their starting addresses with hex digits (which could also be symbolic labels in assembler syntax, but this is intended to be a mockup of disassembler output, like objdump -d). The "address of an instruction" is the address of its first byte in memory, regardless of what the architectural PC would hold before/during/after executing it.

While the first nop is executing, the address of the next instruction is c. The "current instruction" is the first nop, regardless of what value PC (logically) has while it executes.

Most instructions don't actually read PC as a data input. Only relative jumps and PC-relative loads/stores need it. (And thus the compiler/assembler needs to know the rule for calculating relative displacements.)

MIPS and RISC-V also/instead have auipc-style instructions that add an immediate to the program counter and put the result in another register. So instead of a PC-relative addressing mode, they have a PC-relative add, to produce a pointer you can use as the base for a normal addressing mode. But same difference, really.

As long as there's a consistent rule for the logical value of PC during the execution of an instruction, it doesn't really matter what the exact rule is.

PC = start of current instruction (e.g. MIPS logically works this way, regardless of what internal implementations actually do).

(MIPS relative branches are relative to PC + 4 (i.e. relative to the next instruction so for this purpose it's just a quirk of how it's documented), but MIPS jumps replace the low 28 bits of PC, not of PC+4 (which potentially differs in its high bits). See also http://www.cim.mcgill.ca/~langer/273/13-datapath1.pdf which goes over the logical operation of instruction fetch / execute on MIPS.)
PC = start of next instruction (common, e.g. x86)
PC = start of 2 instructions later. (e.g. ARM)
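To make the "consistent rule" point concrete, here is a minimal Python sketch. The rule names and both helper functions are made up for illustration, and the "two_ahead" case assumes fixed-length instructions (as in 32-bit ARM state). The same branch gets a different encoded displacement under each rule, but every rule reaches the same target:

```python
def logical_pc(insn_addr: int, insn_len: int, rule: str) -> int:
    """Logical PC value seen by an instruction under a given convention."""
    if rule == "current":        # PC = start of current instruction
        return insn_addr
    if rule == "next":           # PC = start of next instruction
        return insn_addr + insn_len
    if rule == "two_ahead":      # PC = start of 2 instructions later (fixed length assumed)
        return insn_addr + 2 * insn_len
    raise ValueError(f"unknown rule: {rule}")


def encode_displacement(insn_addr: int, insn_len: int, target: int, rule: str) -> int:
    """Displacement an assembler would have to encode to reach `target`."""
    return target - logical_pc(insn_addr, insn_len, rule)


# A 4-byte branch at 0x1000 that should land on 0x1020.
for rule in ("current", "next", "two_ahead"):
    disp = encode_displacement(0x1000, 4, 0x1020, rule)
    print(f"{rule:10s} displacement = {disp:#x}")
```

The hardware and the assembler only have to agree on which rule is in force.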

Why does the ARM PC register point to the instruction after the next one to be executed? TL:DR: an artifact of a 3-stage fetch-decode-execute pipeline front-end in early ARM designs. (32-bit ARM exposes the program counter as r15, one of the 16 "general purpose" registers, so you can actually jump with orr pc, r0, #4 or something, as well as reading it in any instruction for PC-relative addressing).

As @Ross says, only a simple non-pipelined CPU will have a single physical program-counter register. (How does branch prediction interact with the instruction pointer).

But if any instruction raises an exception (faults), it usually needs to store either the address of the faulting instruction, or the address of the next instruction, somewhere. That depends on what kind of exception it is. A debug / single-step exception would store the address of the next instruction, so returning from the exception handler would step. A page-fault would store the address of the faulting instruction so the default action is to retry it.
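A small Python sketch of that distinction, using "fault" and "trap" as labels for the two kinds of exception described above (the function name and the example addresses are hypothetical):

```python
def saved_return_address(insn_addr: int, insn_len: int, kind: str) -> int:
    """Address the hardware would save when an exception is raised."""
    if kind == "fault":          # e.g. a page fault: retry the same instruction
        return insn_addr
    if kind == "trap":           # e.g. single-step: resume at the next instruction
        return insn_addr + insn_len
    raise ValueError(f"unknown exception kind: {kind}")


# A 3-byte instruction at 0x4000.
print(hex(saved_return_address(0x4000, 3, "fault")))
print(hex(saved_return_address(0x4000, 3, "trap")))
```

Either way the hardware needs both the start address and the length (or the end address) of the instruction available, whatever the architectural PC rule is.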

The exception-handling rules are going to be separate from the normal PC-during-execution rules, so the hardware has to remember instruction-lengths, or instruction-start address to be able to handle exceptions. It doesn't have to be efficient, because interrupts/exceptions are rare; it's ok for the CPU to take multiple cycles before it even jumps to the interrupt-handler. (The normal-operation case of PC-relative addressing modes, and call instructions, does have to be efficient.)

Implications of a simple physical implementation with PC=current instruction

Having a PC that holds the address of the current instruction is a valid design.

For a superscalar pipelined design, especially with Out-of-Order execution, it makes no real difference. The pipeline needs to track the address (and length if variable) of each instruction as it goes through the pipeline, because it can fetch/decode/execute more than 1 per cycle. It fetches in large blocks, and decodes up to n instructions from that block. Some implementations might require fetch-blocks to be 16-byte aligned, for example. (See https://agner.org/optimize/ for details on how various x86 microarchitectures do it, and how to optimize for the front-end fetch/decode patterns in Pentium, Pentium Pro, Nehalem, etc. Fortunately modern x86 CPUs have decoded-uop caches and are much less sensitive to fetch/decode issues in loops.)

(Semi-related: x86 registers: MBR/MDR and instruction registers modern)

For a simple in-order non-pipelined CPU with a single physical PC register, it would mean the instruction-fetch logic needs to calculate a next-PC, or else the next instruction can't even be fetched while executing the current.

In x86, IP / EIP / RIP logically holds the address of the next instruction while the current one is being executed. This makes sense given its origins in 8086, which only had ~29k transistors. It prefetched from the instruction stream while the current insn was being executed (into a small 6-byte buffer, which isn't even long enough to hold a whole instruction if extra prefixes are used, but which holds 6 single-byte instructions). But it didn't even start decoding the next until the current one was finished. (i.e. not pipelined at all, or arguably 2-stage if you count prefetch, which is very easy to decouple. This remained the case until 486, I think.)

With a variable-length ISA, instruction-length isn't discovered until decode. Having PC = end of current instruction maybe matters more, because you can't just calculate PC+4 the way MIPS can, or PC+2 with your toy ISA. But you also can't go backwards unless you know the instruction length, so to properly handle exceptions 8086 must have tracked the instruction-start as well, or remembered the instruction-length.

Laconia answered 26/8, 2018 at 3:41 Comment(4)

The 8086 and the 80186 were both 2-stage pipelined: a fetch stage and an execute stage, both can function in the same cycle. The architectural registers are the same as physical registers, including IP. I've read multiple books that claim that the first pipelined Intel processor is the 80486. I don't know where they got that from. There are plenty of resources that discuss the two-stage 8086 pipeline and later designs. I'm not aware of any microprocessor that is not pipelined. – Upthrust 26/8, 2018 at 4:30

@HadiBrais: I think people see prefetch as just so obvious and easy that it doesn't even count as pipelining, even though it really is. My understanding is that the real fundamental difference is when you pipeline even decode, because then a microcoded internal implementation can't work the same way, where the decode and operand-fetch process might use some of the same adders as the execute process. – Laconia 26/8, 2018 at 4:32

Yeah I think so. Although it is technically a pipe stage. Even now in every textbook and paper they talk about the code fetch stage. IIRC, the 80286 had 4 pipe stages, which makes sense because it introduced protected mode stuff and had higher frequency. – Upthrust 26/8, 2018 at 4:35

@PeterCordes Here is the missing line I didn't mention.

"... to be executed. In our demo machine, all instructions occupy exactly 2 bytes, so the PC is normally incremented by two with every instruction. The important concept here... ."

Here is the last line I also didn't write: "When an instruction is executed, the machine first fetches the actual instruction from the indicated address, and then execute it." You guessed right and, as you said the two things were incompatible, it helped me a lot because I'm at least thinking in the right direction. – Carbonization 26/8, 2018 at 7:6

This is a real instruction set, but that doesn't matter; I am not interested in how these particular instructions work. It will serve to demonstrate the issue.

2000: 0b 12        push r11     
2002: 3b 40 21 00  mov #33, r11
2006: 3b 41        pop r11      
2008: 30 41        ret

As it has been mentioned there is a notion of time when talking about the program counter.

A super simple processor (an old 8-bit one, and others like it) can be thought of like this; newer ones are different.

When we enter this code, however we got here, the program counter is 0x2000. This tells us where to fetch the instruction. We have to fetch it, decode it, and then execute it, then repeat.

These are 16-bit instructions, two bytes each. The processor starts to fetch with the pc pointing at the instruction, so it holds the address of the instruction. The processor reads the first byte at address 0x2000 (0x0b), increments the program counter to 0x2001, uses that to fetch the second half of the instruction at address 0x2001 (0x12), and increments the program counter to 0x2002. So in this made-up processor I am describing, each fetch uses the program counter as the address and then increments the program counter.

before data after
0x2000 0x0b 0x2001
0x2001 0x12 0x2002

So now we decode the instruction, the program counter is currently showing 0x2002, we see that this is a push r11, so we move on to execute.

During execution of this instruction the program counter remains 0x2002. The register r11's value is pushed onto the stack.

Now we begin to fetch the next instruction.

before data after
0x2002 0x3b 0x2003
0x2003 0x40 0x2004

As we decode this instruction (pc == 0x2004) mov #immediate,r11 the processor realizes that there is an immediate required for this instruction so it needs to fetch two more bytes

before data after
0x2004 0x21 0x2005
0x2005 0x00 0x2006

It has determined that it can now execute the instruction (little endian 0x0021 = 33 decimal) by writing the value 0x0021 into register r11. During execution the program counter is 0x2006 for this instruction.

before data after
0x2006 0x3b 0x2007
0x2007 0x41 0x2008

decode and execute a pop r11

So you can start to see that the program counter does actually contain at least two values. At the start of the instruction, before fetching, it contains the address of the instruction. After fetching and decoding, just before we start to execute, it contains the address of the byte after this instruction, which, if this is not a jump, is another instruction. If this is an unconditional jump, that byte could be an instruction, some data, or unused memory. But we say that it "points to the next instruction", meaning in this case that before execution it holds the address after this instruction, which often contains another instruction. As we will see next, the pc can be modified by the instruction. But at the END of execution it always points (for this simple made-up processor, which is similar to a number of simple 8-bit processors) to the next instruction to be executed.
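The fetch steps walked through so far can be sketched in Python. This is only a model of the made-up byte-at-a-time processor described here, not of any real chip: the memory contents come from the listing, while the NEEDS_IMMEDIATE table and the function names are assumptions made for this sketch.

```python
# Bytes from the listing, loaded at address 0x2000.
MEMORY = {0x2000 + i: b for i, b in enumerate(bytes.fromhex("0b123b4021003b413041"))}

# Assumption for this sketch only: the word 3b 40 (mov #imm, r11) is the one
# encoding here that needs two extra immediate bytes. A real decoder does more.
NEEDS_IMMEDIATE = {(0x3B, 0x40)}


def fetch_instruction(pc):
    """Fetch one instruction a byte at a time, incrementing pc after each byte.

    Returns (fetched bytes, pc once the whole instruction has been fetched).
    """
    fetched = []
    for _ in range(2):                      # every instruction starts with a 2-byte word
        fetched.append(MEMORY[pc])
        pc += 1
    if tuple(fetched) in NEEDS_IMMEDIATE:   # decode finds that an immediate is needed
        for _ in range(2):
            fetched.append(MEMORY[pc])
            pc += 1
    return fetched, pc


def run(start, count):
    """Return (pc before fetch, pc during execute) for `count` non-jump instructions."""
    pc = start
    history = []
    for _ in range(count):
        before = pc
        _, pc = fetch_instruction(pc)       # pc now points past this instruction
        history.append((before, pc))        # executing a non-jump leaves pc alone
    return history


for before, during in run(0x2000, 3):
    print(f"before fetch {before:#06x}, during execute {during:#06x}")
```

Each instruction's "during execute" value is the next instruction's "before fetch" value, which is the two-values-at-different-times point made above.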

Lastly

before data after
0x2008 0x30 0x2009
0x2009 0x41 0x200A

The processor decodes a ret. This one is special as far as the question goes, because a ret modifies the program counter during execution per the rules of this processor. Suppose the instruction that called address 0x2000 was at, say, 0x1000 and was a two-byte instruction. After fetching and during decoding of that call, the program counter would be 0x1002; during execution, the address 0x1002 would be stored somewhere per the rules of this instruction set, and the program counter would take on the value 0x2000 to call this subroutine. When we get to the ret instruction, we start executing it with the program counter containing 0x200A, but the ret loads the program counter with the address of the instruction after the call (the value stored during the execution of the call). So at the end of this instruction the program counter contains 0x1002, and the next fetch is from that address.

So in this last instruction just before execution the pc points to what would normally be the next instruction for instructions that don't branch or jump or call. 0x200A. But during execution the program counter has been changed so that the "next" instruction is the one after the call that got us here.
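The call/ret round trip can be sketched the same way. This Python model assumes the return address is kept on a stack (the text above only says it is "stored somewhere"), reuses the hypothetical 2-byte call at 0x1000, and skips the push/mov/pop in between because they leave the stack balanced:

```python
def simulate_call_and_return():
    """Trace the pc values around a call to 0x2000 and the matching ret."""
    stack = []
    trace = []

    pc = 0x1000          # address of the (hypothetical) 2-byte call instruction
    pc += 2              # fetching the call advances pc past it
    trace.append(pc)     # pc while the call executes: the return address

    stack.append(pc)     # call saves the address of the instruction after it
    pc = 0x2000          # ... and loads pc with the subroutine address
    trace.append(pc)

    pc = 0x2008          # skip ahead: push, mov and pop have already run
    pc += 2              # fetching the ret advances pc past it
    trace.append(pc)     # pc while the ret executes

    pc = stack.pop()     # ret replaces pc with the saved return address
    trace.append(pc)
    return trace


print([hex(value) for value in simulate_call_and_return()])
```

The value 0x200A is held only briefly: it is what the pc contains between fetching the ret and the ret overwriting it.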

Some more

c064:   0a 24           jz  $+22        ;abs 0xc07a
c066:   4e 5e           rla.b   r14

Before fetching, the pc is 0xC064. After fetch and decode, the pc is 0xC066. The instruction says jump if zero to 0xc07a. So if the zero flag is not set, the pc stays at 0xC066 and that is where the next instruction starts; but if z is set, the pc is modified to 0xc07a and that is where the next instruction to execute will be. So: before, 0xc064; after, 0xc066 or 0xc07a, depending.

The after of one instruction is the before of the next.

unconditional jump

c074:   c2 4d 21 00     mov.b   r13,    &0x0021 
c078:   ee 3f           jmp $-34        ;abs 0xc056

Before fetching, the pc is 0xc078; before execution, 0xc07a; after execution, 0xc056.

For that one instruction the pc held at least three values (if fetching a byte at a time then it held 0xc078, 0xc079, 0xc07a and ended with 0xc056) during one instruction.

Yes it can and does hold more than one value, but not at the same time, one value at a time during the phases of the instruction.
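As a cross-check of the two jump listings, here is a Python sketch that decodes the offset field from the instruction words. It assumes the listings are MSP430 code (the answer does not name the ISA) and that jumps use a signed 10-bit word offset applied to the pc after the fetch; the function name is made up.

```python
def jump_target(insn_addr: int, word: int) -> int:
    """Target of a jump, assuming a signed 10-bit word offset in the low bits."""
    offset = word & 0x3FF            # low 10 bits hold the offset, counted in words
    if offset & 0x200:               # sign bit set: convert to a negative value
        offset -= 0x400
    pc_after_fetch = insn_addr + 2   # pc already points past the 2-byte jump
    return pc_after_fetch + 2 * offset


JZ_WORD = 0x240A    # bytes 0a 24 at 0xc064 (little endian)
JMP_WORD = 0x3FEE   # bytes ee 3f at 0xc078 (little endian)

print(hex(jump_target(0xC064, JZ_WORD)))
print(hex(jump_target(0xC078, JMP_WORD)))
```

Under those assumptions the encoded offset is relative to the pc after fetch (0xc066 and 0xc07a), while the assembler's $+22 and $-34 are written relative to the start of the jump itself. Both describe the same targets.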

Bothwell answered 9/11, 2018 at 3:34 Comment(0)

Initially, the PC register holds the current address. When the clock signal changes, the PC changes to the previous address plus an increment, and it holds that value until the next clock cycle, when the increment is added again and the result is stored back in the register.

Sheepish answered 12/3, 2021 at 19:33 Comment(0)
