Branch penalty in pipeline results from non-zero distance between ALU and IF.
What does this statement mean?
Without (correct) branch prediction, fetch doesn't know what to fetch next until the ALU decides which way a conditional or indirect branch goes. So it stalls until the branch executes in the ALU.
Or, with an incorrect prediction, the fetched/decoded instructions from the wrong path are useless and have to be discarded, so we call it the branch mispredict penalty; correct branch prediction hides it in the normal case.
Another term for this is "branch latency" - the number of cycles from fetching a branch instruction until the front-end fetches a useful next instruction.
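To put a rough number on what that latency costs, here is a minimal back-of-the-envelope sketch. The function name `effective_cpi`, the 20% branch fraction, the 5% mispredict rate and the 2-cycle penalty are all made-up illustrative values, not measurements from any real CPU.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch penalties are added.

    Simplified model (an assumption): each mispredicted or unpredicted
    branch costs exactly `penalty_cycles` extra cycles, nothing else stalls.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 2-cycle penalty.
# No prediction: fetch stalls on every branch, modeled as a 100% "miss" rate.
no_prediction = effective_cpi(1.0, 0.20, 1.0, 2)
# With a predictor that is wrong on 5% of branches.
with_prediction = effective_cpi(1.0, 0.20, 0.05, 2)

print(no_prediction, with_prediction)
```

The point is only that the penalty is paid per branch that fetch gets wrong (or has to wait for), so both a shorter ALU-to-IF distance and a better predictor reduce the average cost.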
Note that even unconditional branches have branch latency: the fact that an instruction is a branch at all isn't known until after it's decoded. This is earlier in the pipeline than execution so the possible penalty is smaller than for conditional or indirect branches.
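The "distance" in the original statement can be read literally as a number of pipeline stages. The sketch below assumes a textbook 5-stage pipeline in which a branch is resolved at the end of a stage and fetch is redirected on the following cycle; the names `STAGES` and `branch_latency` are hypothetical, chosen just for this illustration.

```python
# Textbook 5-stage pipeline, in order (an assumed model, not a specific CPU).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_latency(resolve_stage, fetch_stage="IF", stages=STAGES):
    """Stage distance between fetch and the stage that resolves the branch.

    In this simplified model, that distance is the number of cycles fetch
    either stalls or spends on a possibly wrong path.
    """
    return stages.index(resolve_stage) - stages.index(fetch_stage)


# Conditional or indirect branch: direction/target comes from the ALU in EX.
print(branch_latency("EX"))
# Unconditional direct branch: known to be a branch once it is decoded in ID.
print(branch_latency("ID"))
```

A distance of zero would mean no penalty at all, which is why the statement says the penalty results from a non-zero distance between ALU and IF.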
For example, in first-gen MIPS R2000, a classic 5-stage RISC, conditional branches only take half a cycle in the EX stage, and IF doesn't need the address until the 2nd half of a clock cycle, so the total branch latency is kept down to 1 cycle. MIPS hides that latency with a branch-delay slot: the instruction after a branch always executes, whether the branch is taken or not. (Including unconditional direct branches; the ID stage can produce the target address on its own.) Later, more deeply pipelined MIPS CPUs (especially superscalar and/or out-of-order) did need branch prediction, because the delay slot could no longer fully hide branch latency.
It means there is a penalty, counted in processor cycles. A processor works in cycles, and every cycle it spends waiting is a penalty; here it waits until the branch executes in the ALU. Or, as the statement puts it:
Branch penalty in pipeline results from non-zero distance between ALU and IF.
There is a wonderful but long book called Computer Architecture: Pipelined and Parallel Processor Design.
It explains the issue in detail.
Short Answer:
The penalty for mispredicting the next branch is wasted time (CPU clock cycles) as
Long Answer: Look up "Instruction pipelining", "Branch prediction", "Loop unrolling", ...
int_misc.clear_resteer_cycles on Skylake is "[Cycles the issue-stage is waiting for front-end to fetch from resteered path following branch misprediction or machine clear events]". (Resteer = point the fetch stage at the correct path.) Run perf list on a Linux machine with a similar CPU to see that. – Anaximenes