What does it mean by a branch penalty?

Asked 2/6, 2019 at 6:12 Answered 8/10, 2019 at 12:5

Solved assembly cpu-architecture branch-prediction

Branch penalty in pipeline results from non-zero distance between ALU and IF.

What does it mean by this statement?

Nazler answered 2/6, 2019 at 6:12 Comment(1)

Surprisingly, Branch predictor - Wikipedia does a very good job explaining branch prediction (the associated models) and the misprediction penalty in great detail. Always a top contender for the first stop. – Lymphocytosis 2/6, 2019 at 6:31

Without (correct) branch prediction, fetch doesn't know what to fetch next until the ALU decides which way a conditional or indirect branch goes. So it stalls until the branch executes in the ALU.

Or with an incorrect prediction, the fetched/decoded instruction from the wrong path are useless, so we call it the branch mispredict penalty; branch prediction hides it in the normal case.

Another term for this is "branch latency" - the number of cycles from fetching a branch instruction until the front-end fetches a useful next instruction.

Note that even unconditional branches have branch latency: the fact that an instruction is a branch at all isn't known until after it's decoded. This is earlier in the pipeline than execution so the possible penalty is smaller than for conditional or indirect branches.

For example, in first-gen MIPS R2000, a classic 5-stage RISC, conditional branches only take half a cycle in the EX stage, and IF doesn't need the address until the 2nd half of a clock cycle, so the total branch latency is kept down to 1 cycle. MIPS hides that latency with a branch-delay slot: the instruction after a branch always executes, whether the branch it taken or not. (Including unconditional direct branches; the ID stage can produce the target address on its own.) Later more deeply pipelined MIPS CPUs (especially superscalar and/or out-of-order) did need branch prediction, with the delay slot not able to fully hide branch latency.

Anaximenes answered 2/6, 2019 at 6:25 Comment(0)

It means, you had penalty between the cycles of the processor. Every processor has cycles of operation, each delay in the cycle will result in a penalty, as it waits until the branch executes in the ALU or:

Branch penalty in pipeline results from non-zero distance between ALU and IF.

There is a wonderful, but long book called Computer Architecture Piplined And Parallel Processor Design .

It explains in detail regarding the issue.

Zygodactyl answered 2/6, 2019 at 6:30 Comment(1)

You mean stages, not cycles. A "cycle" is the unit of time, a clock cycle. During which instructions move to the next pipeline stage (if they're not stalled) – Anaximenes 2/6, 2019 at 6:38

Short Answer:

The penalty for mis-predicting the next possible branch would result in time wastage (CPU clock cycles) as

the wrongly predicted branch which was fetched and executed speculatively would then need to be discarded and
the actual next branch would need to fetched and executed reactively anyway.

Long Answer: Look up: "Instruction pipelining", "Branch prediction" , "Loop unrolling", ...

Transparency answered 8/10, 2019 at 12:5 Comment(1)

You're using the word "branch" to describe non-branch instructions on a (possible) path of execution. This is confusing. Instead say "the wrongly predicted path was fetched and needs to be discarded". For example, Intel's description of the perf-counter event int_misc.clear_resteer_cycles on Skylake is "[Cycles the issue-stage is waiting for front-end to fetch from resteered path following branch misprediction or machine clear events]". (Resteer = point the fetch stage at the correct path.) Run perf list on a Linux machine with a similar CPU to see that. – Anaximenes 8/10, 2019 at 19:9

Recommended topics

Hot tags