How prevalent is branch prediction on current CPUs?

5

28

Due to the huge impact on performance, I never wonder whether my current-day desktop CPU has branch prediction. Of course it does. But how about the various ARM offerings? Do iPhones or Android phones have branch prediction? The older Nintendo DS? How about the PowerPC-based Wii? The PS3?

Whether they have a complex prediction unit is not so important. What matters is whether they have at least some dynamic prediction, and whether they speculatively execute instructions following a predicted branch.

What is the cutoff for CPUs with branch prediction? A handheld calculator from decades ago obviously doesn't have it, while my desktop does. But can anyone outline more clearly where one can expect dynamic branch prediction?

In case it is unclear, I am talking about the kind of prediction that adapts as the branch condition changes, so that the predicted path varies at runtime.

Rosabelle answered 23/11, 2011 at 11:31 Comment(1)
This is a really interesting question! I'd like to know about the most popular embedded processors too. - Luxate
11

Any CPU with a pipeline beyond a few stages requires at least some primitive branch prediction; otherwise it can stall while waiting on computation results to decide which way to go. The Intel Atom is an in-order core, but it has a fairly deep pipeline and therefore requires a pretty decent branch predictor.
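As a rough back-of-the-envelope sketch of why pipeline depth matters, the cost of branches can be folded into an average cycles-per-instruction figure. All the numbers below (branch frequency, penalties, prediction accuracy) are illustrative assumptions, not measurements of the Atom or of any other real core.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch penalties are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Hypothetical workload: one instruction in five is a branch.
BRANCH_FRACTION = 0.2

# Short pipeline, no predictor: assume every branch costs 2 stall cycles.
short_no_pred = effective_cpi(1.0, BRANCH_FRACTION, 1.0, 2)

# Deep pipeline, no predictor: assume every branch costs 12 stall cycles.
deep_no_pred = effective_cpi(1.0, BRANCH_FRACTION, 1.0, 12)

# Same deep pipeline with a predictor that is assumed right 95% of the time.
deep_pred = effective_cpi(1.0, BRANCH_FRACTION, 0.05, 12)

print(f"short pipeline, no predictor: {short_no_pred:.2f} CPI")
print(f"deep pipeline, no predictor:  {deep_no_pred:.2f} CPI")
print(f"deep pipeline, 95% predictor: {deep_pred:.2f} CPI")
```

With these made-up inputs the short pipeline loses little without a predictor, while the deep pipeline goes from about 3.4 CPI without prediction to about 1.12 with it, which is the intuition behind "deeper pipeline, more need for prediction".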

Old ARM7 designs had only three pipeline stages. Combine that with things like branch delay slots (required on MIPS, optional on SPARC), and branch prediction isn't so useful.

Incidentally, when MIPS decided to get more performance by going beyond the original five pipeline stages, the branch delay slot became an annoyance. In the original design, it was necessary because there was no branch predictor. Therefore, you had to place the branch instruction one slot ahead of the last instruction to be executed before the branch took effect. With the longer pipeline, they needed a branch predictor, which obviated the need for a branch delay slot, but they had to emulate it anyway in order to run older code.

The problem with a branch delay slot is that it can only be filled with a useful instruction about 50% of the time. The rest of the time, you either fill it with an instruction whose result is likely to be thrown away, or you use a NO-OP.

Ursola answered 19/2, 2012 at 3:35 Comment(1)
Informative. Gets me a little bit closer to getting a feel for where the approximate "cutoff" might be. - Rosabelle
10

Modern high-end superscalar CPUs with long pipelines (which means almost all CPUs commonly found in desktops and servers) have quite sophisticated branch prediction these days.

Many ARM CPUs, particularly the older and smaller cores, do not have branch prediction, which saves silicon and power, and these cores generally have relatively short pipelines. Also, the support for conditional execution of most instructions in the ARM ISA helps to reduce the number of branches required (and hence mitigates the cost of branch misprediction stalls).

Tint answered 23/11, 2011 at 11:35 Comment(2)
Because the NEON pipeline is behind the main ARM pipeline, there is a significant branch miss penalty if you are doing NEON computation. - Defrayal
@Anthony Blake: good point - so using conditional execution rather than branches is probably a good idea when you have NEON instructions in the mix. - Tint
4

Branch prediction is becoming more important as ARM cores become more complex.

For example, the new 64-bit ARM architecture, ARMv8, drops most uses of conditional execution (mainly because of instruction encoding space constraints with the increased number of registers) and relies on branch prediction to keep performance at acceptable levels.

Even on newer ARMv7-A devices, you can try pathological cases like the one in the unsorted data question on SO, where the improvement from successful branch prediction is around 3x.
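To illustrate why unsorted data is such a bad case, here is a minimal sketch that runs a textbook 2-bit saturating counter over the outcomes of a branch like `if value >= 128:`. This is a toy model and an assumption on my part, not the predictor of any actual ARM core, and it counts mispredictions only; it says nothing about timing. The data size, seed, and threshold are arbitrary choices for the example.

```python
import random

def count_mispredictions(outcomes):
    """Run one 2-bit saturating counter over a sequence of branch outcomes.

    Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    state = 1  # start weakly "not taken" (arbitrary starting point)
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.randrange(256) for _ in range(10_000)]

# The branch being modelled is `if value >= 128:` inside a summing loop.
unsorted_misses = count_mispredictions([v >= 128 for v in data])
sorted_misses = count_mispredictions([v >= 128 for v in sorted(data)])

print(f"unsorted: {unsorted_misses} mispredictions out of {len(data)}")
print(f"sorted:   {sorted_misses} mispredictions out of {len(data)}")
```

On sorted input the branch outcome flips only once, so the counter mispredicts just a couple of times around the flip. On random input it is wrong roughly half the time, and each of those misses costs a pipeline flush on real hardware.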

Toombs answered 16/3, 2014 at 7:26 Comment(0)
0

Not so much for the ARM Cortex-A8 (though it does have some branch prediction), but I believe the Cortex-A9 is an out-of-order superscalar core with complex branch prediction.

Defrayal answered 23/11, 2011 at 11:34 Comment(2)
Thanks, but I am looking for a more general answer. Saying that Cortex-A8 has "not so much, but some" isn't helping either. - Rosabelle
Additionally, I just found that the ARM Cortex-A8 has a 13-cycle penalty for mispredicted branches, so I think it's safe to say it takes branch prediction seriously. - Rosabelle
0

You can expect a dynamic branch predictor in any out-of-order processor. These processors not only rely on pipelining but also fetch multiple instructions at a time, and they have multiple execution units (floating-point units, ALUs) and more registers. To increase instruction throughput, they keep many instructions in flight at any given moment. Branches are of course a problem if you want to keep all that machinery busy, so this kind of processor relies on dynamic branch prediction to keep throughput and utilization high.

You can expect any server to have dynamic branch prediction, and desktops too. In the past, embedded processors such as the ARM chips used in phones did not have branch prediction, since they had shorter pipelines and no out-of-order execution. But as Moore's law gives us more transistors per unit area, you will see more and more processors adopting these microarchitectural features. So to answer your question: besides the obvious step of checking the CPU specs, you can expect branch prediction on 32-bit chips with longer pipelines and out-of-order execution. The most recent chips from ARM are moving in this direction to some degree.

Perlman answered 19/5, 2014 at 10:12 Comment(0)
