The exact workings of branch predictors vary between processors. But nearly all non-trivial branch predictors need a history of the program's branches to function.
This history is recorded in the branch history buffer.
These buffers come in multiple flavors. The two most commonly studied are:
- Local History - which tracks the history of each individual branch separately.
- Global History - which tracks the combined history of all branches in the order they execute. (The sketch after this list illustrates the difference.)
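To make the distinction concrete, here is a minimal C++ sketch. This is illustrative only, not taken from any vendor documentation, and an optimizing compiler may merge the two branches, so read it as a picture of the idea rather than a ready-made benchmark:

```cpp
#include <cstdio>
#include <cstdlib>

// The data is random, so branch A is a coin flip no predictor can learn.
// Branch B tests the same value:
//  - Viewed in isolation (local history), B's outcomes look just as
//    random as A's.
//  - Viewed together with A (global history), B is trivially
//    predictable: it always goes the same way A just went.
int main() {
    srand(42);
    long long sum = 0;
    for (int i = 0; i < 1000000; i++) {
        int x = rand() & 1;
        if (x) sum += 1;  // Branch A: unpredictable coin flip
        if (x) sum += 2;  // Branch B: perfectly correlated with A
    }
    printf("%lld\n", sum);
    return 0;
}
```

A local predictor sees only branch B's own (random-looking) history and is stuck near 50% accuracy on it. A global predictor remembers that branch A was just taken and can predict branch B almost perfectly.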
Modern processors have multiple buffers for different purposes. In all cases, the buffers are of limited size. So when they run out of room, something must be evicted.
Neither Intel nor AMD gives details about their branch predictors. But it is believed that current processors from both companies can track thousands of branches along with their histories.
Getting back to the point, the data used by the branch predictors "sticks" only for as long as it stays in the history buffers. So the predictors perform best when the code is small and well-behaved enough not to overrun those buffers.
- If most of the computation is spent in a small amount of code, the local history buffers will be able to track all the branches that are commonly hit.
- If the computation jumps all over the place, there may be too many branches for the branch predictor to track, and its accuracy will degrade. (A benchmark sketch follows this list.)
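As a rough illustration of how one might probe this, consider a micro-benchmark that executes the same total number of branches spread over a varying number of distinct branch sites. This is only a sketch under stated assumptions: the `BRANCH` macro and `run` function are made up for illustration, and since real predictors are believed to track thousands of branches, an actual capacity test would need to macro-generate thousands of sites rather than the handful shown here.

```cpp
#include <cstdio>

// Hypothetical sketch of a predictor-capacity test. Each BRANCH(k)
// expands to a distinct branch instruction with its own simple,
// periodic pattern. Because every site is easy to predict in
// isolation, any mispredictions that appear as the site count grows
// point at history-buffer eviction rather than inherently hard
// branches. (Uses `i` and `sum` from the enclosing scope.)
#define BRANCH(k) if (((i >> ((k) & 7)) & 1) != 0) sum += (k);

long long run(int iters) {
    long long sum = 0;
    for (int i = 0; i < iters; i++) {
        // A handful of distinct branch sites; a real test would
        // generate thousands of these.
        BRANCH(1) BRANCH(2) BRANCH(3) BRANCH(4)
        BRANCH(5) BRANCH(6) BRANCH(7) BRANCH(8)
    }
    return sum;
}

int main() {
    // Time this call while scaling the number of BRANCH sites,
    // looking for the cliff where throughput drops.
    printf("%lld\n", run(10000000));
    return 0;
}
```

Note that scaling up the number of sites also grows the code footprint, which is exactly the confound described next.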
Note that the instruction and uop caches, while independent of the branch predictor, exhibit the same locality effects. So it may be difficult to single out the branch predictor when constructing test cases and benchmarks to study its behavior.
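One way to partially separate the two effects (on Linux, assuming hardware counters are available) is to read the counters directly rather than relying on wall-clock time, since branch mispredictions and instruction-cache misses are counted independently. For example, `perf stat -e branches,branch-misses ./a.out` reports the misprediction rate for a run, though the exact event names available vary by kernel and CPU.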
So this is yet another case in performance where locality has its advantages.