TAGE prediction accuracy improves with loop over larger array?
Asked Answered
H

1

0

The code snippet iterates through a 1D matrix. (N is the size of the matrix).

for (i=0; i< N; i++) // outer loop for Rows

When I run this piece of code on a processor simulator to measure TAGE accuracy, I realize that as the size of the array (N) increases, the TAGE accuracy increases.

What is the reason for this?

Halverson answered 21/1, 2021 at 4:35 Comment(1)
If you're going to edit your question, don't revert to wrong tags and a more generic title. The fact that it's about prediction accuracy is already covered by the [branch-prediction] tag. Your "TAGE Accuracy - Loop Accuracy" title does mention loops, but is less specific than the title from my last edit that you rolled back. I edited again to fix it, so I think it's fine now. But next time please be more careful with your edits, especially tags.Tang
T
3

Loop branches typically mispredict only on the last iteration when execution falls through out of the loop instead of jumping to the top. (For fairly obvious reasons: they quickly learn that the branch is always-taken, and predict that way.)

The more iterations your loops run, the more correctly-predicted taken branches you have for the same number of not-taken special cases that mispredict.


Fun fact: on modern Intel CPUs (like Haswell / Skylake), their IT-TAGE branch predictors can "learn" a pattern up to about 22 iterations, correctly predicting the loop exit. With a very long outer loop to give the CPU time to learn the pattern, an inner loop that runs only 22 or fewer iterations tends to predict even the loop-exit branches correctly. So there's a significant dropoff in performance (and instruction throughput) when the inner-loop size grows past that point, if the loop body is pretty simple.

But it probably takes quite a few outer-loop iterations to train the predictors with that much history. I was testing 10 million outer-loop iterations or so, to average out noise and startup overhead for a whole process with perf stat on real hardware under Linux. So the startup / learning phase was negligible.

With older simpler branch predictors (before TAGE), I think some CPUs did implement loop-pattern prediction with a counter to predict loop exits for inner loops that ran a constant number of iterations every time they were reached. https://danluu.com/branch-prediction/ says the same, that "modern CPUs" "often" have such predictors.

Tang answered 21/1, 2021 at 5:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.