Loop branches typically mispredict only on the last iteration, when execution falls through out of the loop instead of jumping back to the top. (For fairly obvious reasons: the predictors quickly learn that the branch is almost always taken, and predict it that way.)
The more iterations your loops run, the more correctly-predicted taken branches you get for the same number of not-taken loop exits that mispredict.
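As a minimal illustration (hypothetical code, not anything from the question), consider a simple counted loop; with typical code layout its backward conditional branch is taken on every iteration and falls through (not-taken) only once at the exit:

```c
#include <stddef.h>

/* The loop's backward branch is taken on every iteration except the
 * last, where execution falls through out of the loop. */
long sum_array(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];   /* loop branch: taken n-1 times, not-taken once (for n >= 1) */
    return total;
}
```

With a large n, that single likely mispredict at the exit is amortized over n-1 correctly-predicted taken branches.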
Fun fact: modern Intel CPUs (like Haswell / Skylake) have IT-TAGE branch predictors that can "learn" a pattern of up to about 22 iterations, correctly predicting the loop exit. With a very long outer loop to give the CPU time to learn the pattern, an inner loop that runs only 22 or fewer iterations tends to have even its loop-exit branch predicted correctly. So there's a significant dropoff in performance (and instruction throughput) when the inner-loop trip count grows past that point, if the loop body is pretty simple.
But it probably takes quite a few outer-loop iterations to train the predictors with that much history. I was testing with about 10 million outer-loop iterations, using perf stat on real hardware under Linux, to average out noise and startup overhead for the whole process, so the startup / learning phase was negligible.
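A sketch of that kind of microbenchmark (not the exact code I measured; INNER, the iteration counts, and the perf events here are just illustrative):

```c
#include <stdio.h>

#ifndef INNER
#define INNER 20        /* inner-loop trip count: try values below and above ~22 */
#endif

int main(void)
{
    volatile unsigned sink = 0;   /* volatile store keeps the inner loop from being optimized away */

    for (long outer = 0; outer < 10000000L; outer++) {  /* ~10M outer iterations: trains the predictor, averages out startup */
        for (unsigned i = 0; i < INNER; i++)
            sink += i;            /* constant trip count every time the inner loop is reached */
    }

    printf("%u\n", sink);
    return 0;
}
```

Build and measure with something like `gcc -O2 -DINNER=20 loop.c -o loop` and `perf stat -e branches,branch-misses ./loop`, then rebuild with a larger INNER (say 40) and compare the branch-miss rate.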
With older, simpler branch predictors (before TAGE), I think some CPUs implemented loop-pattern prediction with a counter, to predict the loop exit for inner loops that ran a constant number of iterations every time they were reached. https://danluu.com/branch-prediction/ says the same: that "modern CPUs" "often" have such predictors.