BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

Is there any way to determine, or any resource where I can find, the Branch Target Buffer (BTB) size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?

Lentiginous answered 21/7, 2016 at 19:33 Comment(0)

Check the software optimization resources by Agner Fog: http://www.agner.org/optimize/

BTB information should be in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf

3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge

BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.
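The per-16-byte limits quoted above can be checked against a list of branch addresses. This is a minimal sketch, assuming that a branch is attributed to the aligned 16-byte block containing its first byte (the manual excerpt does not say which byte counts). The function names and the sample addresses are hypothetical.

```python
from collections import Counter

def branches_per_16_byte_block(branch_addresses):
    """Count branch instructions per aligned 16-byte block of code.

    Assumption: a branch belongs to the block holding its first byte.
    """
    return Counter(addr >> 4 for addr in branch_addresses)

def crowded_blocks(branch_addresses, limit=3):
    """Return {block_start_address: count} for blocks with more than `limit` branches."""
    counts = branches_per_16_byte_block(branch_addresses)
    return {block << 4: n for block, n in counts.items() if n > limit}

# Hypothetical start addresses of branch instructions.
addrs = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010, 0x1020, 0x1024]
print(crowded_blocks(addrs))  # {4096: 4}
```

Here the block at 0x1000 holds four branches, which exceeds the "more than 3 branch instructions per 16 bytes" threshold from the quoted text.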

3.8 Branch prediction in Intel Haswell, Broadwell and Skylake

BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.

Intel may describe some of this in the "Intel 64 and IA-32 Architectures Optimization Reference Manual", http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html, around section "3.4.1 Branch Prediction Optimization", but it still gives no sizes.

It may look strange, but there was no information about the BTB in CPUID in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in public materials from Intel employees. A comment in that source file notes:

 * This table describes the possible cache and TLB configurations
 * as documented by Intel. For now AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic Pentii2).
 * With 80 possible caches we are on the safe side for one or two years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...

There should be some performance monitoring unit (PMU) counters for the BTB, and there are experiments that infer the BTB size by running special test programs. See http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt:

Conclusions

From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.

For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.
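To illustrate how a geometry like "4096 entries split over 1024 sets of 4 ways" shows up in such experiments, here is a toy model. It is only a sketch under loud assumptions: the set index is taken from address bits just above the low 4 bits, and replacement is plain LRU. Neither is confirmed for any Intel CPU, and the class and function names are hypothetical.

```python
from collections import OrderedDict

class ToyBTB:
    """Set-associative table with LRU replacement (a toy model, not Intel's design)."""

    def __init__(self, num_sets=1024, ways=4, index_shift=4):
        self.num_sets = num_sets
        self.ways = ways
        self.index_shift = index_shift  # assumed: low address bits are ignored
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, branch_addr):
        """Return True on a hit; on a miss, insert and evict the LRU entry if full."""
        s = self.sets[(branch_addr >> self.index_shift) % self.num_sets]
        if branch_addr in s:
            s.move_to_end(branch_addr)
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)  # drop the least recently used entry
        s[branch_addr] = True
        return False

def steady_state_misses(num_branches, spacing, btb=None):
    """Run a chain of branches twice and count misses on the second pass."""
    if btb is None:
        btb = ToyBTB()
    addrs = [i * spacing for i in range(num_branches)]
    for a in addrs:  # warm-up pass
        btb.access(a)
    return sum(not btb.access(a) for a in addrs)  # measured pass

print(steady_state_misses(4096, 16))  # 0: exactly 4 branches per set
print(steady_state_misses(4097, 16))  # 5: one set now cycles 5 branches through 4 ways
```

In this model the miss count jumps as soon as any set holds more branches than it has ways, which is the kind of cliff a size-probing test looks for when it varies the branch count and spacing.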

There are more posts by him about branch prediction and its events.

His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner (see https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py).
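For a rough idea of what a BTB-size test program looks like, the sketch below generates NASM-style source for a chain of unconditional jumps at a fixed spacing. It is not taken from the repositories linked above. The function name, label names, and the choice of NASM `align` directives are assumptions for illustration, and real alignment also depends on the alignment of the section the code is placed in.

```python
def make_jump_chain(num_branches, spacing):
    """Return NASM-style source for a chain of unconditional jumps.

    Each jmp starts a `spacing`-byte aligned slot and targets the next slot,
    so the chain executes `num_branches` taken branches before the final ret.
    """
    # Assumption: spacing >= 8 leaves room for one short or near jmp per slot.
    if spacing < 8 or spacing & (spacing - 1):
        raise ValueError("spacing must be a power of two >= 8")
    lines = []
    for i in range(num_branches):
        lines.append(f"align {spacing}")
        lines.append(f"branch_{i}:")
        lines.append(f"    jmp branch_{i + 1}")
    lines.append(f"align {spacing}")
    lines.append(f"branch_{num_branches}:")
    lines.append("    ret")
    return "\n".join(lines)

print(make_jump_chain(2, 16))
```

Timing such a chain in a loop, or reading a branch-misprediction PMU counter around it, for increasing `num_branches` and different `spacing` values is one way to look for the point where targets stop fitting in the buffer.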

Haphtarah answered 21/7, 2016 at 19:47 Comment(2)
Branch prediction seems to be part of the "secret sauce" that CPU companies don't publish details about, probably for fear of helping out their competitors. Most of what we know seems to come from experimental tests based on theories about how things work. Definitely interesting how much it's possible to figure out. – Forras
...and how to implement good branch prediction for open-source out-of-order CPU cores such as github.com/ucb-bar/riscv-boom/blob/master/src/main/scala/… riscv.org/wp-content/uploads/2016/01/… ccelio.github.io/riscv-boom-doc (Chapter 3) – Haphtarah
