Are there any way to determine or any resource where I can find the branch Target Buffer size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?
Check Software optimization resources by Agner Fog, http://www.agner.org/optimize/
BTB should be in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf
3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge
BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.
3.8 Branch prediction in Intel Haswell, Broadwell and Skylake
BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.
Intel may describe some data in "Intel 64 and IA-32 Architectures Optimization Reference Manual" http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html around "3.4.1 Branch Prediction Optimization" but still no sizes.
It may looks strange, but there were no information about BTB in cpuid in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany.). And still not listed in http://www.felixcloutier.com/x86/CPUID.html or in some public materials from Intel workers...
* This table describes the possible cache and TLB configurations * as documented by Intel. For now AMD doesn't use this but gives * exact cache layout data on CPUID 0x8000000x. * * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information * to 80 bytes (of which 16 bytes are used in generic Pentii2). * With 80 possible caches we are on the safe side for one or two years. * * Strange enough no BHT, BTB or return stack data is given this way...
There should be some Performance monitoring unit (PMU) counters for BTB, and there are experiments to get BTB size from running special test programs, check http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt
Conclusions
From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.
For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.
and more his posts about branch prediction and its events:
- http://xania.org/201602/bpu-part-one Static branch prediction on newer Intel processors
- http://xania.org/201602/bpu-part-two Branch prediction - part two
- http://xania.org/201602/bpu-part-three The BTB in contemporary Intel chips)
- http://xania.org/201602/bpu-part-four Branch Target Buffer, part 2
His code is public, based on Agner's tests: https://github.com/mattgodbolt/agner: https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py, https://github.com/mattgodbolt/agner/blob/master/tests/branch.py
© 2022 - 2024 — McMap. All rights reserved.