Is there any particular advantage to 8-way set-associative caches that I fail to see?
Note that Ice Lake bumps that up to 12-way, 48kiB (adding more ways to each set, same indexing). There's nothing magic about 8-way specifically. Previous AMD designs, like K8 and Bulldozer, experimented with different L1d and L1i geometries, like 64k / 2-way. (Less successfully than Intel's 16k/4-way then 32k/8-way, though.)
For an L1d / L1i cache, 8-way allows a 32k cache to be VIPT without aliasing (see this), given x86's 4k pages. 32kiB is a good power-of-2 "sweet spot" that's small enough to be fast, but large enough and associative enough for good hit rates, and 8-way is the minimum associativity if you want to avoid needing extra tricks to avoid aliasing.
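The arithmetic behind that "minimum associativity" claim is easy to check: the cache index bits plus the within-line offset bits must all fit inside the 12-bit page offset, so virtual and physical addresses agree on which set to probe. A quick sketch (sizes assumed from typical Intel L1 geometry, 64-byte lines):

```python
# Sketch: verify that a 32 KiB / 8-way cache with 64-byte lines can be
# indexed purely from page-offset bits of a 4 KiB page (VIPT, no aliasing).
import math

def index_bits(cache_bytes, ways, line_bytes):
    """Number of address bits used to select a set."""
    sets = cache_bytes // (ways * line_bytes)
    return int(math.log2(sets))

LINE = 64
OFFSET_BITS = int(math.log2(LINE))   # 6 bits select a byte within a line
PAGE_OFFSET_BITS = 12                # 4 KiB pages

# 32 KiB / 8-way -> 64 sets -> 6 index bits; 6 + 6 = 12 == page offset.
assert index_bits(32 * 1024, 8, LINE) + OFFSET_BITS == PAGE_OFFSET_BITS

# The same 32 KiB at only 4-way would need 7 index bits: 7 + 6 = 13 > 12,
# so one index bit would come from the translated frame number,
# reintroducing potential aliasing for a VIPT design.
assert index_bits(32 * 1024, 4, LINE) + OFFSET_BITS > PAGE_OFFSET_BITS

# Ice Lake's 48 KiB / 12-way keeps the same 64 sets, so it still fits.
assert index_bits(48 * 1024, 12, LINE) + OFFSET_BITS == PAGE_OFFSET_BITS
```

This is also why Ice Lake grew the cache by adding ways rather than sets: keeping 64 sets keeps the index inside the page offset.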
See Why is the size of L1 cache smaller than that of the L2 cache in most of the processors? for more about why we have cache hierarchies (because it's impossible to build a huge cache the size of L2 or L3 with the latency and number of read/write ports we need/want for L1, and trying would be a bad way to spend your power budget). See also Which cache mapping technique is used in intel core i7 processor?
8-way is also associative "enough": e.g. most loops over arrays have fewer than 8 total input and output streams (which would alias each other in L1d if they came from the same offsets within page-aligned arrays), and having more is a known performance problem. (Some forms of 4k aliasing between different accesses are also a known problem that software mostly tries to avoid.)
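To see why more than 8 such streams is a problem, consider a toy model (an illustration, not real hardware) of a 32 KiB / 8-way / 64-byte-line L1d: it has 64 sets, so bytes 0..63 of every 4 KiB page map to set 0. Nine page-aligned streams then all compete for the same 8 ways:

```python
# Toy model of L1d set selection: 64 sets of 64-byte lines, 8 ways.
# Base addresses below are hypothetical, chosen 4 KiB apart so every
# stream starts at the same offset within its page.
LINE, SETS, WAYS = 64, 64, 8

def set_index(addr):
    """Which set an address maps to: line number modulo set count."""
    return (addr // LINE) % SETS

streams = [base * 4096 for base in range(9)]   # 9 page-aligned streams
sets_hit = [set_index(a) for a in streams]

assert all(s == 0 for s in sets_hit)   # all 9 streams alias to set 0
assert len(streams) > WAYS             # more streams than ways -> conflict misses
```

With 9 streams and only 8 ways, LRU replacement evicts a line from set 0 on every pass, so the loop thrashes even though the cache is nowhere near full overall.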
Also note that SKL's 256k L2 cache is only 4-way associative, vs. SKX's 1MiB 16-way L2. (Skylake L2 cache enhanced by reducing associativity?). And L3 caches are typically more than 8-way associative, but I guess you're talking about L1d / L1i caches.