Impact of sub-NUMA clustering (SNC) on LLC access

I have an Intel Sapphire Rapids CPU with 56 cores. By default, SNC is not enabled. When core 0 accesses a certain memory address A, I think the following will happen:

One of the caching agents is accessed, and it determines that the address is not cached anywhere.
One of the home agents is accessed to fetch the cache line from memory.
The cache line is stored in one of the LLC slices. In this particular case, let's assume it's stored in the LLC slice next to core 30.

Let's assume we now enable SNC2, so core 0 and core 30 are no longer on the same sub-NUMA node. What happens when the same memory address A is accessed from core 0? Will the address-to-LLC-slice mapping change so that A is mapped to an LLC slice next to cores 0 to 27? Or will it still be stored in the slice next to core 30, so that we just pay a longer fetch latency?

In other words, does SNC2 effectively partition the LLC into two? And if I have a workload that fits in one of the sub-NUMA nodes (but with a very large memory/LLC working set), is SNC2 a poor choice because of the much smaller (1/2) LLC size?

Thanks!

On Sapphire Rapids (and most Intel CPUs), a hash function maps each physical memory address to an LLC slice.

In UMA mode, all memory is hashed across all LLC slices on the 4 dies.
In SNC mode, memory is mapped to the LLC slices inside its quadrant or hemisphere. Each core is biased toward the LLC local to its quadrant (if the OS is NUMA-aware), but it can still access all of the LLC; memory and LLC outside the quadrant just have a higher latency.
In quadrant or hemisphere mode, the same LLC mapping is used as in SNC, but the socket is exposed as one NUMA domain with one physical memory map. This mode requires memory to be populated equally across all controllers.
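
To make the difference between the mappings concrete, here is a toy model in Python. It is only a sketch: the real address-to-slice hash is not publicly documented, so `_hash` is a stand-in, and `NUM_SLICES`, `SLICES_PER_DOMAIN` and the function names are assumptions made up for illustration. It models only the case of memory that is local to its SNC domain.

```python
import hashlib

# Toy model only. The real hash is undocumented; these numbers are assumptions.
NUM_SLICES = 56          # assumption: one LLC slice per core
SLICES_PER_DOMAIN = 28   # SNC2 splits the slices into two halves


def _hash(addr):
    """Stand-in hash over the cache-line address (NOT Intel's function)."""
    line = addr >> 6  # drop the offset inside a 64-byte cache line
    digest = hashlib.blake2b(line.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def slice_uma(addr):
    """UMA: every address can land on any slice of the socket."""
    return _hash(addr) % NUM_SLICES


def slice_snc2(addr, home_domain):
    """SNC2, local memory: the address stays on slices of its home domain.

    home_domain is 0 or 1, the domain whose memory controllers own addr.
    """
    return home_domain * SLICES_PER_DOMAIN + _hash(addr) % SLICES_PER_DOMAIN


def slices_touched(addrs, mapper):
    """Set of slice ids used by a collection of addresses."""
    return {mapper(a) for a in addrs}


# A 256 KiB buffer (4096 cache lines) that lives in domain 0.
buf = [0x10000000 + 64 * i for i in range(4096)]
uma = slices_touched(buf, slice_uma)
snc = slices_touched(buf, lambda a: slice_snc2(a, home_domain=0))
print(len(uma), "slices used in UMA;", len(snc), "slices used in SNC2")
```

In this model the buffer spreads over the whole socket in UMA mode, while in SNC2 it never leaves slices 0 to 27, which is the capacity effect the question asks about.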

In your example, if hemisphere mode is already active, the behavior will not change, because the mapping of physical addresses to CHA nodes stays the same. If UMA mode was active, a different hash map will be used so that address A maps to a CHA inside the same hemisphere.

However, with SNC2 each contiguous memory allocation only has access to 1/2 of the LLC, because physical memory is now mapped to LLC slices per domain. If you want to use all of the LLC, you need a near and a far block of memory, one allocated in each NUMA domain. Note that, as stated in this Intel documentation, remote memory is still distributed across all slices. Only memory local to the NUMA domain is mapped exclusively to the slices inside that domain.
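
Before placing a block in each domain, it helps to confirm how the cores were actually split. A minimal sketch, assuming Linux and its sysfs layout under `/sys/devices/system/node` (the helper names `parse_cpulist`, `node_cpus` and `same_node` are my own, and the core numbering in the comments is only a hypothetical example):

```python
from pathlib import Path


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as '0-27,56-83' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means a memory-only node
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def node_cpus(root="/sys/devices/system/node"):
    """Map NUMA node id -> set of CPU ids, read from sysfs (Linux only)."""
    nodes = {}
    for path in sorted(Path(root).glob("node[0-9]*")):
        cpulist = path / "cpulist"
        if cpulist.is_file():
            nodes[int(path.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


def same_node(nodes, a, b):
    """True if CPUs a and b are listed under the same NUMA node."""
    return any(a in cpus and b in cpus for cpus in nodes.values())


if __name__ == "__main__":
    nodes = node_cpus()
    for node_id, cpus in nodes.items():
        print("node", node_id, "has", len(cpus), "CPUs")
    # With SNC2 enabled, cores 0 and 30 would be expected on different nodes.
    print("core 0 and core 30 share a node:", same_node(nodes, 0, 30))
```

Once the node ids are known, the near and far blocks can be bound to them with the usual NUMA tools, for example `numactl --membind` or `libnuma`.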
