I have an Intel Sapphire Rapids CPU with 56 cores. By default, SNC is not enabled. When core 0 accesses a certain memory address A, I think the following will happen:
- One of the cache agents is accessed and determines that the line is not cached anywhere.
- One of the home agents is accessed to fetch the cacheline from memory.
- The cacheline is stored on one of the LLC slices. In this particular case, let's assume it's stored on the LLC slice next to core 30.
Now let's assume we enable SNC2, so core 0 and core 30 are no longer in the same sub-NUMA node. What happens when the same memory address A is accessed from core 0? Will the address-to-LLC-slice mapping change so that A is mapped to an LLC slice next to one of cores 0 to 27? Or will it still be stored in the slice next to core 30, so that we just pay a longer fetch latency?
In other words, does SNC2 effectively partition the LLC into two halves? And if I have a workload whose threads fit in one of the sub-NUMA nodes but whose memory/LLC working set is very large, is SNC2 then a poor choice because each node sees only half of the LLC?
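
My premise that cores 0 and 30 end up in different sub-NUMA nodes can at least be checked from the OS side. Below is a minimal sketch, assuming Linux and its sysfs layout under `/sys/devices/system/node`. The helper names `parse_cpulist` and `node_of_cpu` are my own, and the logical CPU ids the OS reports may not match the physical core positions on the die.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8,10-11' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_of_cpu(cpu, node_root="/sys/devices/system/node"):
    """Return the NUMA node id whose cpulist contains `cpu`, or None if not found."""
    for node_dir in sorted(Path(node_root).glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file() and cpu in parse_cpulist(cpulist.read_text()):
            return int(node_dir.name[len("node"):])
    return None


if __name__ == "__main__":
    # Logical CPUs 0 and 30 are the ones from my example above.
    for cpu in (0, 30):
        print(f"cpu {cpu} -> node {node_of_cpu(cpu)}")
```

This only confirms which node each logical CPU belongs to. It says nothing about which LLC slice a given address maps to, which is the part I am unsure about.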
Thanks!