About Adaptive Mode for L1 Cache in Hyper-threading

Asked 3/9, 2012 at 14:36 Answered 1/12, 2020 at 13:51

performance intel cpu-architecture cpu-cache hyperthreading

I'm a student doing some research on Hyper-threading recently. I'm a little confused about the feature - L1 Data Cache Context Mode.

In the architecture optimization manual, it was described that L1 cache can operate in two modes:

The first level cache can operate in two modes depending on a context-ID bit:
1. Shared mode: The L1 data cache is fully shared by two logical processors.
2. Adaptive mode: In adaptive mode, memory accesses using the page directory is mapped identically across logical processors sharing the L1 data cache.

However, I am curious about how cache get partitioned in the adaptive mode according to the description.

Earsplitting answered 3/9, 2012 at 14:36 Comment(0)

On Intel arch, a value of 1 of L1 Context ID indicates the L1 data cache mode can be set to either adaptive mode or shared mode, while a value of 0 indicates this feature is not supported. Check the definition of IA32_MISC_ENABLE MSR Bit 24 (L1 Data Cache Context Mode) for details.

According to Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 3A (Chapter 11/Cache Control), which I quote below:

Share mode

In shared mode, the L1 data cache is competitively shared between logical processors. This is true even if the logical processors use identical CR3 registers and paging modes. In shared mode, linear addresses in the L1 data cache can be aliased, meaning that one linear address in the cache can point to different physical locations. The mechanism for resolving aliasing can lead to thrashing. For this reason, IA32_MISC_ENABLE[bit 24] = 0 is the preferred configuration for processors based on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology.

Adaptive Mode

Adaptive mode facilitates L1 data cache sharing between logical processors. When running in adaptive mode, the L1 data cache is shared across logical processors in the same core if:

• CR3 control registers for logical processors sharing the cache are identical.

• The same paging mode is used by logical processors sharing the cache.

In this situation, the entire L1 data cache is available to each logical processor (instead of being competitively shared). If CR3 values are different for the logical processors sharing an L1 data cache or the logical processors use different paging modes, processors compete for cache resources. This reduces the effective size of the cache for each logical processor. Aliasing of the cache is not allowed (which prevents data thrashing).

I just guess there is no definite approach for partitioning the L1 data cache.

Howarth answered 4/9, 2012 at 6:1 Comment(1)

I have seen these definitions. What I curious about is how to partition the L1 data cache and what is used to identify different processes in L1 cache? – Earsplitting 4/9, 2012 at 6:44

The document just states that if you use the adaptive mode and if CR3 or the paging mode differ between cores, the cache is not shared and the cores "compete" for the cache. It doesn't tell how the partitioning works.

The most straightforward manner to implement this would be to statically reserve half of the ways of the data cache to each of the processors. This would essentially assign half the data cache statically to each processor.

Alternatively they could add an additional bit to the virtual tag of each cache line to distinguish which processor the line belongs to. This would allow a dynamic partition of the cache. This fits the description of "competing" for the cache better than a static partition.

If you really need to know, you could design some micro-benchmarks to verify that one these schemes is actually used.

Amphigory answered 5/9, 2012 at 15:27 Comment(0)

The L1 data cache is not partitioned in either mode and is always competitively shared.

Note that there is an obvious error in the manual, the mode isn't determined by the context-ID bit, but by IA32_MISC_ENABLE[24]. This enhancement is supported on later steppings of Northwood with HT and all Prescott processors with HT. The default value is zero, which represents adaptive mode. However, in certain processors, an updated BIOS may switch to shared mode by setting IA32_MISC_ENABLE[24] due to a bug in these processors that occurs only in adaptive mode.

In earlier steppings of Northwood with HT, only shared mode is supported. In shared mode, when a load request is issued to the L1 data cache, the request is first processed on the "fast path," which involves making a way prediction based on bits 11-15 of the linear address and producing a speculative hit/miss signal as a result. In processors with HT, the logical core ID is also compared. Both the partial tag and logical core ID have to match in order to get a speculative hit. In general, this helps improving the correct speculative hit rate.

If the two sibling logical cores operate in the same paging mode and have identical CR3 values, which indicate that accesses from both cores use the same page tables (if paging is enabled), it would be better to produce a speculative hit even if the logical core ID doesn't match on the fast path of the cache.

In adaptive mode, a context ID value is calculated whenever the paging mode or the CR3 register of one of the cores is changed. If the paging modes and the CR3 values match, the context ID bit is set to one of the two possible values. Otherwise, it's set to the other value. When a load request is issued to the cache, the context ID is checked. If indicates that the cores have the same address translation structures, the logical core ID comparison result is ignored and a speculative hit is produced if the partial virtual tag matched. Otherwise, the logical core ID comparison takes effect as in shared mode.

Meade answered 1/12, 2020 at 13:51 Comment(0)

Recommended topics

Hot tags