AMD's manuals have always described the AMD implementation of LFENCE as a load serializing instruction:
Acts as a barrier to force strong memory ordering (serialization)
between load instructions preceding the LFENCE and load instructions
that follow the LFENCE.
The original use case for LFENCE was ordering loads from the WC memory type. However, after the speculative execution vulnerabilities were discovered, AMD released a document in January 2018 entitled "Software techniques for managing speculation on AMD processors". This is the first and only document in which MSR C001_1029[1] is mentioned (other bits of C001_1029 are discussed in some AMD documents, but not bit 1). When C001_1029[1] is set to 1, LFENCE behaves as a dispatch serializing instruction (which is more expensive than merely load serializing). Since this MSR is available on most older AMD processors, the bit seems to have been supported almost from the start, perhaps because AMD anticipated that it might one day need to match the behavior of LFENCE on Intel processors.
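To make the bit in question concrete, here is a minimal sketch in C that decodes bit 1 of a raw C001_1029 value. It also includes a hypothetical helper, read_msr, that assumes a Linux system with the msr driver loaded and sufficient privileges, and reads the register through /dev/cpu/<cpu>/msr. The function and macro names are mine and do not come from AMD's document or the kernel.

```c
#define _FILE_OFFSET_BITS 64   /* the MSR address does not fit a 32-bit off_t */
#define _XOPEN_SOURCE 700      /* for pread() */

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define DECFG_MSR                  0xC0011029u  /* MSR C001_1029 */
#define DECFG_LFENCE_SERIALIZE_BIT 1            /* bit 1 of that MSR */

/* Returns 1 if bit 1 is set in a raw C001_1029 value, 0 otherwise. */
int decfg_lfence_is_dispatch_serializing(uint64_t decfg)
{
    return (int)((decfg >> DECFG_LFENCE_SERIALIZE_BIT) & 1u);
}

/*
 * Hypothetical helper (assumption: Linux msr driver, root privileges).
 * Reads one 64-bit MSR on the given CPU; the file offset selects the MSR.
 * Returns 0 on success and -1 on any failure.
 */
int read_msr(int cpu, uint32_t msr, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

Where the device is accessible, calling read_msr(0, DECFG_MSR, &v) and then decfg_lfence_is_dispatch_serializing(v) would report the setting on CPU 0. A failed read (for example on a family that lacks the MSR, or under a hypervisor that blocks it) shows up as -1 rather than as a cleared bit.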
There are exceptions to the ordering rules of fence instructions, serializing instructions, and instructions that have serializing properties. These exceptions differ subtly between Intel and AMD processors; one example is the CLFLUSH instruction. So AMD and Intel mean slightly different things when they talk about instructions with serializing properties.
One thing that is not clear to me is the following statement from the document:
AMD family 0Fh/11h processors support LFENCE as serializing always but
do not support this MSR.
This statement is vague because it doesn't clearly say whether LFENCE on AMD families 0Fh and 11h is fully serializing (in AMD terminology) or dispatch serializing (in AMD terminology). But it's most probably dispatch serializing only. The AMD family-specific manuals don't mention LFENCE or MSR C001_1029.
Since v4.15-rc8, the Linux kernel has relied on the serializing properties of LFENCE on AMD processors. The change consists of two commits, 1 and 2. The following macros were defined in commit 1:
+#define MSR_F10H_DECFG 0xc0011029
+#define MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT 1
The first macro specifies the MSR address and the second specifies the bit position within that MSR. The following code was added to init_amd in commit 2 (some comments are mine):
/* LFENCE always requires SSE2 */
if (cpu_has(c, X86_FEATURE_XMM2)) {
    unsigned long long val;
    int ret;

    /* The AMD CPU supports LFENCE, but there are three cases to be considered:
     * 1- MSR C001_1029[1] must be set to enable the dispatch
     *    serializing behavior of LFENCE. This can be done only
     *    if the MSR is supported.
     * 2- The MSR is not supported (AMD 0Fh/11h). LFENCE is by
     *    default at least dispatch serializing. Nothing needs to
     *    be done.
     * 3- The MSR is supported, but we are running under a hypervisor
     *    that does not support writing that MSR (because perhaps
     *    the hypervisor has not been updated yet). In this case, resort
     *    to the slower MFENCE for serializing RDTSC and use a Spectre
     *    mitigation that does not require LFENCE (i.e., generic retpoline).
     */

    /*
     * A serializing LFENCE has less overhead than MFENCE, so
     * use it for execution serialization. On families which
     * don't have that MSR, LFENCE is already serializing.
     * msr_set_bit() uses the safe accessors, too, even if the MSR
     * is not present.
     */
    msr_set_bit(MSR_F10H_DECFG,
                MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);

    /*
     * Verify that the MSR write was successful (could be running
     * under a hypervisor) and only then assume that LFENCE is
     * serializing.
     */
    ret = rdmsrl_safe(MSR_F10H_DECFG, &val);
    if (!ret && (val & MSR_F10H_DECFG_LFENCE_SERIALIZE)) {
        /* A serializing LFENCE stops RDTSC speculation */
        set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
        /* X86_FEATURE_LFENCE_RDTSC is used later to choose a Spectre
           mitigation */
    } else {
        /* MFENCE stops RDTSC speculation */
        set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
    }
}
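The read-back check at the end of that snippet can be isolated as a small pure function. The following is only a user-space sketch of that one decision, not kernel code: the enum, the function name, and the macro names are hypothetical. The mask is assumed to be the bit position shifted into place, which is how I read MSR_F10H_DECFG_LFENCE_SERIALIZE in the snippet (its definition is not shown above).

```c
#include <stdbool.h>
#include <stdint.h>

#define DECFG_LFENCE_SERIALIZE_BIT 1
/* Assumption: the mask tested by the kernel snippet is 1 << bit position. */
#define DECFG_LFENCE_SERIALIZE (UINT64_C(1) << DECFG_LFENCE_SERIALIZE_BIT)

/* Which fence is chosen to serialize RDTSC (hypothetical names). */
enum rdtsc_fence {
    RDTSC_FENCE_LFENCE,  /* corresponds to X86_FEATURE_LFENCE_RDTSC */
    RDTSC_FENCE_MFENCE   /* corresponds to X86_FEATURE_MFENCE_RDTSC */
};

/*
 * readback_ok: true if reading the MSR back succeeded (rdmsrl_safe
 *              returned 0 in the snippet).
 * val:         the value that was read back; ignored if the read failed.
 *
 * LFENCE is trusted only when the read succeeded AND bit 1 is observed
 * to be set; every other outcome falls back to MFENCE.
 */
enum rdtsc_fence choose_rdtsc_fence(bool readback_ok, uint64_t val)
{
    if (readback_ok && (val & DECFG_LFENCE_SERIALIZE))
        return RDTSC_FENCE_LFENCE;
    return RDTSC_FENCE_MFENCE;
}
```

Written this way, it is easy to see that a hypervisor that silently ignores the write (read succeeds, bit still clear) and a failing read both end up on the MFENCE path.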
In v5.4-rc1, the MSR write verification code was removed, so the code became:
msr_set_bit(MSR_F10H_DECFG,
MSR_F10H_DECFG_LFENCE_SERIALIZE_BIT);
set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
The reasoning behind this change is discussed in the commit message. (In summary, it's mostly not needed, and it may not work.)
The AMD document also says:
All AMD family 10h/12h/14h/15h/16h/17h processors support this MSR.
LFENCE support is indicated by CPUID function1 EDX bit 26, SSE2. AMD
family 0Fh/11h processors support LFENCE as serializing always but do
not support this MSR.
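As an illustration of that paragraph, the sketch below classifies an AMD processor from the raw CPUID function 1 register values. The family decoding follows the usual CPUID convention (base family in EAX bits 11:8, extended family in bits 27:20, added when the base family is 0Fh). The enum and function names are hypothetical, and families not listed in the quote are reported as unknown because the document only states a plan for future processors.

```c
#include <stdbool.h>
#include <stdint.h>

enum amd_lfence_kind {
    LFENCE_UNSUPPORTED,         /* no SSE2, so no LFENCE */
    LFENCE_ALWAYS_SERIALIZING,  /* families 0Fh/11h: no MSR, always serializing */
    LFENCE_NEEDS_MSR,           /* families 10h/12h/14h/15h/16h/17h */
    LFENCE_UNKNOWN              /* family not covered by the quoted text */
};

/* Family as encoded in CPUID function 1 EAX. */
unsigned amd_display_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xFu;
    unsigned ext  = (eax >> 20) & 0xFFu;
    return base == 0xFu ? base + ext : base;
}

/* SSE2 is CPUID function 1 EDX bit 26. */
bool cpuid_has_sse2(uint32_t edx)
{
    return ((edx >> 26) & 1u) != 0;
}

/* eax and edx are the raw CPUID function 1 outputs. */
enum amd_lfence_kind classify_amd_lfence(uint32_t eax, uint32_t edx)
{
    if (!cpuid_has_sse2(edx))
        return LFENCE_UNSUPPORTED;

    switch (amd_display_family(eax)) {
    case 0x0F: case 0x11:
        return LFENCE_ALWAYS_SERIALIZING;
    case 0x10: case 0x12: case 0x14:
    case 0x15: case 0x16: case 0x17:
        return LFENCE_NEEDS_MSR;
    default:
        return LFENCE_UNKNOWN;
    }
}
```

On x86 with GCC or Clang, the two inputs could be obtained with __get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h>; the classification itself is kept free of inline assembly so it can be checked on any machine.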
But it appears that none of the AMD manuals have been updated yet to mention support for C001_1029[1].
AMD said the following in that document:
AMD plans support for this MSR and access to this bit for all future
processors.
This means that C001_1029[1] should be considered as architectural on future AMD processors (with respect to January 2018).
lfence isn't "serializing" on Intel. That term has a technical meaning that includes fully flushing the store buffer, e.g. cpuid and iret are serializing. lfence only serializes the instruction stream / out-of-order core, not the whole pipeline including the store buffer. I usually say it's "partially serializing" or something. – Lighter
[...] cpuid, but they also use serializing alone for other things, including things which are not serializing instructions. The sentence in the lfence section that directly precedes the one I quoted uses the term "serializing operation" in reference to lfence. – Cantrell
[...] isa tag and adding the memory-barriers tag, which is more pertinent. – Weikert
[...] [memory-barriers] because we're not interested in the memory-barrier effect of lfence. We know it does that, and it's a red-herring that distracts from this question about its other effect. I don't insist on removing it again if you and @Bee don't find that argument convincing, though. – Lighter
[...] lfence is at least presented as a memory barrier, and is a memory barrier, and this OoO-blocking side effect is actually a result of the implementation design for its original primary function. If you were interested in lfence as a barrier, it is highly likely that you care about performance and also perhaps care about this OoO blocking behavior. Take the contrary position: you mention lfence OoO behavior almost every time the instruction comes up in the context ... – Cantrell
[...] lfence, which is why it makes sense to mention it any time it comes up in a memory-barrier context, but arguably not the reverse. i.e. I'm correcting the misconception that lfence is useful as a memory barrier. But I guess you're right, maybe people doing a tag search on [x86] [memory-barriers] would find this question and learn something. I still liked my title edit even though you've convinced me on the tags, but it's your question. – Lighter
[...] LFENCE instruction (which was previously only supposed to be a load barrier) got redefined as a "speculative execution barrier" in 2018. – Insurer
[...] lfence may or may not be serializing depending on the value set in an MSR. – Cantrell