x86: Are memory barriers needed here?

In WB-memory, a = b = 0

P1:
a = 1
SFENCE
b = 1

P2:
WHILE (b == 0) {}
LFENCE
ASSERT (a == 1)

It is my understanding that neither the SFENCE nor the LFENCE is needed here.

For this memory type, x86 guarantees that:

  1. Reads can't be reordered with older reads
  2. Stores can't be reordered with older stores
  3. Stores are transitively visible
Id answered 21/4, 2018 at 15:36 Comment(5)
Yes, they are not needed. Duplicate. Although without SFENCE, there is no particular guarantee regarding when the processor may decide to make the writes visible. This is mostly not an issue in practice. - Frenchy
@Kay: Is that pseudo-code for assembly operations, and thus compile-time reordering is not possible? In C you need a compiler barrier, but for x86 no asm barrier instructions are needed here. lfence and sfence asm instructions are no-ops unless you're using NT stores (or NT loads from WC memory, e.g. video RAM). - Galliwasp
@HadiBrais: sfence isn't going to make the stores visible sooner, AFAIK. It might make the CPU wait while the store buffer drains (but probably only mfence does that, to stop StoreLoad reordering). Stores that were already in the buffer are already committing as fast as possible. - Galliwasp
@PeterCordes You're right. That statement only holds when the stores are to the same location. - Frenchy
@HadiBrais: Oh right, we have some evidence that consecutive stores to the same line are merged in the store buffer. IDK if it would delay commit of the first one, though, if it was at the head of the store buffer. I'd guess that repeated stores in a loop to the same line would still commit to L1d regularly. - Galliwasp

The lfence and sfence asm instructions are effectively no-ops for memory ordering unless you're using NT stores (or NT loads from WC memory, e.g. video RAM). (Actually, movntdqa loads might only be ordered by mfence on paper, not lfence. In that case, I don't know when you'd ever use lfence for memory ordering. It was added to the ISA along with sfence + mfence at the same time as NT stores, before movntdqa, possibly just for completeness / in case it was ever needed.)

There is sometimes confusion around this point, because the C/C++ intrinsics for lfence and sfence are also compiler barriers. A compiler barrier is needed in C/C++, but can be had more cheaply with GNU C asm("":::"memory"); or (to order relaxed-atomic operations; see footnote 1) std::atomic_signal_fence(std::memory_order_acq_rel). Either one restricts compile-time reordering without making the compiler emit any useless asm barrier instructions.
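As a minimal sketch of the compiler-barrier-only approach, assuming an x86 target: the names `payload`, `ready`, `publish`, and `consume` below are hypothetical and not from the question. `std::atomic_signal_fence` emits no instruction and only limits compile-time reordering, so the cross-thread ordering here comes from the x86 hardware model, not from the C++ standard. Portable code should use release/acquire operations or `std::atomic_thread_fence` instead.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variables, playing the roles of a and b in the question.
std::atomic<int> payload{0};
std::atomic<int> ready{0};

// Writer side (P1): two relaxed stores separated by a compiler-only barrier.
void publish(int value) {
    payload.store(value, std::memory_order_relaxed);
    // No instruction is emitted for this fence; it only stops the compiler
    // from reordering the two stores.
    // ASSUMPTION: the target is x86, which keeps the stores in order at run time.
    std::atomic_signal_fence(std::memory_order_acq_rel);
    ready.store(1, std::memory_order_relaxed);
}

// Reader side (P2): spin on the flag, then read the payload.
int consume() {
    while (ready.load(std::memory_order_relaxed) == 0) {
        // spin until the writer sets the flag
    }
    std::atomic_signal_fence(std::memory_order_acq_rel);
    int value = payload.load(std::memory_order_relaxed);
    assert(value != 0);  // expected on x86 when publish() was given a nonzero value
    return value;
}
```

Compiling this for x86-64 with optimization enabled and inspecting the asm is a reasonable way to confirm that no fence instruction appears between the two stores.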


Run-time reordering is already blocked by the x86 memory model, except for StoreLoad reordering, which requires mfence (or a locked instruction such as xchg) to block. lfence + sfence don't add up to mfence. See "Does it make any sense instruction LFENCE in processors x86/x86_64?" and various other SO Q&As about these instructions.
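The StoreLoad case can be sketched as the classic store-buffer litmus test. This is an illustration under assumptions, not code from the answer: the names `x`, `y`, `both_read_zero`, and `check_store_load` are hypothetical. With `memory_order_seq_cst`, the outcome where both threads read 0 is forbidden, and x86 compilers typically get that by using `xchg` or `mov` + `mfence` for the store. With weaker orderings, each load may complete before the same thread's earlier store has left the store buffer, so both threads can read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round of the litmus test; returns true if both loads saw 0.
bool both_read_zero() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // store, then ...
        r1 = y.load(std::memory_order_seq_cst);  // ... load of the other variable
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test; with seq_cst the forbidden outcome must never appear.
void check_store_load(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        assert(!both_read_zero());
    }
}
```

Swapping the orderings to `memory_order_relaxed` removes the guarantee, although actually observing the reordering usually takes many iterations and threads running on separate cores.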

This is why std::atomic_thread_fence(std::memory_order_acq_rel) also compiles to zero instructions on x86, but to barriers on weakly-ordered architectures.
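A hedged sketch of the question's two-thread pattern using `std::atomic_thread_fence`; the function names `p1`, `p2`, and `run_once` are assumptions for illustration. It uses separate release and acquire fences (an `acq_rel` fence on each side would also work). On x86 these fences are expected to compile to no instructions while still constraining the compiler; on a weakly-ordered ISA they become real barrier instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> a{0};
std::atomic<int> b{0};

// P1 from the question: write a, fence, write b.
void p1() {
    a.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // takes the place of SFENCE
    b.store(1, std::memory_order_relaxed);
}

// P2 from the question: wait for b, fence, read a.
int p2() {
    while (b.load(std::memory_order_relaxed) == 0) {
        // spin until P1's store to b becomes visible
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // takes the place of LFENCE
    return a.load(std::memory_order_relaxed);
}

// Runs both sides once and returns the value of a that P2 observed.
int run_once() {
    a.store(0);
    b.store(0);
    int seen = -1;
    std::thread reader([&] { seen = p2(); });
    std::thread writer(p1);
    writer.join();
    reader.join();
    assert(seen == 1);  // guaranteed by the release/acquire fence pairing
    return seen;
}
```

Unlike a compiler-only barrier, this version is portable: the C++ memory model itself guarantees that P2 sees `a == 1` once it has seen `b == 1`.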


lfence is also a serializing instruction on Intel microarchitectures (but maybe not AMD?). It has been all along, but Intel recently made this guarantee official so Spectre mitigation techniques could safely use it instead of a much more inconvenient cpuid.


  • Footnote 1:

atomic_signal_fence on gcc may also be a compiler barrier for plain non-atomic variables; it was last time I checked with gcc (while atomic_thread_fence wasn't), but this is probably just an implementation detail when there aren't any atomic variables involved. When there are atomic variables, the compiler knows that those variables may provide ordering that lets other threads access non-atomic variables without UB, so ordering is needed.

Galliwasp answered 21/4, 2018 at 18:20 Comment(1)
Hey Peter, thanks for the response. As to your comment: yes, this is assembly pseudo-code, so compiler barriers are not needed. Great, I'm glad I'm getting the hang of things. As always, thanks for your time. - Id
