Why does “movnti” followed by an “sfence” guarantee persistent ordering?
S

1

3

SFENCE prevents NT stores from committing from the store buffer ahead of SFENCE itself.

NT store data enters an LFB directly from the store buffer.

Therefore SFENCE can only guarantee the order in which data enters the LFBs.

For example,

movnti;
sfence;
movnti to another address;

The SFENCE here can only guarantee that the first NT store will be committed to an LFB earlier than the next one. However, since the LFBs are volatile, the data has not been persisted yet. Will the data entering the LFBs be persisted in the order in which it entered?

Spermophyte answered 21/1, 2021 at 11:3 Comment(2)
The question is unclear to me. Can you give an example code sequence to illustrate what the question is about exactly?Avirulent
@Hadi Brais I edited the question. If any prerequisite knowledge in the question is wrong, please correct me, thank you.Spermophyte
A
3

sfence ensures that all earlier stores in program order become globally observable before any later stores in program order become globally observable. Stores here include data store uops, clflush, clflushopt, clwb, movdiri, and movdir64b.

The point of GO depends on all of the following:

  • the type of operation,
  • the presence of the non-temporal hint,
  • the memory type of the target memory location,
  • the device mapped to the target memory address, and
  • the microarchitecture.

For example, on a modern Intel server processor, consider a normal data store uop without the NT hint that targets a memory location of type WB mapped to main memory. It reaches GO when the store is committed to the L1D cache, after the target cache line has been fetched from memory if it was not already present in the L1D in a suitable coherence state. That's why on an Asynchronous DRAM Refresh (ADR) platform such as Intel CSX, sfence by itself doesn't guarantee persistence: the caches are outside the ADR persistence domain.
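As a rough illustration of that point, here is a minimal C sketch of the usual pattern for a normal WB store on an ADR platform: store, flush the line explicitly, then fence. The helper name `store_and_flush` is hypothetical, and the use of `_mm_clflush` (rather than `clflushopt` or `clwb`) is my own choice to keep the sketch buildable with plain SSE2. Whether the data ends up persistent depends on the platform and on `dst` actually being mapped to persistent memory, which this code cannot check.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <immintrin.h>   /* _mm_clflush, _mm_sfence */
#endif

/* Hypothetical helper (not from the answer): normal store, then an
 * explicit cache-line flush, then sfence. The sfence alone would only
 * order the store, which may still sit in the volatile cache. */
static inline void store_and_flush(int32_t *dst, int32_t value)
{
    *dst = value;        /* normal store: reaches GO once committed to the L1D */
#if defined(__SSE2__)
    _mm_clflush(dst);    /* write the cache line back toward memory */
    _mm_sfence();        /* order the store and flush before later stores */
#endif
    /* Assumption: on non-x86 builds this degrades to a plain store so the
     * sketch still compiles; it provides no persistence ordering there. */
}
```

Calling `store_and_flush(&x, 42)` on ordinary DRAM behaves like a plain store as far as the program can observe; the difference only matters when the target is persistent memory.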

Regarding the specific example you're asking about, movnti is a data store instruction with the NT hint. Assuming that the target address is mapped to main memory on an ADR platform, the point of global observability of this instruction is the same as the first point of the persistence domain. Therefore, on any Intel or AMD platform with NVDIMMs and regardless of the memory type, the data of an NT store followed by sfence is guaranteed to be in the persistence domain before any later stores become persistent. This is a stronger guarantee than what you said (that sfence prevents later stores from committing before earlier stores) because commit doesn't imply persistence, but persistence can only happen after commit. That said, it may be better here to use the term "retire" instead of "commit": "retire" is meaningful architecturally and indicates a change to the thread's state, whereas "commit" is a microarchitectural operation that depends on the design.
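The sequence from the question can be written with compiler intrinsics roughly as follows. This is only a sketch: `nt_store`, `store_fence`, and `write_record_then_flag` are hypothetical names, and the record/flag scenario is an assumed use case, not something from the question. `_mm_stream_si32` normally compiles to `movnti` and `_mm_sfence` to `sfence`.

```c
#if defined(__SSE2__)
#include <immintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

/* Hypothetical wrapper: non-temporal 32-bit store (movnti on x86). */
static inline void nt_store(int *dst, int value)
{
#if defined(__SSE2__)
    _mm_stream_si32(dst, value);
#else
    *dst = value;        /* assumption: plain-store fallback off x86 */
#endif
}

/* Hypothetical wrapper: store fence (sfence on x86, no-op otherwise). */
static inline void store_fence(void)
{
#if defined(__SSE2__)
    _mm_sfence();
#endif
}

/* movnti; sfence; movnti to another address.
 * The first NT store becomes globally observable before the second. */
void write_record_then_flag(int *record, int *flag, int value)
{
    nt_store(record, value);   /* first NT store */
    store_fence();             /* earlier stores reach GO before later ones */
    nt_store(flag, 1);         /* second NT store, to a different address */
    store_fence();             /* also order the flag before anything later */
}
```

If `record` and `flag` point into memory backed by NVDIMMs on an ADR platform, the argument above implies that `flag` cannot become persistent before `record`. On ordinary DRAM the code runs the same way but nothing is persisted.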

Avirulent answered 22/1, 2021 at 10:42 Comment(4)
You said “the data is guaranteed to be in the persistence domain by the time sfence retires”, but sfence does not force the store buffer to be drained before it retires (stackoverflow.com/questions/27627969/… Peter Cordes's answer, line 6). The store buffer is not part of the persistence domain. Intel's ADR domain starts at the iMC (integrated memory controller), so does sfence actually retire when the data enters the iMC?Spermophyte
@Spermophyte Some parts of that answer you linked are not accurate. sfence does drain the store buffer and WCBs. See the Intel manual Sections 11.10 and 11.3.Avirulent
Ugh, I guess I need to fix my assumptions / mental model and rework that linked answer, then. Any suggestions? (If so, please comment on that answer, or make an edit if you want and have time).Synthiasyntonic
Thanks for your edit on the linked answer. So it seems the section 11 stuff saying it drains the store buffer and WCBs apparently means "before the next store can become visible", not tied to retirement of SFENCE (at least not on paper for Intel). And my mental model of sfence as a divider on a conveyor belt (store buffer) isn't crazy after all, and that could well be the real mechanism for the experimental test in the other answer showing StoreLoad reordering across SFENCE+LFENCE. That's a relief.Synthiasyntonic
