sfence ensures that all earlier stores in program order become globally observable (GO) before any later stores in program order become globally observable. Stores here include data store uops, clflush, clflushopt, clwb, movdiri, and movdir64b.
The point of GO depends on all of the following:
- the type of operation,
- the presence of the non-temporal hint,
- the memory type of the target memory location,
- the device mapped to the target memory address, and
- the microarchitecture.
For example, on a modern Intel server processor, a normal data store uop without the NT hint that targets a WB memory location mapped to main memory reaches GO when the store is committed to the L1D. If the target cache line is not already present in the L1D in a suitable coherence state, it has to be fetched first. At that point the data is only in the cache, which is outside the persistence domain on an Asynchronous DRAM Refresh (ADR) platform such as Intel CSX. That's why sfence by itself doesn't guarantee persistence there.
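To make this concrete, here is a minimal sketch of what a normal store typically needs in addition to sfence. It assumes an x86 compiler that provides the SSE2 intrinsics; the helper name persist_store is hypothetical, and the buffer in any test would be ordinary DRAM standing in for persistent memory. It uses clflush only because that instruction is available on every x86-64 processor; where clflushopt or clwb are supported, they would usually be the better choice.

```c
#include <emmintrin.h>  /* _mm_clflush (SSE2) */
#include <xmmintrin.h>  /* _mm_sfence */
#include <stdint.h>

/* Hypothetical helper (not from the original answer): store a value to a
 * WB-mapped location and push the cache line out of the cache hierarchy. */
static void persist_store(uint64_t *dst, uint64_t value)
{
    *dst = value;      /* normal store: reaches GO when committed to the L1D */
    _mm_clflush(dst);  /* flush the line that contains *dst out of the caches */
    _mm_sfence();      /* order the store and the flush before later stores */
}
```

Without the flush, the sfence would only order the store's commit to the cache against later stores, which is the situation described above.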
Regarding the specific example you're asking about, movnti is a data store instruction with the NT hint. Assuming that the target address is mapped to main memory on an ADR platform, the point of global observability of this instruction is the same as the first point of the persistence domain. Therefore, on any Intel or AMD platform with NVDIMMs and regardless of the memory type, an sfence after the movnti guarantees that the data is in the persistence domain before any later stores become persistent. This is a stronger guarantee than what you said (that sfence prevents later stores from committing before earlier stores), because commit doesn't imply persistence, but persistence can only happen after commit. That said, it may be better to use the term "retire" instead of "commit" here: "retire" is architecturally meaningful and indicates a change to the thread's state, whereas "commit" is a microarchitectural operation whose meaning depends on the design.
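The movnti case can be sketched as follows. This is only an illustration under stated assumptions: the helper publish_record and its payload/flag layout are hypothetical, the code relies on the SSE2 intrinsic _mm_stream_si32 (which compilers normally emit as movnti), and the persistence claim in the comments applies only when the addresses are mapped to NVDIMMs on an ADR platform.

```c
#include <emmintrin.h>  /* _mm_stream_si32 (SSE2, normally compiled to movnti) */
#include <xmmintrin.h>  /* _mm_sfence */

/* Hypothetical helper (not from the original answer): write a payload with
 * an NT store, then set a "valid" flag only after the payload is ordered. */
static void publish_record(int *payload, int *valid_flag, int value)
{
    _mm_stream_si32(payload, value);  /* NT store of the payload */
    _mm_sfence();                     /* payload becomes GO before the flag store */
    _mm_stream_si32(valid_flag, 1);   /* later store: cannot become GO first */
    _mm_sfence();                     /* order the flag before anything that follows */
}
```

Under the assumptions above, a reader that finds the flag set after a power failure should also find the payload, since the payload reached the persistence domain before the flag store could.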