On x86-64, is the "movnti" or "movntdq" instruction atomic across a system crash?

When using persistent memory such as Intel Optane DCPMM, is it possible to see a partial result after reboot if the system crashes (e.g. a power outage) while a movnt instruction is executing?

For:

  • 4- or 8-byte movnti, which x86 guarantees to be atomic for other purposes?
  • 16-byte SSE movntdq / movntps, which aren't guaranteed atomic but in practice probably are on CPUs that support persistent memory.
  • 32-byte AVX vmovntdq / vmovntps
  • 64-byte AVX512 vmovntdq / vmovntps full-line stores
  • bonus question: MOVDIR64B which has guaranteed 64-byte write atomicity, on future CPUs that support it and DC-PM. e.g. Sapphire Rapids Xeon / Tiger Lake / Tremont.

movntpd is assumed to be identical to movntps.



Profession answered 4/1, 2021 at 5:37 Comment(4)
@Peter Cordes Many thanks for your professional editing and answers! - Profession
Despite clflush itself apparently being atomic, it's still true that it doesn't give any guarantee of gluing together two separate stores into one atomic persistence; one could still commit to persistence before clflush, and then the system crashes. So my commentary on that linked question (which this is a followup to) is still somewhat accurate and relevant: it doesn't work like that when the goal is to atomically write stuff to persistent storage. - Farce
@Peter Cordes Do you mean that the former write may become persistent before clflush because of cache line eviction or something else? Two separate stores can't be made persistent atomically, but the order of their persistence will not change, right? - Profession
Oh right, I forgot ordering, not atomicity, was your real concern. If split or out-of-order write-backs within a line are impossible (whether by clflush or other means, e.g. interrupt after both stores but before clflush, leading to eviction), then yeah global observability order should apply to persistence order for writes within the same cache line. That's what I expected would be the case, but documentation left open the possibility of reordering. Fortunately Hadi got confirmation that reality matches expectations. - Farce

The following operations are guaranteed to be persistently atomic:

  • A store uop that doesn't cross an 8-byte boundary to a location of any effective memory type, and
  • MOVDIR64B.

Note that all atomicity guarantees mentioned in the Intel SDM V3 Section 8.1.1 apply to persistent memory.
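
The address condition in the first bullet can be checked mechanically. The following is a minimal C sketch, not part of the original answer; `fits_in_8byte_block` is a hypothetical helper name, and it only tests the address range, not whether the compiler or CPU actually performs the access as a single store uop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a store of `size` bytes starting at
 * `addr` stays inside one naturally aligned 8-byte block, i.e. it does not
 * cross an 8-byte boundary. Stores wider than 8 bytes always cross one. */
static bool fits_in_8byte_block(uintptr_t addr, size_t size)
{
    assert(size > 0);
    if (size > 8)
        return false;
    /* First and last byte must fall in the same 8-byte block. */
    return (addr / 8) == ((addr + size - 1) / 8);
}
```

For example, an 8-byte store at an address ending in 0x0 or 0x8 passes the check, while the same store at an address ending in 0x4 does not.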

In addition, the following operations are persistently atomic:

There is no architectural persistent atomicity guarantee for everything else, including 64-byte AVX512 vmovntdq / vmovntps full-line stores.

These guarantees apply to Asynchronous DRAM Refresh (ADR) platforms and Enhanced Asynchronous DRAM Refresh (eADR) platforms. (On eADR, the cache hierarchy is in the persistence domain. See: Build Persistent Memory Applications with Reliability Availability and Serviceability.)
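
Given these guarantees, one common way to make a larger update appear atomic after a crash is to write the data first and then publish it with a single aligned 8-byte store. The sketch below is an illustration under assumptions, not something from the correspondence cited here: the `record` layout and the names `publish`, `nt_store_word` and `store_fence` are hypothetical, and it assumes the usual pmem programming model in which a non-temporal store followed by SFENCE has reached the persistence domain. On targets other than x86-64 it falls back to plain stores, which give no persistence ordering at all.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* long long is used because it is the type _mm_stream_si64 operates on. */
_Static_assert(sizeof(long long) == 8, "sketch assumes 8-byte long long");

/* Hypothetical 64-byte record: 56 bytes of payload plus an 8-byte commit word. */
struct record {
    long long payload[7];
    long long commit;      /* 0 = not committed, nonzero = payload is valid */
};

/* One aligned 8-byte non-temporal store (movnti on x86-64). */
static void nt_store_word(long long *dst, long long value)
{
    assert(((uintptr_t)dst & 7) == 0);  /* must not cross an 8-byte boundary */
#if defined(__x86_64__)
    _mm_stream_si64(dst, value);
#else
    *dst = value;                       /* fallback: plain store, no pmem semantics */
#endif
}

/* Orders earlier non-temporal stores before later ones (sfence on x86-64). */
static void store_fence(void)
{
#if defined(__x86_64__)
    _mm_sfence();
#endif
}

/* Write the payload, fence, then publish it with a single 8-byte store.
 * `seq` should be nonzero so that a committed record is distinguishable. */
static void publish(struct record *r, const long long src[7], long long seq)
{
    for (int i = 0; i < 7; i++)
        nt_store_word(&r->payload[i], src[i]);
    store_fence();                      /* payload is ordered before the commit word */
    nt_store_word(&r->commit, seq);     /* the only store that must be atomic */
    store_fence();
}
```

After a crash, recovery code following this pattern would treat `payload` as valid only if `commit` is nonzero. The payload stores themselves may persist partially; the design relies only on the 8-byte commit store being persistently atomic.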

This answer is based on my private correspondence with Andy Rudoff (Intel).

Wellknit answered 5/1, 2021 at 22:10 Comment(2)
I would like to clarify Hadi's answer where it says a cache line flush is persistently atomic. Hadi is correct that the cache line is the unit for communicating with pmem on x86, but if you're using multiple instructions to set the values in the cache line, an eviction could happen at any time that makes only part of your update persistent. Without using something like TSX, nothing forces the cache line to remain unwritten until you're ready to write it. Not until MOVDIR64B is available can you actually write 64 bytes persistently with a single instruction. - Velar
@Velar I didn't know you had an SO account! Thanks for the clarification. I think that even for a single instruction, if that instruction is decoded into multiple store uops, then these stores are only atomic with respect to retirement, but not global observability or persistence. That's why I used the term "store uop" in my answer instead of using something ambiguous like "store." - Wellknit
