No, seq_cst stores use xchg
(or mov
+ mfence
but that's slower on recent CPUs) for ordering wrt. other operations. release
or relaxed
atomic stores can just use mov
and will still be promptly visible to other cores. (Not before later loads in this thread might have executed, though.)
Cache coherence isn't the cause of memory-reordering, that's local to each core. (For x86, the memory model is program order + a store buffer with store-forwarding. It's the store buffer that causes stores to not become visible until after the store instruction has retired from out-of-order exec.)
The answer you linked which says "if I set this to true (or false), no other thread will read a different value after I've set it" (that's not quite such a certainty - you need a "lock" prefix to guarantee that). is somewhat misleading. They mean that (implicit-lock) xchg
includes a full memory barrier, so no code in the storing thread can access memory until after the store is actually committed to cache, globally visible.
A clearer way to state that is that it makes this thread wait without doing anything until the store is visible. i.e. stall this thread until the store buffer has finished committing all previous stores. That would eventually happen on its own. So it's really about ordering of this thread relative to store visibility, not other threads. Other threads (cores) can locally do their own early loading / late storing, although on x86 all loads happen in program order. That's why I commented on that answer you linked to disagree with the way it was presenting things.