Is a read on an atomic variable guaranteed to acquire the current value of it in C++11?
It is known that the modifications of a single atomic variable form a total order. Suppose we have an atomic read of some atomic variable v at wall-clock time T. Is this read then guaranteed to acquire the current value of v, i.e. the one written by the last write in the modification order of v at time T? To put it another way: if an atomic write completes before an atomic read in wall-clock time, and there are no other writes in between, is the read guaranteed to return the value just written?

My accepted answer is the 6th comment made by Oasis on his answer.

Garold answered 13/8, 2014 at 15:35 Comment(6)
Isn't that the whole point of being atomic? – Rivalee
After studying the complexities that may be involved in memory ordering, it no longer seems apparent to me. – Garold
@Rivalee Isn't the whole point of parallel MT to not have a global synchronous time? – Whaleback
@Whaleback I wouldn't say the 'whole' point, but AFAICT the OP used the concept of wall time just to express the concept of before/after. – Rivalee
@Rivalee Before/after is fully defined only within a given thread (and signals in that thread). Among threads there are various very abstract orders that must be consistent with the per-thread orders and sometimes with each other... – Whaleback
Related: Is a memory barrier required to read a value that is atomically modified? / Does a hardware memory barrier make visibility of atomic operations faster in addition to providing the necessary guarantees? (no, and RMWs aren't faster either). This part of the standard is how the formalism guarantees atomicity: another write can't happen on this variable between the load and store. Nothing more, nothing less. – Emmalynne

Wall-clock time is irrelevant. However, what you're describing sounds like the write-read coherence guarantee:

§1.10 [intro.multithread]/20

If a side effect X on an atomic object M happens before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M.

(translating the standardese, "value computation" is a read, and "side effect" is a write)

In particular, if your relaxed write and your relaxed read are in different statements of the same function, they are connected by a sequenced-before relationship, therefore they are connected by a happens-before relationship, therefore the guarantee holds.

Oasis answered 13/8, 2014 at 16:04 Comment(11)
My question lies in the fact that a write inter-thread happens-before a read when the value computation reads the value of the write. But are there any conditions under which this actually happens (otherwise, it may never happen)? I am considering whether happens-before in terms of wall-clock time is such a condition. – Garold
The necessary condition is the definition of happens-before. It does not involve the wall clock. – Oasis
I think the following example may make my question clear. Consider two threads synchronized by an atomic spin-lock. Thread A first acquires the lock. Thread B is then busy-waiting. Next, thread A releases the lock. Is there any clue about when thread B sees this release by thread A and thus enters the critical section, or may it just wait infinitely long to see this release? – Garold
@Garold, the clue is the memory order semantics. The store in A has to be memory_order_release at least, and the load has to be memory_order_acquire. – Cerebration
@Cerebration I agree with you on the memory order semantics that must be used, and the memory_order_release write synchronizes-with the memory_order_acquire read when the read sees the value written by that write. But again, when does this happen? – Garold
@Garold 29.3 [atomics.order]/12: "Implementations should make atomic stores visible to atomic loads within a reasonable amount of time." – Oasis
@Oasis One way to force a read of the newest current value seems to be using an atomic read-modify-write operation. See this stackoverflow QA link. – Garold
@Garold Sure, and that's why RMWs take so long (the CPU has to sit and wait for all the acks). I am not sure I see the relationship with the original question, though: an RMW is a different beast from an atomic read or an atomic write. – Oasis
@Oasis "CPU has to sit and wait for all the acks" No. There is no such thing as waiting for acks. Acks are not a thing you wait for or that propagates. The cache does the same job for a simple write, and reads from the cache are always current. The only issue specific to RMW is keeping the data in the cache for the whole operation. – Whaleback
@Garold "force the read of the newest current value seems to be using an atomic read-modify-write operation" There is no basis in the language semantics for that claim. – Whaleback
@Whaleback Here's a good read on which ACKs go where: researchspace.auckland.ac.nz/bitstream/handle/2292/11594/…? – Oasis

Is a read on an atomic variable guaranteed to acquire the current value of it

No

Even though each atomic variable has a single modification order (which is observed by all threads), that does not mean every modification becomes visible to all threads at the same time.

Consider this code:

std::atomic<int> g{0};

// thread 1
g.store(42);

// thread 2
int a = g.load();
// do stuff with a
int b = g.load();

A possible outcome is:

  • thread 1: 42 is stored at time T1
  • thread 2: the first load returns 0 at time T2
  • thread 2: the store from thread 1 becomes visible at time T3
  • thread 2: the second load returns 42 at time T4.


This outcome is possible even though the first load at time T2 occurs after the store at T1 (in clock time).

The standard says:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

It does not require a store to become visible right away, and it even leaves room for a store to remain invisible (e.g. on systems without cache coherency). In that case, an atomic read-modify-write (RMW) is required to access the last value.

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

Needless to say, RMWs are more expensive to execute (they lock the bus), and that is why a regular atomic load is allowed to return an older (cached) value.
If a regular load were required to return the last value, performance would be horrible while there would be hardly any benefit.

Gustavogustavus answered 19/4, 2017 at 21:43 Comment(6)
In which practical case can a regular load return an old, obsolete value? Where was that old value stored? – Whaleback
@Whaleback In the cache, if it is not coherent. You can also consider a load obsolete if a value stored by another core is still in its store buffer (x86). The problem is that there is no definition of the current or latest value for regular stores and loads; "latest in the modification order" really only applies to RMW operations. – Gustavogustavus
Which modern architecture (that has some form of C++ or C++-like compiler) has a non-coherent cache? – Whaleback
@Whaleback No idea, but cache coherency is not a mandatory feature. – Gustavogustavus
It seems to me that coherency is essential to support any common PL with reasonable efficiency... Is there at least a prototype compiler of C++ for a theoretical system without cache coherency? – Whaleback
Cache coherency is a hardware feature. Platforms that do not support it can probably be found in the embedded world. – Gustavogustavus

It depends on the memory order you specify for the load() operation.

By default it is std::memory_order_seq_cst, and the answer is yes: it guarantees the current value stored by another thread (provided the store is visible at all, i.e. the store must use at least std::memory_order_release, otherwise its visibility is not guaranteed).

But if you specify std::memory_order_relaxed for the load operation, the documentation says: "Relaxed ordering: there are no synchronization or ordering constraints; only atomicity is required of this operation." I.e. the program could end up not reading from memory at all.

Cerebration answered 13/8, 2014 at 16:05 Comment(4)
I do not think memory_order_seq_cst actually guarantees this. It does guarantee that a global total order exists, but it does not say clearly how this order may be formed, i.e., the rules for constructing such an order. – Garold
The rules are described for each memory order. The implementation mechanism is to prevent omitting and reordering of operations before and after this atomic load (or store), on both the compiler and the processor side. Thus it is guaranteed that the store will happen in the right place and the load will happen in the right place among the other operations, and that the processor will execute these operations. – Cerebration
@Garold Consistency implies that the order of any atomic op (from seq_cst to relaxed) must follow execution order within a given thread (you don't go back in time); acquires and releases of the same objects by multiple threads "tie" the orders of operations together and force them to go along. OTOH, if you don't have atomic objects shared between threads, you have no such ties. But then why would you care about the abstract order? – Whaleback
"It does guarantee that a global total order exists" A global order of operations inside unrelated threads that share nothing is an abstract intellectual device with no physical counterpart. You must consider the threads that do interact via atomic objects: in practice these will be implemented with fences that become visible in some order, and atomic stores are implemented as real memory stores that put a cache line in an exclusive/modified state, which determines a modification order on the objects in that cache line. Fences make these modifications globally consistent. – Whaleback
