Is the following code guaranteed to return the expected value of counter (40,000,000), according to the C++11 memory model? (NOT limited to x86).
#include <atomic>
#include <thread>
using namespace std;
void ThreadProc(atomic<int>& counter)
{
for (int i = 0; i < 10000000; i++)
counter.fetch_add(1, memory_order_relaxed);
}
int main()
{
#define COUNT 4
atomic<int> counter = { 0 };
thread threads[COUNT] = {};
for (size_t i = 0; i < COUNT; i++)
threads[i] = thread(ThreadProc, ref(counter));
for (size_t i = 0; i < COUNT; i++)
threads[i].join();
printf("Counter: %i", counter.load(memory_order_relaxed));
return 0;
}
In particular, will relaxed atomics coordinate such that two threads will not simultaneously read the current value, independently increment it, and both write their incremented value, effectively losing one of the writes?
Some lines from the spec seem to indicate that counter must consistently be 40,000,000 in the above example.
[Note: operations specifying memory_order_relaxed are relaxed with respect to memory ordering. Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object. — end note
.
Atomic read-modify-write operations shall always read the last value (in the modification order) written the write associated with the read-modify-write operation.
.
All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. If A and B are modifications of an atomic object M and A happens before (as defined below) B, then A shall precede B in the modification order of M, which is defined below.
This talk also supports the notion that the above code is race free. https://www.youtube.com/watch?v=KeLBd2EJLOU&feature=youtu.be&t=1h9m30s
It appears to me that there is an indivisible ordering of the atomic operations, but we have no guarantees what the order is. So all increments must take place 'one before the other' without the race I described above.
But then a few things potentially point in the other direction:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
I've been informed by a coworker that there are known mistakes in Sutter's talk. Though I've yet to find any sources for this.
Multiple members of the C++ community smarter than I have implied that a relaxed atomic add could be buffered such that a subsequent relaxed atomic add could read and operator on the stale value.
refs.fetch_sub(1, ...) == 0
; it should be== 1
, since the fetch happens before the subtraction (like__sync_sub_and_fetch
, and unlike__sync_fetch_and_sub
orInterlockedDecrement
). – Speck