c++ atomic: would function call act as memory barrier?

Asked 16/11, 2016 at 20:41 Answered 8/12, 2019 at 4:25

I'm reading the article Memory Ordering at Compile Time, which says:

In fact, the majority of function calls act as compiler barriers, whether they contain their own compiler barrier or not. This excludes inline functions, functions declared with the pure attribute, and cases where link-time code generation is used. Other than those cases, a call to an external function is even stronger than a compiler barrier, since the compiler has no idea what the function’s side effects will be.

Is this statement true? Consider this sample:

#include <atomic>
#include <cassert>

std::atomic_bool flag{false};  // brace-init: copy-init from bool is ill-formed before C++17
int value = 0;

void th1 () { // running in thread 1
  value = 1;
  // release store: the write to value above cannot be reordered below it
  flag.store(true, std::memory_order_release);
}

void th2 () { // running in thread 2
  // acquire load: the assert(...) below cannot be reordered above it
  while (!flag.load(std::memory_order_acquire)) {}
  assert (value == 1);    // should never fail!
}

Then suppose we remove the atomic and wrap the accesses in function calls instead:

bool flag = false;
int value = 0;

void writeflag () {
  flag = true;
}
void readflag () {
  while (!flag) {}
}
void th1 () {
  value = 1;
  writeflag(); // would function call prevent reordering?
}
void th2 () {
  readflag();  // would function call prevent reordering?
  assert (value == 1);    // would this fail???
}

Any idea?

Insult answered 16/11, 2016 at 20:41 Comment(2)

No such thing is true; your proposed code has a race. – Satang 16/11, 2016 at 20:45

No, no, no! You cannot read and write the flag at the same time from 2 or more threads. It is not guaranteed that operations on a bool are atomic by default. You need the flag to be atomic. – Dillingham 16/11, 2016 at 20:47

A compiler barrier is not the same thing as a memory barrier. A compiler barrier prevents the compiler from moving code across the barrier. A memory barrier (loosely speaking) prevents the hardware from moving reads and writes across the barrier. For atomics you need both, and you also need to ensure that values don't get torn when read or written.
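As a rough illustration of that distinction: standard C++ offers std::atomic_signal_fence, which only constrains the compiler, and std::atomic_thread_fence, which constrains the compiler and also emits whatever hardware fence the target needs. The sketch below is one way to publish data across threads with a thread fence. The names ready, payload, producer, consumer and run_once are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};  // hypothetical flag; atomic so accesses cannot tear
int payload = 0;                 // plain data published through the flag

void producer() {
    payload = 42;
    // Compiler + hardware barrier: the write to payload is ordered before the flag.
    // std::atomic_signal_fence here would only stop compiler reordering,
    // which is not enough for another thread on a weakly ordered CPU.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // be polite while spinning
    }
    // Pairs with the release fence in producer().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

int run_once() {
    std::thread t(producer);
    int seen = consumer();
    t.join();
    assert(seen == 42);
    return seen;
}
```

Note that the flag itself is still a std::atomic; the fences add ordering, but they do not make a plain bool safe to share.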

Catlee answered 16/11, 2016 at 21:17 Comment(3)

So the atomic and the memory order option are always necessary in the above case, regardless of whether the read/write is in a separate function call? The article is a bit misleading. – Insult 17/11, 2016 at 21:5

A memory barrier like std::atomic_thread_fence(std::memory_order_seq_cst) or x86 _mm_mfence() also includes a compiler barrier even if it's an intrinsic / inline function, otherwise it's not very usable. – Chamkis 8/12, 2019 at 7:37

Atomic with std::memory_order_relaxed doesn't need any barriers, only the no-tearing / access-once atomicity guarantee. – Chamkis 8/12, 2019 at 7:49

Formally, no, if only because Link-Time Code Generation is a valid implementation choice and need not be optional.

There's also a second oversight, and that's escape analysis. The claim is that "the compiler has no idea what the function’s side effects will be.", but if no pointers to my local variables escape from my function, then the compiler does know for sure that no other function changes them.
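A small sketch of the escape-analysis point, under the assumption that opaque_call stands in for a function defined in another translation unit (here it is defined locally so the example is self-contained). The names shared, opaque_call and demo are hypothetical.

```cpp
#include <cassert>

int shared = 0;  // global: any function may read or write it

// Stand-in for an external function the compiler cannot see into.
void opaque_call() {
    shared += 1;
}

int demo() {
    int local = 1;   // address never taken, so no other function can touch it
    shared = 10;     // must be stored before the call: the callee may read it
    opaque_call();
    local += 1;      // free to be constant-folded or moved across the call
    return local + shared;  // shared has to be reloaded after the call
}

void check_demo() {
    assert(demo() == 13);  // 2 (local) + 11 (shared)
}
```

The call behaves like a barrier for shared but not for local, which is the gap in the article's claim that this answer points out.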

Erlineerlinna answered 16/11, 2016 at 21:4 Comment(2)

Not only that: a computation whose result is dropped can simply be removed. (Even in Ada. Even if it would have caused an exception in Ada.) So you can't reliably time patently useless computations in benchmarks. Some computations aren't just reordered around "compiler barriers", they can potentially be removed. Which makes the whole question of whether they can "jump" the barrier very silly, as they can "jump" into a black hole. – Tetartohedral 8/12, 2019 at 4:19

In some compilers (like GCC), compiler barriers like asm("":::"memory") don't affect non-escaped locals, exactly because nothing could tell the difference. If you need them barriered, make them "m"(var) inputs to the asm statement to force them into memory. – Chamkis 8/12, 2019 at 7:46
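A minimal sketch of the idiom described in the last comment, assuming a compiler that supports GNU extended asm (GCC or Clang). The helper names compiler_barrier and barrier_demo are invented for illustration.

```cpp
#include <cassert>

// GNU extension: an empty asm statement with a "memory" clobber
// acts as a compiler barrier for memory the compiler considers reachable.
inline void compiler_barrier() {
    asm volatile("" ::: "memory");
}

int barrier_demo() {
    int local = 1;
    compiler_barrier();  // local never escaped, so it may stay in a register
    local += 1;
    // Listing local as an "m" input forces its current value into memory here.
    asm volatile("" : : "m"(local) : "memory");
    local += 1;
    return local;
}

void check_barrier_demo() {
    assert(barrier_demo() == 3);
}
```

Neither statement emits any instruction, and neither is a hardware fence; they only restrict what the compiler may do.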

In the second example, even if we assume no reordering of any kind, the behavior is undefined.

The writes to and reads from the variable flag are not atomic, and there is a race condition¹. Having no reordering doesn't guarantee that both threads don't access the variable flag at the same time. This happens when one thread hits the while loop in the function readflag and reads flag while the other thread writes to flag in writeflag.

¹ (Quoted from: ISO/IEC 14882:2011(E) 1.10 Multi-threaded executions and data races 21)
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
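One possible race-free variant of the question's second example, keeping the wrapper functions but making flag atomic again. This is a sketch rather than the only fix; run_threads and the yield in the spin loop are additions for the sake of a runnable example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> flag{false};  // atomic: concurrent access is no longer a data race
int value = 0;

void writeflag() {
    flag.store(true, std::memory_order_release);
}

void readflag() {
    while (!flag.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
}

void th1() {
    value = 1;
    writeflag();  // ordering comes from the release store, not from the call
}

void th2() {
    readflag();   // ordering comes from the acquire load, not from the call
    assert(value == 1);
}

// Hypothetical driver: runs both threads and returns the published value.
int run_threads() {
    std::thread a(th1);
    std::thread b(th2);
    a.join();
    b.join();
    return value;
}
```

Whether the accesses sit inside a function or not makes no difference to correctness here; the release/acquire pair on the atomic is what establishes the happens-before relation.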

Architectonics answered 17/11, 2016 at 7:37 Comment(0)

You are confusing a memory barrier, which is used for inter-thread memory visibility, with a compiler barrier, which isn't a threading device, just a device (or trick) to prevent the compiler from reordering side effects.

You need a memory barrier for your threading example.

You can use a compiler barrier to ensure that memory side effects are performed in a given order (on the local CPU) for other purposes, like benchmarking, getting around a type aliasing violation, integrating assembly code, or signal handling (for a signal only handled in that same thread).
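For the signal-handling case, standard C++ provides std::atomic_signal_fence as a portable compiler-only barrier between a thread and a signal handler running on that same thread. The sketch below assumes std::atomic<int> is lock-free on the target and uses std::raise so the handler runs synchronously; all names (message, published, seen_by_handler, on_signal, signal_demo) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

int message = 0;                       // plain data handed to the handler
std::atomic<int> published{0};         // assumed lock-free on this target
std::atomic<int> seen_by_handler{-1};  // what the handler observed

extern "C" void on_signal(int) {
    if (published.load(std::memory_order_relaxed)) {
        // Compiler-only barrier: keeps the read of message after the flag check.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen_by_handler.store(message, std::memory_order_relaxed);
    }
}

int signal_demo() {
    std::signal(SIGINT, on_signal);
    message = 7;
    // Compiler-only barrier: keeps the write to message before the flag store.
    std::atomic_signal_fence(std::memory_order_release);
    published.store(1, std::memory_order_relaxed);
    std::raise(SIGINT);            // handler runs on this thread before raise returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return seen_by_handler.load(std::memory_order_relaxed);
}

void check_signal_demo() {
    assert(signal_demo() == 7);
}
```

No hardware fence is needed because the handler and the interrupted code run on the same CPU; the same code would not be sufficient to publish message to a different thread.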

Tetartohedral answered 8/12, 2019 at 4:25 Comment(0)
