What is the C++11 atomic API equivalent to ```__asm__ volatile("" ::: "memory")```

A codebase has a COMPILER_BARRIER macro defined as __asm__ volatile("" ::: "memory"). The intent of the macro is to prevent the compiler from re-ordering reads and writes across the barrier. Note that this is explicitly a compiler barrier, and not a processor level memory barrier.

As is, this is fairly portable, since there are no actual assembly instructions in the AssemblerTemplate, just the volatile qualifier and the memory clobber. So, as long as the compiler honors GCC's Extended Asm syntax, it should work fine. Still, I'm curious what the right way to express this would be in the C++11 atomics API, if possible.

The following seemed like it might be the right idea: atomic_signal_fence(memory_order_acq_rel);.
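For concreteness, here is a minimal sketch of the two forms being compared (the C++11 macro name is just illustrative):

```cpp
#include <atomic>

// The existing macro, relying on GCC Extended Asm:
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

// The candidate C++11 replacement under discussion:
#define COMPILER_BARRIER_CPP11() std::atomic_signal_fence(std::memory_order_acq_rel)
```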

My reasoning being that:

  • Of the <atomic> APIs, only atomic_signal_fence and atomic_thread_fence do not need a memory address against which to operate.
  • atomic_thread_fence affects memory ordering, which we don't need for a compiler barrier.
  • The memory clobber in the Extended Asm version doesn't distinguish between reads and writes, so it would appear that we want both acquire and release semantics; memory_order_acq_rel therefore seems to be required, at minimum.
  • memory_order_seq_cst seems unnecessary, as we don't require a total order across threads - we are only interested in the instruction sequencing within the current thread.

Is it possible to express the equivalent to __asm__ volatile("" ::: "memory") entirely portably with the C++11 atomics API? If so, is atomic_signal_fence the correct API to use? If so, what memory order argument is appropriate/required here?

Or, am I off in the weeds here and there is a better way to approach this?

Thant answered 26/7, 2016 at 2:54 Comment(4)
atomic_signal_fence only guarantees ordering between a thread and a signal handler running in the same thread. Similarly, atomic_thread_fence only applies to ordering between threads. If you're trying to guarantee ordering between two other contexts then neither is portable. For example, on Windows atomic_signal_fence doesn't need to do anything because Windows doesn't support asynchronous signals.Dyer
@RossRidge - I felt a little strange about using atomic_signal_fence, because, as you point out, there are no signals around here. But it was the only thing that "worked", per my outline above. I didn't see any language in the standard though that would allow a call to atomic_signal_fence to be elided if the implementation didn't have async signals. It does state in 28.9.7 of the C++14 standard that "compiler optimizations and reorderings of loads and stores are inhibited in the same way as with atomic_thread_fence, but the hardware fence instructions ... are not emitted."Thant
That's an informative (non-normative) note, it doesn't place a constraint on the implementation. The standard doesn't provide any language that would allow you to depend on it being anything more than "equivalent to atomic_thread_fence(order), except that the resulting ordering constraints are established only between a thread and a signal handler executed in the same thread". Note also that atomic_thread_fence is defined in terms of atomic operations on atomic objects, as defined by the standard. So if you're not using the std::atomic types then neither function is guaranteed to work.Dyer
Duplicate of Is there any compiler barrier which is equal to asm("" ::: "memory") in C++11? based on the title at least.Clothe

__asm__ volatile("" ::: "memory") is not even a complete compiler barrier; it only forces ordering of loads/stores to objects whose addresses are potentially accessible to the asm block, which would not include local variables for which the compiler can track that the address does not leak. For example, memset(password, 0, len); followed by __asm__ volatile("" ::: "memory"); may fail to actually zero the memory used by password[].
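As a rough illustration of that failure mode (the function and buffer here are hypothetical):

```cpp
#include <cstring>

#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

void scrub_secret() {
    char password[64];
    // ... fill and use password entirely within this function ...
    std::memset(password, 0, sizeof(password)); // may be elided as a dead store:
    COMPILER_BARRIER();                         // the "memory" clobber only covers objects
                                                // whose address could be visible to the asm
                                                // block, and a local whose address never
                                                // escapes is not one of them
}
```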

This can be remedied by passing the addresses of such objects as inputs to the asm block, but I don't see any perfect equivalent with atomic_signal_fence. The closest you could probably get is to store the address of the object into an external-linkage volatile pointer object (be careful to make the pointer, not the pointed-to type, volatile-qualified); atomic_signal_fence would then have to assume it might be accessed from a signal handler.
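Hedged sketches of both remedies (the names BARRIER_COVERING, escape_slot, and signal_fence_covering are made up for illustration):

```cpp
#include <atomic>

// Remedy 1: name the object as an input operand, so the "memory" clobber is
// understood to potentially apply to it.
#define BARRIER_COVERING(obj) __asm__ volatile("" : : "r"(&(obj)) : "memory")

// Remedy 2 (the approximation described above): leak the address through an
// external-linkage volatile pointer, then issue a compiler-only fence.  Note
// that the pointer itself is volatile-qualified, not the pointed-to type.
void* volatile escape_slot = nullptr;

template <class T>
void signal_fence_covering(T& obj) {
    escape_slot = &obj;                                   // address now escapes
    std::atomic_signal_fence(std::memory_order_acq_rel);  // no hardware fence emitted
}
```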

Kaufmann answered 26/7, 2016 at 3:34 Comment(3)
That is a very useful observation. It leads me to wonder whether the macro actually does what was intended!Thant
@Thant and R.: observing from another thread or signal handler would be impossible without UB for objects whose addresses haven't escaped the function. That's why they don't need to be given an address and spilled across a compiler barrier. I think GCC's atomic_signal_fence is implemented in terms of asm("":::"memory"), or is at least internally equivalent inside the compiler.Clothe
Or are you just saying that atomic_signal_fence doesn't have a mechanism for passing extra addresses of locals into it? Sure, but it's still equivalent to a macro with no args for asm("":::"memory").Clothe

> distinguish between reads and writes, so it would appear that we want both acquire and release semantics

You seem to be mixing up distinct issues.

Both acquire and release semantics can create a constraint on both reads and writes:

  • release informally means that previous memory operations are complete before the barrier is started
  • acquire informally means that following memory operations do not begin before the barrier is completed

That's a very simplistic explanation, however. The C++ atomic fences are barriers for atomics: they work in coordination with atomic objects. The thread fence call can of course produce code on its own, but that code could still be reordered with respect to some non-atomic operations.
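A minimal sketch of the fence-plus-atomic pattern the standard defines (the names payload, ready, producer, and consumer are illustrative):

```cpp
#include <atomic>

int payload = 0;              // ordinary, non-atomic data
std::atomic<int> ready{0};    // the fence only gains cross-thread meaning
                              // through operations on an atomic like this one

void producer() {
    payload = 42;                                          // plain store
    std::atomic_thread_fence(std::memory_order_release);   // release fence ...
    ready.store(1, std::memory_order_relaxed);             // ... paired with an atomic store
}

void consumer() {
    if (ready.load(std::memory_order_relaxed)) {             // atomic load ...
        std::atomic_thread_fence(std::memory_order_acquire); // ... paired with acquire fence
        int v = payload;   // guaranteed to observe 42 if the load above saw 1
        (void)v;
    }
}
```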

Chemosynthesis answered 17/12, 2019 at 9:0 Comment(0)
