The notion of a compiler fence often comes up when I'm reading about memory models, barriers, ordering, atomics, etc., but normally it's in the context of also being paired with a CPU fence, as one would expect.
Occasionally, however, I read about fence constructs that apply only to the compiler. An example is the C++11 function std::atomic_signal_fence, which cppreference.com describes as follows:
std::atomic_signal_fence is equivalent to std::atomic_thread_fence, except no CPU instructions for memory ordering are issued. Only reordering of the instructions by the compiler is suppressed as order instructs.
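To make the quoted description concrete, here is a minimal sketch of how I understand the intended use: a thread and a signal handler running on that same thread. All names (payload, ready, seen, on_signal, run_demo) are hypothetical, the signal is raised synchronously with std::raise only to keep the sketch deterministic, and it assumes std::atomic<int> is lock-free on the target platform. I may well be misreading the intent.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical names, for illustration only.
int payload = 0;              // plain (non-atomic) data published to the handler
std::atomic<int> ready{0};    // flag; assumed lock-free so it is usable in a handler
std::atomic<int> seen{-1};    // what the handler observed

extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed)) {
        // Compiler-only acquire fence: the read of payload below may not be
        // hoisted above the flag check by the compiler.
        std::atomic_signal_fence(std::memory_order_acquire);
        seen.store(payload, std::memory_order_relaxed);
    }
}

int run_demo() {
    std::signal(SIGINT, on_signal);

    payload = 42;
    // Compiler-only release fence: the write to payload may not be sunk
    // below the flag store by the compiler. As I understand it, no CPU
    // fence instruction is needed, because the handler runs on this thread.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);

    std::raise(SIGINT);  // handler runs on this thread before raise returns
    return seen.load(std::memory_order_relaxed);
}
```

Calling run_demo() should return 42 here. In a real program the signal would arrive asynchronously, somewhere between arbitrary instructions, which is the case the fences are meant to cover.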
I have five questions related to this topic:
1. As implied by the name std::atomic_signal_fence, is an asynchronous interrupt (such as a thread being preempted by the kernel to execute a signal handler) the only case in which a compiler-only fence is useful?
2. Does its usefulness apply to all architectures, including strongly-ordered ones such as x86?
3. Can a specific example be provided to demonstrate the usefulness of a compiler-only fence?
4. When using std::atomic_signal_fence, is there any difference between using acq_rel and seq_cst ordering? (I would expect it to make no difference.)
5. This question might be covered by the first question, but I'm curious enough to ask specifically about it anyway: Is it ever necessary to use fences with thread_local accesses? (If it ever would be, I would expect compiler-only fences such as atomic_signal_fence to be the tool of choice.)
Thank you.
atomic_signal_fence (or some other compiler-only fence construct) could be used as a potential optimization. As the article states, the Linux kernel has functions smp_rmb and smp_wmb which are implemented this way. However, I'm still interested in hearing answer(s) -- if any exist -- that are not restricted to such an assumption. – Pechora