I have read about std::memory_order in C++ and understood it partially, but I still have some doubts about it.
- The explanation of std::memory_order_acquire says that no reads or writes in the current thread can be reordered before this load. Does that mean the compiler and the CPU are not allowed to move any instruction that appears below the acquire statement to above it?

auto y = x.load(std::memory_order_acquire);
z = a; // is it legal to execute this load of shared `a` above the acquire? (I feel no)
b = 2; // is it legal to execute this store to shared `b` above the acquire? (I feel yes)

I can reason out why it is illegal to execute loads before the acquire, but why is it illegal for stores?
- Is it illegal to skip a useless load or store of an atomic object? Atomics are not volatile, and as far as I know only volatile carries that requirement.

auto y = x.load(std::memory_order_acquire); // `y` is never used
return;

This optimization does not happen even with relaxed memory order.
- Is the compiler allowed to move instructions that appear above the acquire statement to below it?

z = a; // is it legal to execute this load of shared `a` below the acquire? (I feel yes)
b = 2; // is it legal to execute this store to shared `b` below the acquire? (I feel yes)
auto y = x.load(std::memory_order_acquire);
- Can two loads or stores be reordered with each other, as long as neither crosses the acquire boundary?

auto y = x.load(std::memory_order_acquire);
a = p; // can this store move below the next line?
b = q; // `a` and `b` are shared
I have the same four corresponding doubts for release semantics as well; the sketch below shows the kind of scenario I have in mind.
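This is a minimal sketch, and all names in it (flag, data, other, sink, producer, consumer) are my own, purely for illustration:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<bool> flag{false};
int data = 0;   // ordinary (non-atomic) shared variable
int other = 0;  // another ordinary shared variable
int sink = 0;

void producer() {
    data = 42;                                    // plain store sequenced before the release store
    flag.store(true, std::memory_order_release);  // release: `data = 42` must not sink below this (my release-side doubt)
}

void consumer() {
    while (!flag.load(std::memory_order_acquire)) // acquire: later reads/writes must not be hoisted above this
        ;
    sink = data;  // question 1: can this load be hoisted above the acquire? (I feel it cannot)
    data = 7;     // question 1: can this store be hoisted above the acquire?
                  //   (if it could, I think it would race with `data = 42` in producer)
    other = 1;    // question 4: can this store be reordered with `sink = data` above it,
                  //   since neither access crosses the acquire?
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
    std::printf("%d\n", sink);  // with acquire/release as written, this must print 42
}

Question 3 would then be whether plain accesses that appear above the acquire (or below the release) may be moved to the other side of it.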
Related to the 2nd and 3rd questions: why does no compiler optimize f() as aggressively as g() in the code below?
#include <atomic>
int a, b;
void dummy(int*);
void f(std::atomic<int> &x) {
    int z;
    z = a; // loading shared `a` before acquire
    b = 2; // storing shared `b` before acquire
    auto y = x.load(std::memory_order_acquire);
    z = a; // loading shared `a` after acquire
    b = 2; // storing shared `b` after acquire
    dummy(&z);
}

void g(int &x) {
    int z;
    z = a;
    b = 2;
    auto y = x;
    z = a;
    b = 2;
    dummy(&z);
}
f(std::atomic<int>&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        mov     DWORD PTR [rsp+12], eax
        mov     eax, DWORD PTR [rdi]
        lea     rdi, [rsp+12]
        mov     DWORD PTR b[rip], 2
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
g(int&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
b:
        .zero   4
a:
        .zero   4
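Going back to the 2nd question, here is the kind of transformation I would have expected the as-if rule to permit; the function names are mine, just for illustration, and whether these transformations are actually legal is part of what I am asking:

#include <atomic>

std::atomic<int> x{0};

int dead_load() {
    auto y = x.load(std::memory_order_relaxed); // `y` is never used
    (void)y;                                    // only to silence an unused-variable warning
    return 1;
}

int dead_load_expected() {  // what I would expect a compiler to be allowed to turn dead_load() into
    return 1;
}

void two_stores() {
    x.store(1, std::memory_order_relaxed);
    x.store(2, std::memory_order_relaxed);  // could the first store legally be dropped here?
}

As far as I can tell, no mainstream compiler performs either transformation today, which seems to be what the question linked in the comments below is about.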
Comments:

"…seq_cst loads/stores, but you could have cases where another thread is done writing something, and signals that with a release-store to a flag." – Paulus

"…atomic very much like volatile, pending further work on compilers and designing ways to prevent optimizations that could be problematic for timing: Why don't compilers merge redundant std::atomic writes?" – Paulus