I am struggling with Section 5.1.2.4 of the C11 Standard, in particular the semantics of Release/Acquire. I note that https://preshing.com/20120913/acquire-and-release-semantics/ (amongst others) states that:
... Release semantics prevent memory reordering of the write-release with any read or write operation that precedes it in program order.
So, for the following:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct test_struct
    {
        _Atomic(bool) ready ;
        int v1 ;
        int v2 ;
    } test_struct_t ;

    extern void
    test_init(test_struct_t* ts, int v1, int v2)
    {
        ts->v1 = v1 ;
        ts->v2 = v2 ;
        atomic_store_explicit(&ts->ready, false, memory_order_release) ;
    }

    extern int
    test_thread_1(test_struct_t* ts, int v2)
    {
        int v1 ;
        while (atomic_load_explicit(&ts->ready, memory_order_acquire)) ;  // spin while ready is true
        ts->v2 = v2 ;    // expect write to happen before store/release
        v1 = ts->v1 ;    // expect read to happen before store/release
        atomic_store_explicit(&ts->ready, true, memory_order_release) ;
        return v1 ;
    }

    extern int
    test_thread_2(test_struct_t* ts, int v1)
    {
        int v2 ;
        while (!atomic_load_explicit(&ts->ready, memory_order_acquire)) ; // spin while ready is false
        ts->v1 = v1 ;
        v2 = ts->v2 ;    // expect read to happen after store/release in thread "1"
        atomic_store_explicit(&ts->ready, false, memory_order_release) ;
        return v2 ;
    }

where those are executed:
- in the "main" thread: test_struct_t ts ; test_init(&ts, 1, 2) ;
- start thread "2", which does: r2 = test_thread_2(&ts, 3) ;
- start thread "1", which does: r1 = test_thread_1(&ts, 4) ;
I would, therefore, expect thread "1" to have r1 == 1 and thread "2" to have r2 == 4.
I would expect that because (following paragraphs 16 and 18 of section 5.1.2.4):
- all the (non-atomic) reads and writes are "sequenced before" and hence "happen before" the atomic write/release in thread "1",
- which "inter-thread happens before" the atomic read/acquire in thread "2" (when it reads true),
- which in turn is "sequenced before" and hence "happens before" the (non-atomic) reads and writes in thread "2".
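The chain in the three points above is the usual release/acquire "message passing" pattern. Below is a minimal sketch of it in isolation, assuming POSIX threads; message_t, producer and message_pass_once are hypothetical names that do not appear in the question. The labels (A) to (D) mark the sequenced-before, synchronizes-with and sequenced-before links.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: a plain payload published through an atomic flag. */
typedef struct
{
    int data ;            /* plain, non-atomic payload */
    atomic_bool ready ;   /* release/acquire flag */
} message_t ;

static void* producer(void* p)
{
    message_t* m = p ;
    m->data = 42 ;                                                  /* (A) sequenced before (B) */
    atomic_store_explicit(&m->ready, true, memory_order_release) ;  /* (B) release store */
    return NULL ;
}

/* Hypothetical consumer: waits for the flag, then returns the payload it sees. */
int message_pass_once(void)
{
    message_t m ;
    pthread_t t ;
    int seen, rc ;

    m.data = 0 ;
    atomic_init(&m.ready, false) ;
    rc = pthread_create(&t, NULL, producer, &m) ;
    assert(rc == 0) ;

    /* (C) acquire load: once it reads true, (B) synchronizes with (C) */
    while (!atomic_load_explicit(&m.ready, memory_order_acquire))
        ;

    seen = m.data ;   /* (D) sequenced after (C), so (A) happens before (D) */
    pthread_join(t, NULL) ;
    return seen ;
}
```

Calling message_pass_once() should always return 42. If the store were weakened to memory_order_relaxed, the acquire load would no longer synchronize with it, and the read of m.data would be a data race.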
However, it is entirely possible that I have failed to understand the standard.
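One way to check the expectation empirically is a small driver for the scenario described above. This is a sketch assuming POSIX threads; thread_arg_t, run_thread_1, run_thread_2 and run_scenario are hypothetical names, and the three functions from the question are repeated so that the block compiles on its own. The call to atomic_init is an addition not in the question: it gives ts.ready a valid initial state before test_init stores to it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct test_struct
{
    _Atomic(bool) ready ;
    int v1 ;
    int v2 ;
} test_struct_t ;

/* The next three functions are as in the question. */
void test_init(test_struct_t* ts, int v1, int v2)
{
    ts->v1 = v1 ;
    ts->v2 = v2 ;
    atomic_store_explicit(&ts->ready, false, memory_order_release) ;
}

int test_thread_1(test_struct_t* ts, int v2)
{
    int v1 ;
    while (atomic_load_explicit(&ts->ready, memory_order_acquire)) ;
    ts->v2 = v2 ;
    v1 = ts->v1 ;
    atomic_store_explicit(&ts->ready, true, memory_order_release) ;
    return v1 ;
}

int test_thread_2(test_struct_t* ts, int v1)
{
    int v2 ;
    while (!atomic_load_explicit(&ts->ready, memory_order_acquire)) ;
    ts->v1 = v1 ;
    v2 = ts->v2 ;
    atomic_store_explicit(&ts->ready, false, memory_order_release) ;
    return v2 ;
}

/* Hypothetical helper: one thread's argument and result. */
typedef struct
{
    test_struct_t* ts ;
    int arg ;
    int result ;
} thread_arg_t ;

static void* run_thread_1(void* p)
{
    thread_arg_t* a = p ;
    a->result = test_thread_1(a->ts, a->arg) ;
    return NULL ;
}

static void* run_thread_2(void* p)
{
    thread_arg_t* a = p ;
    a->result = test_thread_2(a->ts, a->arg) ;
    return NULL ;
}

/* Hypothetical driver: runs the scenario once and reports r1 and r2. */
void run_scenario(int* r1, int* r2)
{
    test_struct_t ts ;
    thread_arg_t a1 = { &ts, 4, 0 } ;
    thread_arg_t a2 = { &ts, 3, 0 } ;
    pthread_t t1, t2 ;
    int rc ;

    atomic_init(&ts.ready, false) ;    /* addition: valid initial state */
    test_init(&ts, 1, 2) ;             /* "main" thread */
    rc = pthread_create(&t2, NULL, run_thread_2, &a2) ;   /* thread "2" */
    assert(rc == 0) ;
    rc = pthread_create(&t1, NULL, run_thread_1, &a1) ;   /* thread "1" */
    assert(rc == 0) ;
    pthread_join(t1, NULL) ;           /* joins make the results visible here */
    pthread_join(t2, NULL) ;
    *r1 = a1.result ;
    *r2 = a2.result ;
}
```

A run should leave r1 == 1 and r2 == 4 whichever thread actually starts first, since thread "2" cannot touch v1 or v2 until it has read true from thread "1"'s release store. Repeating the run can expose timing-dependent bugs but, as with any concurrency test, cannot prove their absence.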
I observe that the code generated for x86_64 includes:

    test_thread_1:
        movzbl (%rdi),%eax    -- atomic_load_explicit(&ts->ready, memory_order_acquire)
        test   $0x1,%al
        jne    <test_thread_1> -- loop while ready is true
        mov    %esi,0x8(%rdi)  -- (W1) ts->v2 = v2
        mov    0x4(%rdi),%eax  -- (R1) v1 = ts->v1
        movb   $0x1,(%rdi)     -- (X1) atomic_store_explicit(&ts->ready, true, memory_order_release)
        retq

    test_thread_2:
        movzbl (%rdi),%eax    -- atomic_load_explicit(&ts->ready, memory_order_acquire)
        test   $0x1,%al
        je     <test_thread_2> -- loop while ready is false
        mov    %esi,0x4(%rdi)  -- (W2) ts->v1 = v1
        mov    0x8(%rdi),%eax  -- (R2) v2 = ts->v2
        movb   $0x0,(%rdi)     -- (X2) atomic_store_explicit(&ts->ready, false, memory_order_release)
        retq

And provided that R1 and X1 happen in that order, this gives the result I expect.
But my understanding of x86_64 is that reads happen in order with other reads, and writes happen in order with other writes, but reads and writes may not happen in order with each other. I believe that would make it possible for X1 to happen before R1, and even for X1, X2, W2, R1 to happen in that order. [This seems desperately unlikely, but what if R1 were held up by some cache issue?]
Please: what am I not understanding?
I note that if I change the loads/stores of ts->ready to memory_order_seq_cst, the code generated for the stores is:

    xchg %cl,(%rdi)

which is consistent with my understanding of x86_64 and will give the result I expect.
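For contrast, here is a sketch of the pattern where x86_64 does need more than plain mov instructions: the "store buffering" litmus test. x86_64 may let a store be overtaken by a later load of a different location (the store waits in the store buffer), which is why a seq_cst store is compiled to xchg (or to mov followed by mfence). With all four accesses memory_order_seq_cst, the outcome r1 == 0 && r2 == 0 is forbidden. It assumes POSIX threads; sb_t, sb_thread_a, sb_thread_b and sb_count_both_zero are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical litmus-test state. */
typedef struct
{
    atomic_int x ;
    atomic_int y ;
    int r1 ;      /* written only by thread A, read after join */
    int r2 ;      /* written only by thread B, read after join */
} sb_t ;

static void* sb_thread_a(void* p)
{
    sb_t* s = p ;
    atomic_store_explicit(&s->x, 1, memory_order_seq_cst) ;      /* store x ... */
    s->r1 = atomic_load_explicit(&s->y, memory_order_seq_cst) ;  /* ... then a later load of y */
    return NULL ;
}

static void* sb_thread_b(void* p)
{
    sb_t* s = p ;
    atomic_store_explicit(&s->y, 1, memory_order_seq_cst) ;      /* store y ... */
    s->r2 = atomic_load_explicit(&s->x, memory_order_seq_cst) ;  /* ... then a later load of x */
    return NULL ;
}

/* Hypothetical driver: counts the runs in which both loads returned 0. */
int sb_count_both_zero(int runs)
{
    int both_zero = 0 ;
    for (int i = 0 ; i < runs ; ++i)
    {
        sb_t s ;
        pthread_t a, b ;
        int rc ;

        atomic_init(&s.x, 0) ;
        atomic_init(&s.y, 0) ;
        s.r1 = s.r2 = -1 ;
        rc = pthread_create(&a, NULL, sb_thread_a, &s) ;
        assert(rc == 0) ;
        rc = pthread_create(&b, NULL, sb_thread_b, &s) ;
        assert(rc == 0) ;
        pthread_join(a, NULL) ;
        pthread_join(b, NULL) ;
        if (s.r1 == 0 && s.r2 == 0)
            ++both_zero ;
    }
    return both_zero ;
}
```

With seq_cst, sb_count_both_zero should always return 0. If the stores are weakened to memory_order_release and the loads to memory_order_acquire, the both-zero outcome becomes permitted; it may still be rare in practice, because thread start-up tends to separate the two threads in time.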
"8.2.3.3 Stores Are Not Reordered With Earlier Loads". So your compiler is correctly translating your code (how surprising), such that your code is effectively completely sequential and nothing interesting happens concurrently. – Agriculturist