Assume two threads are running on x86, one on CPU0 and one on CPU1. The thread running on CPU0 executes the following stores:
A=1
B=1
The cache line containing A is initially owned by CPU1, and the one containing B is owned by CPU0.
I have two questions:
If I understand correctly, both stores will be put into CPU0's store buffer. However, for the first store, A=1, the cache line in CPU1 must be invalidated, while the second store, B=1, could be flushed immediately since CPU0 owns the cache line containing it. I know that x86 CPUs preserve store order. Does that mean that B=1 will not be written to the cache before A=1?

Assume the following commands are executed on CPU1:
while (B==0);
print A
Is it enough to add only an lfence between the while and print commands on CPU1, without adding an sfence between A=1 and B=1 on CPU0, to get 1 always printed on x86?
while (B==0);
lfence
print A
LFENCE is not needed here on x86: loads have acquire semantics automatically. Note that an x86 CPU can't reorder a load with any later load or store, but the C/C++ compiler can. In C++ you should use an acquire load:
extern std::atomic<int> B;
while( B.load(std::memory_order_acquire) == 0 );
std::cout << A;
en.cppreference.com/w/cpp/atomic/memory_order – Ptah
Without volatile, a compiler may be allowed to convert while (B==0) to while (true), because as far as the compiler can see, nothing can change the value of B within that loop. For example, C/C++ compilers are allowed to do this at high optimization levels. – Heifetz