In Paul McKenney's famous paper "Memory Barriers: A Hardware View for Software Hackers":
3.3 Store Buffers and Memory Barriers
To see the second complication, a violation of global memory ordering, consider the following code sequences with variables “a” and “b” initially zero:
void foo(void)
{
    a = 1;
    b = 1;
}

void bar(void)
{
    while (b == 0) continue;
    assert(a == 1);
}
Suppose CPU 0 executes foo() and CPU 1 executes bar(). Suppose further that the cache line containing “a” resides only in CPU 1’s cache, and that the cache line containing “b” is owned by CPU 0. Then the sequence of operations might be as follows:
1. CPU 0 executes a=1. The cache line is not in CPU 0’s cache, so CPU 0 places the new value of “a” in its store buffer and transmits a “read invalidate” message.
2. CPU 1 executes while(b==0)continue, but the cache line containing “b” is not in its cache. It therefore transmits a “read” message.
3. CPU 0 executes b=1. It already owns this cache line (in other words, the cache line is already in either the “modified” or the “exclusive” state), so it stores the new value of “b” in its cache line.
4. CPU 0 receives the “read” message, and transmits the cache line containing the now-updated value of “b” to CPU 1, also marking the line as “shared” in its own cache.
5. CPU 1 receives the cache line containing “b” and installs it in its cache.
6. CPU 1 can now finish executing while(b==0) continue, and since it finds that the value of “b” is 1, it proceeds to the next statement.
7. CPU 1 executes the assert(a==1), and, since CPU 1 is working with the old value of “a”, this assertion fails.
8. CPU 1 receives the “read invalidate” message, and transmits the cache line containing “a” to CPU 0 and invalidates this cache line from its own cache. But it is too late.
9. CPU 0 receives the cache line containing “a” and applies the buffered store just in time to fall victim to CPU 1’s failed assertion.
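To make sure I am reading the sequence correctly, here is a minimal single-threaded C sketch that replays the nine steps above in the order the paper gives them. It is only a toy model: the struct layout, the in-flight flags, and the function name `replay_store_buffer_race` are my own assumptions, not anything from the paper, and the step order is hard-coded, so the sketch shows what happens in that order but not why the "read invalidate" message is handled so late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical names): one cached copy per variable per CPU. */
struct line { bool valid; int value; };

struct cpu {
    struct line a, b;   /* cached copies of "a" and "b" */
    bool sb_pending;    /* store buffer holds a pending store to "a" */
    int  sb_value;      /* value waiting in the store buffer */
};

/* Replays steps 1-9 in the paper's order. Returns the value of "a" that
 * CPU 1 reads at the assert (step 7) and stores CPU 0's final cached
 * value of "a" in *final_a_on_cpu0. */
int replay_store_buffer_race(int *final_a_on_cpu0)
{
    /* Initial state: "a" lives only in CPU 1's cache, CPU 0 owns "b". */
    struct cpu cpu0 = { .a = { false, 0 }, .b = { true, 0 } };
    struct cpu cpu1 = { .a = { true, 0 },  .b = { false, 0 } };
    bool read_invalidate_in_flight = false;
    bool read_in_flight = false;
    int observed_a;

    assert(final_a_on_cpu0 != NULL);

    /* Step 1: CPU 0 misses on "a", buffers the store, sends "read invalidate". */
    cpu0.sb_pending = true;
    cpu0.sb_value = 1;
    read_invalidate_in_flight = true;

    /* Step 2: CPU 1 misses on "b" and sends "read". */
    read_in_flight = true;

    /* Step 3: CPU 0 owns "b", so it writes straight into its cache. */
    cpu0.b.value = 1;

    /* Steps 4 and 5: CPU 0 answers the "read"; CPU 1 installs the line. */
    if (read_in_flight) {
        cpu1.b = cpu0.b;
        read_in_flight = false;
    }

    /* Step 6: CPU 1 leaves the loop because it now sees b == 1. */
    assert(cpu1.b.valid && cpu1.b.value != 0);

    /* Step 7: CPU 1 reads "a" from its own, still valid, stale cache line. */
    observed_a = cpu1.a.value;

    /* Step 8: CPU 1 handles "read invalidate": hands over the line, drops its copy. */
    if (read_invalidate_in_flight) {
        cpu0.a = cpu1.a;
        cpu1.a.valid = false;
        read_invalidate_in_flight = false;
    }

    /* Step 9: CPU 0 applies the buffered store to the line it just received. */
    if (cpu0.sb_pending) {
        cpu0.a.value = cpu0.sb_value;
        cpu0.sb_pending = false;
    }

    *final_a_on_cpu0 = cpu0.a.value;
    return observed_a;
}
```

In this replay CPU 1 reads a == 0 at step 7 even though CPU 0 ends up with a == 1 after step 9, which matches the failed assertion described in the paper.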
Step 1: CPU0 sends "read invalidate" to CPU1
Step 5: CPU1 receives value of b from CPU0 in response to CPU1's earlier (step 2) "read" message
Step 8: CPU1 receives the "read invalidate" message from step 1
How can step 8 happen after step 5?
In both steps 5 and 8, CPU1 is receiving messages from CPU0. But notice that CPU0 sends the "read invalidate" message (step 1) before it responds to CPU1's "read" message for b (step 4).
If CPU1 has an incoming message queue that is processed in order, then CPU1 has to process CPU0's "read invalidate" message before it processes CPU0's response to the "read" message for b. Doesn't it?
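To make my assumption concrete, here is a minimal C sketch of what I have in mind. It assumes a single FIFO of incoming messages on CPU1 and that messages arrive in the order CPU0 sent them. Neither assumption is stated in the paper, and the queue type, message names, and function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The two messages CPU0 sends to CPU1 in the walkthrough (names are mine). */
enum msg_type { MSG_READ_INVALIDATE, MSG_READ_RESPONSE };

#define QUEUE_CAP 8

/* Assumed model: one in-order ring buffer of incoming messages per CPU. */
struct msg_queue {
    enum msg_type slots[QUEUE_CAP];
    size_t head;    /* index of the next message to process */
    size_t count;   /* number of queued messages */
};

static void enqueue(struct msg_queue *q, enum msg_type m)
{
    assert(q->count < QUEUE_CAP);
    q->slots[(q->head + q->count) % QUEUE_CAP] = m;
    q->count++;
}

static enum msg_type dequeue(struct msg_queue *q)
{
    enum msg_type m;

    assert(q->count > 0);
    m = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return m;
}

/* Returns the message CPU1 would process first under the in-order assumption. */
enum msg_type first_message_processed_by_cpu1(void)
{
    struct msg_queue q = { .head = 0, .count = 0 };

    enqueue(&q, MSG_READ_INVALIDATE);  /* sent by CPU0 in step 1 */
    enqueue(&q, MSG_READ_RESPONSE);    /* sent by CPU0 in step 4 */
    return dequeue(&q);
}
```

Under this model the first message CPU1 dequeues is the "read invalidate", which is the opposite of the order in the paper's steps 5 and 8, so at least one of my two assumptions presumably does not hold on the hardware the paper describes.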