How is message queue implemented in cache coherence protocol?
Asked Answered
D

1

1

In Paul McKenny's famous paper "Memory Barriers: A Hardware View for Software Hackers"

3.3 Store Buffers and Memory Barriers

To see the second complication, a violation of global memory ordering, consider the following code sequences with variables “a” and “b” initially zero:

1   void foo(void)
2   {
3       a = 1;
4       b = 1;
5   }
6
7   void bar(void)
8   {
9       while (b == 0) continue;
10      assert(a == 1);
11  }

Suppose CPU 0 executes foo() and CPU 1 executes bar(). Suppose further that the cache line containing “a” resides only in CPU 1’s cache, and that the cache line containing “b” is owned by CPU 0. Then the sequence of operations might be as follows:

  1. CPU 0 executes a=1. The cache line is not in CPU 0’s cache, so CPU 0 places the new value of “a” in its store buffer and transmits a “read invalidate” message.

  2. CPU 1 executes while(b==0)continue, but the cache line containing “b” is not in its cache. It therefore transmits a “read” message.

  3. CPU 0 executes b=1. It already owns this cache line (in other words, the cache line is already in either the “modified” or the “exclusive” state), so it stores the new value of “b” in its cache line.

  4. CPU 0 receives the “read” message, and transmits the cache line containing the now-updated value of “b” to CPU 1, also marking the line as “shared” in its own cache.

  5. CPU 1 receives the cache line containing “b” and installs it in its cache.

  6. CPU 1 can now finish executing while(b==0) continue, and since it finds that the value of “b” is 1, it proceeds to the next statement.

  7. CPU 1 executes the assert(a==1), and, since CPU 1 is working with the old value of “a”, this assertion fails.

  8. CPU 1 receives the “read invalidate” message, and transmits the cache line containing “a” to CPU 0 and invalidates this cache line from its own cache. But it is too late.

  9. CPU 0 receives the cache line containing “a” and applies the buffered store just in time to fall victim to CPU 1’s failed assertion.

Step 1: CPU0 sends "read invalidate" to CPU1

Step 5: CPU1 receives value of b from CPU0 in response to CPU1's earlier (step 2) "read" message

Step 8: CPU1 receives the "read invalidate" message from step 1

How can Step 8 happen after 5?

In both 5 and 8, CPU1 is receiving stuff from CPU0. But notice that CPU0 sends "read invalidate" message before ACKing CPU1's "read" message (of b).

If CPU1 has a income message queue that is processed by order, then CPU1 has to process CPU0's "read invalidate" message earlier than it processes CPU0's response to b value "read" message. Doesn't it?

Dutiable answered 6/12, 2022 at 6:41 Comment(1)
Don't post pictures of text, quote it in quote formatting.Huddleston
A
1

I think it's because CPU1's read has started, and it will complete (which involves actually waiting for and receiving the value requested) before going on to process any incoming message. I.e. CPU1's read is a blocking operation that needs to complete before any other cache management, such as processing CPU0's read-invalidate message, is carried out.

I'm not an expert in this, but that's an explanation that fits the steps!

Athene answered 6/12, 2022 at 22:18 Comment(2)
Thanks @bazza! This makes a lot of sense. That read is blocking & synchronous can certainly explain the behavior. That being said, won't modern CPU proceed to speculatively execute next instruction(s) rather than stalling on the read which could take hundreds of cycles?Dutiable
@Weipeng, about the only thing going for my answer is that it makes sense of the steps; it does not mean that it's actually correct! Yes, this is the kind of reason why speculative execution is designed in, making it difficult for designers to be sure they've not made their CPU vulnerable in some way (Meltdown, Spectre). This is why some people (inc myself) think that the time of SMP / cache-coherent CPUs should pass, and return to NUMA architectures like Transputer or the Cell processor; push this kind of thing up to the software stack.Athene

© 2022 - 2025 — McMap. All rights reserved.