How do threaded systems cope with shared data being cached by different CPUs?

I'm coming largely from a C++ background, but I think this question applies to threading in any language. Here's the scenario:

  1. We have two threads (ThreadA and ThreadB), and a value x in shared memory

  2. Assume that access to x is appropriately controlled by a mutex (or other suitable synchronization control)

  3. If the threads happen to run on different processors, what happens if ThreadA performs a write, but its processor places the result in its L2 cache rather than main memory? Then, when ThreadB tries to read the value, won't it just look in its own L1/L2 cache or main memory and work with whatever stale value is there?

If that's not the case, then how is this issue managed?

If that is the case, then what can be done about it?

Jackscrew answered 9/7, 2009 at 17:52 Comment(0)

Your example would work just fine.

Multiple processors use a coherency protocol such as MESI to ensure that data remains in sync between the caches. With MESI, each cache line is in one of four states: Modified, Exclusive (held by a single cache), Shared between CPUs, or Invalid. Writing a cache line that is shared between processors forces it to become Invalid in the other CPUs' caches, keeping the caches in sync.

However, this is not quite enough. Different processors have different memory models, and most modern processors permit some re-ordering of memory accesses. In these cases, memory barriers are needed.

For instance if you have Thread A:

DoWork();
workDone = true;

And Thread B:

while (!workDone) {}
DoSomethingWithResults();

With both running on separate processors, there is no guarantee that the writes done within DoWork() will become visible to Thread B before the write to workDone, so DoSomethingWithResults() could proceed with inconsistent state. Memory barriers guarantee an ordering of the reads and writes: adding a barrier after DoWork() in Thread A forces all reads/writes done by DoWork() to complete before the write to workDone, so that Thread B gets a consistent view. Mutexes inherently provide a memory barrier, so reads/writes cannot move past the calls to lock and unlock.
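
For illustration, here is a minimal C++11 sketch of the same example using std::atomic, whose release/acquire orderings provide exactly this barrier behavior (the bodies of DoWork and DoSomethingWithResults are placeholders):

#include <atomic>
#include <thread>

std::atomic<bool> workDone{false};
int result = 0;  // ordinary (non-atomic) shared data

void DoWork() { result = 42; }                   // placeholder work
void DoSomethingWithResults() { /* result is safe to read here */ }

void threadA() {
    DoWork();
    // Release store: no read/write above this line may be reordered past it.
    workDone.store(true, std::memory_order_release);
}

void threadB() {
    // Acquire load: once this observes true, everything written before
    // the release store in threadA is guaranteed to be visible here.
    while (!workDone.load(std::memory_order_acquire)) { }
    DoSomethingWithResults();
}

int main() {
    std::thread a(threadA);
    std::thread b(threadB);
    a.join();
    b.join();
}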

In your case, one processor would signal to the others that it dirtied a cache line and force the other processors to reload from memory. Acquiring the mutex to read and write the value guarantees that the change to memory is visible to the other processor in the order expected.

Aubergine answered 9/7, 2009 at 17:57 Comment(1)
Thank you very much for this response. I had wondered if some sort of hardware-level mechanism must be coming into play here, because it seemed there were practical limits on what could be accomplished at the language/compiler level. – Jackscrew

Most locking primitives like mutexes imply memory barriers. These force a cache flush and reload to occur.

For example,

ThreadA {
    x = 5;         // probably writes to cache
    unlock mutex;  // forcibly writes local CPU cache to global memory
}
ThreadB {
    lock mutex;    // discards data in local cache
    y = x;         // x must read from global memory
}
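
For illustration, a runnable C++11 version of this sketch using std::mutex (the names mirror the pseudocode above; which thread acquires the lock first still depends on scheduling):

#include <mutex>
#include <thread>

std::mutex m;
int x = 0;

void threadA() {
    std::lock_guard<std::mutex> lock(m);  // lock mutex
    x = 5;                                // write is published when the
}                                         // lock is released at scope exit

void threadB() {
    std::lock_guard<std::mutex> lock(m);  // acquiring the lock guarantees
    int y = x;                            // we see writes made under it
    (void)y;                              // suppress unused-variable warning
}

int main() {
    std::thread a(threadA);
    std::thread b(threadB);
    a.join();
    b.join();
}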
Wad answered 9/7, 2009 at 18:1 Comment(6)
I don't believe barriers force a cache flush; they force constraints on the order of memory operations. A cache flush won't help you if the write to X can pass unlocking the mutex. – Aubergine
Barriers would be pretty useless if the compiler re-ordered memory operations across them, hmm? At least for GCC, I think this is usually implemented with a memory clobber, which tells GCC "invalidate any assumptions about memory". – Wad
Oh, I see what you're saying. A cache flush isn't necessary, as long as ordering is properly respected between processors. So I suppose this explanation is a simplified view, and yours delves more into the hardware details. – Wad
As far as the programmer is concerned, though, there isn't a practical difference (in terms of correctness) between "this bit of memory must be fetched from this processor's cache" and "the global state of this bit of memory is now set". – Wad
Right. I don't think you'd want a cache flush here, for the hopefully common case where the memory actually isn't being shared between processors. On x86, a memory barrier is usually just a lock xchg instruction, which I don't believe has any effect on the cache. – Aubergine
FWIW, for a simple spin lock, you can perform LOCK as an atomic xchg to set an ownership flag, followed by a read (acquire) memory barrier. UNLOCK can be a write (release) barrier followed by a non-atomic write to clear the ownership flag. In no way is a cache flush required. On x86, where memory barriers are implicit, you need only a single atomic operation. – Evered
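
For illustration, a minimal C++11 sketch of the spin lock described in the last comment, using std::atomic_flag (portable C++ clears the flag with an atomic release store rather than the comment's raw write):

#include <atomic>

std::atomic_flag owned = ATOMIC_FLAG_INIT;

void Lock() {
    // Atomic exchange that sets the ownership flag; acquire ordering
    // keeps later reads/writes from moving above the lock.
    while (owned.test_and_set(std::memory_order_acquire)) { }
}

void Unlock() {
    // Release ordering keeps earlier reads/writes from moving below;
    // no cache flush is involved.
    owned.clear(std::memory_order_release);
}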

In general, the compiler understands shared memory and takes considerable effort to ensure that shared memory is placed in a sharable place. Modern compilers are sophisticated in how they order operations and memory accesses; they tend to understand the nature of threading and shared memory. That's not to say they're perfect, but in general, much of the concern is taken care of by the compiler.

Castrate answered 9/7, 2009 at 18:2 Comment(0)

C# has some built-in support for this kind of problem. You can mark a variable with the volatile keyword, which forces it to be synchronized across all CPUs.

public static volatile int loggedUsers;

The other part is a syntactic wrapper around the .NET methods Threading.Monitor.Enter(x) and Threading.Monitor.Exit(x), where x is the variable to lock. This causes other threads trying to lock x to wait until the locking thread calls Exit(x).

public List<string> users = new List<string>();
// In some function:
System.Threading.Monitor.Enter(users);
try {
   // do something with users
}
finally {
   System.Threading.Monitor.Exit(users);
}
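
For comparison with the asker's C++ background, a minimal C++11 sketch of the equivalent pattern using std::mutex and a scope guard, which releases the lock even on exceptions, much like the try/finally above (names are illustrative):

#include <mutex>
#include <string>
#include <vector>

std::vector<std::string> users;
std::mutex usersMutex;

void AddUser(const std::string& name) {
    std::lock_guard<std::mutex> guard(usersMutex);  // like Monitor.Enter
    users.push_back(name);                          // do something with users
}                                                   // like Monitor.Exit, at scope exit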
Murrain answered 9/7, 2009 at 18:8 Comment(0)
