What happens if two processes on different processors try to acquire the lock at EXACTLY the same time?
OK, so I am reading about synchronization, and I have read through various algorithms such as spinlocks, semaphores, and mutexes used to avoid race conditions.

However, these algorithms can't prevent race conditions on SMP when multiple processes access the data at exactly the same time.

For example, suppose thread 1 on processor A runs lock(mutex1); withdraw(1000); unlock(mutex1);

and thread 2 on processor B runs lock(mutex1); deposit(1000); unlock(mutex1);

When both threads run EXACTLY AT THE SAME TIME, both threads will be in the critical section simultaneously.

The only solution I can see (it would have to be at the hardware level) would be to make each processor run slightly offset from the others, but that defeats the purpose of parallelism.

Is there any hardware-level support to avoid these situations where multiple processors try to acquire the lock at exactly the same time?

(This is not a problem of atomicity, but rather a problem of exact parallelism, and I wonder how SMP deals with it.)

Goidelic answered 23/10, 2011 at 0:54 Comment(0)
The whole point of a mutex is that even if two cores try to acquire it at the same time, one of them will be blocked until the other one releases it. A mutex that permitted two cores to hold that mutex at the same time would be completely broken and useless for its sole, intended purpose.

Somewhere in hardware there is a bus arbitrator that permits only one core to control the bus that links those two cores. If either core already has the memory holding the mutex in a private cache, that core will win. Otherwise, whichever one gets the bus first will win.

The bus arbitrator may work in many ways, but typically it will rotate. So if the cores are 0, 1, 2, and 3 and core 2 had the bus last, the bus will next go to core 3 if it wants it, otherwise core 0 if it wants it, otherwise core 1 if it wants it, otherwise back to core 2. Depending on exactly which bus is involved (whether it's a fight between the two cores' L2 caches or over the memory itself or whatever), some of the cores may contend as a unit against other core groups and then sub-contend for which specific core gets it first.

It may be that one core already has control over the bus, in which case it will win outright. Typically, the arbitrator allows a core to keep the bus for a few consecutive transactions so long as it continues to want it, to avoid wasteful handoffs that prevent the core from making forward progress.

The exact details can vary depending on a large number of factors including how the cores are arranged, which cores have the lock in their caches in what states, who had the bus last and whether the bus arbitrator uses timeslices, round-robin, or some other mechanism, and so on. But any implementation that didn't guarantee that one and only one core winds up getting the lock would be considered horribly broken.

Hearse answered 23/10, 2011 at 1:19 Comment(7)
In the SMP case there are usually multi-channel memory requests, which can still lead to the issue mentioned above: both cores will see the mutex as unlocked and both will issue memory requests to set it to locked, which the memory will oblige. Also, due to caching of memory values on different cores, main memory may not even come into play at all, so saying that another piece of hardware handles it isn't right. - Place
@JesusRamos: Both cores will not see the mutex as unlocked, because they don't do a read operation; they do an exchange operation. Before that operation is possible, the core must own the cache line containing the lock, and only one core can own the cache line at a time. And I never said main memory comes into play. The 'bus' involved may or may not be the memory bus; it may be an inter-core cache coherency bus. - Hearse
What if two processors both have the shared data in cache (as in SMP, where each processor has its own cache)? I guess "locking the bus" should mean locking the cache bus as well? - Goidelic
@SHH: Two cores can have the same shared data validly in each core's cache, but in that case the cache line is specifically marked 'shared' and the core must transition it to 'exclusive' before it is permitted to modify it. This changes the other core's cache line to 'invalid'. This is implemented core-to-core and is independent of main memory. (Read up on the MESI protocol.) There is no need to lock buses anymore; locking the cache line for the duration of the exchange is sufficient. - Hearse
The value can be cached in a core-local cache, in which case the value may become "stale" if not invalidated, which means there's no need to arbitrate on that cache, because only one core can access it unless something is generated to invalidate the cache line. But I may have just assumed main memory if I didn't read your answer clearly. - Place
I think I get it now. So all the problems boil down to multiple cores accessing and modifying the SAME location of memory (or cache) concurrently, and the CPU architecture has some way of arbitrating it (with the help of the LOCK prefix and the cache coherence mechanism). Am I correct? - Goidelic
@SHH: Exactly. To be a bit more precise, it's the same physical address in the system's shared memory space. The S in SMP means that all CPUs see the same physical memory space. - Hearse
You might want to look into memory barriers.

http://en.wikipedia.org/wiki/Memory_barrier

In this case the locks would use memory barriers so that the internal value used in the lock cannot be accessed by multiple processors at once.

Some architectures also allow locking out all cores except one to allow for this. For example, x86 sports a LOCK prefix that, when added to an instruction, locks access to memory during that instruction (e.g. LOCK ADD [counter], 1 for an atomic increment of a value in memory; the LOCK prefix requires a memory destination, so it cannot be applied to a register-only operation).

Architectures that don't support a LOCK prefix instead provide atomic primitives such as compare-and-exchange or test-and-set/swap. Usually a lock built on these involves a small instruction loop that at a high level may look like

while (test_and_set(&target) != 0)
    ;   /* spin until the old value was 0, i.e. the lock was free */

The loop may execute more than once under contention, but the atomic instruction ensures that the value is not changed from underneath it between the test and the set. The downside to this approach is that if there is high contention on target it may eat up more clock cycles than you would like, but it tends to complete fast enough that it's never really noticeable.

Place answered 23/10, 2011 at 0:58 Comment(10)
Does it mean all the locking algorithms in SMP are implemented using memory barriers (apart from atomic instructions)? - Goidelic
From what I've worked on in the kernel and seen in compilers, it looks like it. Usually when changing these values you have to invalidate caches on the other cores (which involves a memory barrier). - Place
This is a less terribly misleading explanation than mine was. @SHH: Basically, yes, any multi-processor architecture will have some hardware-supported way of doing certain operations on a memory address atomically while preventing race-condition-related problems for that memory address from occurring on the other cores. The OS implementation then uses these low-level features to build higher-level synchronization mechanisms. - Quevedo
@Inerdia I saw nothing wrong with your explanation; I was actually going to upvote it :) You didn't go into as much detail, but that doesn't mean there was anything wrong with it. - Place
@Jesus You beat me to adding the LOCK bits, so I might as well not split up the comment discussion. - Quevedo
Also, while CAS and TAS are not particularly great for implementing general-purpose locks because of the potentially long busy-wait, they're useful for writing concurrent data structures. If you expect the busy-wait to be very short (e.g. when a few threads are trying to add an element to a queue), it's better than the overhead of a full OS-level mutex. - Quevedo
@Inerdia Yeah, that's where spin locks come in. You usually only use them in conditions where sleeping cannot occur, or where you expect the wait time to be minimal, in which case an OS call may actually take longer than spinning on an atomic instruction. - Place
Memory barriers prevent the reordering of memory operations; they don't provide exclusivity. The LOCK prefix doesn't lock an instruction, it locks a location in memory. And atomic instructions haven't locked a bus in more than a decade; that was first changed on the Pentium Pro, and no later CPU actually locks the bus. - Hearse
@David Schwartz I never said that they locked the bus. In SMP you still require memory barriers to provide the correctness that allows you to correctly acquire a lock in some situations. I still don't see the point of the downvote. - Place
@JesusRamos: The downvote was because of two major technical inaccuracies. First, memory barriers do not prevent concurrent accesses; they simply prevent reordering of memory operations relative to each other. Second, LOCK doesn't stop another core's execution; it simply locks a cache line. It locks memory, not execution. - Hearse
I strongly recommend Curt Schimmel's UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers. Different hardware architectures provide different low-level tools for synchronizing access to data, including some architectures with very nearly no help. Schimmel's book provides algorithms that can work even on those architectures.

I wish I could easily locate my copy to summarize the contents.

Pippa answered 23/10, 2011 at 1:10 Comment(0)
You can prevent this using atomic instructions like TSL (test and set lock) and XCHG.

How do you ensure atomicity for an instruction?

You can disable all interrupts before executing the instruction, then re-enable them after the instruction is done. That doesn't help on multicore systems, because disabling interrupts on processor 1 has no effect on processor 2. On multicore systems, the atomicity of an instruction is instead ensured by preventing other CPUs from accessing the relevant memory (e.g. by locking the bus or the cache line) for the duration of the instruction.

So, if you implement semaphores using these instructions you'll have no problems on SMP.

Implementation of mutex_lock and mutex_unlock using TSL:

 mutex_lock:
    TSL REGISTER, MUTEX ; copy mutex to register and set mutex to 1
    CMP REGISTER, #0    ; compare mutex to zero
    JZE ok              ; if zero return
    CALL thread_yield   ; else: mutex is busy, schedule another thread
    JMP mutex_lock      ; try again later
 ok: RET

 mutex_unlock:
    MOVE MUTEX,#0       ; free mutex
    RET

You can find some information about TSL here: http://en.wikipedia.org/wiki/Test-and-set

A good book that can help you with: http://www.amazon.com/Modern-Operating-Systems-Andrew-Tanenbaum/dp/0136006639/ref=sr_1_sc_1?ie=UTF8&qid=1319333818&sr=8-1-spell

Titre answered 23/10, 2011 at 1:27 Comment(1)
Just remember that this method only works if your 'TSL' operation is an atomic test-and-set. - Hearse
This is a classical deadlock problem. I'm not sure about the hardware support (though I'm almost sure this is supported at the hardware level); however, I can give you an example of a solution to the deadlock problem in databases. If you know all the dependencies, you know which dependency should be "killed": the commands of that node will fail, but the system will break the deadlock and the other nodes won't fail. I think the same approach should work at the hardware level.

Colier answered 23/10, 2011 at 1:1 Comment(1)
This code will not deadlock. One of the threads will acquire the lock, make progress, and then release the lock; there is no hold-and-wait condition and no failure to unlock. - Place

© 2022 - 2024 — McMap. All rights reserved.