Taking a semaphore must be atomic. Is Pintos's sema_down safe?
This piece of code comes from the Pintos source: https://www.cs.usfca.edu/~benson/cs326/pintos/pintos/src/threads/synch.c

void
sema_down (struct semaphore *sema) 
{
  enum intr_level old_level;

  ASSERT (sema != NULL);
  ASSERT (!intr_context ());

  old_level = intr_disable ();
  while (sema->value == 0) 
    {
      list_push_back (&sema->waiters, &thread_current ()->elem);
      thread_block ();
    }
  sema->value--;
  intr_set_level (old_level);
}

The actual act of taking the semaphore is sema->value--;. For this to work, it must be a single atomic operation. How can we know that it is in fact atomic? I know that modern CPUs guarantee that aligned memory operations (for a word/doubleword/quadword, depending on the CPU) are atomic. But here, I am not convinced why it is atomic.

Laryngo asked 6/9/2016 at 22:33 – Comments (2):
Note that the code here is designed to be run within a kernel, on a single-core/CPU system, and the piece of code between disabling and re-enabling interrupts is not going to be pre-empted. This code is useful within the Pintos kernel; it's not generally useful. – Irmgard
Life was much simpler back in 1992: no multi-core CPUs yet. Well, not affordable ones anyway. – Telega

TL;DR: Anything is atomic if you do it with interrupts disabled on a UP system, as long as you don't count system devices observing memory via DMA.

Note the intr_disable (); / intr_set_level (old_level); around the operation.


"modern CPUs guarantee that aligned memory operations are atomic"

For multi-threaded observers, that only applies to separate loads or stores, not read-modify-write operations.
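
For example (a sketch in GNU C, not from Pintos and not part of the original answer): a single aligned store is atomic all by itself, but a decrement that other cores can observe needs a lock'ed instruction, which the __atomic builtins will emit on x86:

#include <stdint.h>

int32_t value;                 /* assumed naturally aligned */

void store_only(void)
{
    /* a single aligned store: atomic on its own (compiles to a plain mov) */
    __atomic_store_n(&value, 0, __ATOMIC_RELAXED);
}

void smp_decrement(void)
{
    /* read-modify-write visible to other cores: needs a lock prefix
       (gcc emits lock sub / lock xadd here) */
    __atomic_fetch_sub(&value, 1, __ATOMIC_RELAXED);
}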


For something to be atomic, we have to consider what potential observers we care about. What matters is that nothing can observe the operation as having partially happened. The most straightforward way to achieve that is for the operation to be physically / electrically instantaneous, and affect all the bits simultaneously (e.g. a load or store on a parallel bus goes from not-started to completed at the boundary of a clock cycle, so it's atomic "for free" up to the width of the parallel bus). That's not possible for a read-modify-write, where the best we can do is stop observers from looking between the load and the store.

My answer on Atomicity on x86 explains the same thing a different way: what it means for an operation to be atomic.


In a uniprocessor (UP) system, the only asynchronous observers are other system devices (e.g. DMA) and interrupt handlers. If we can exclude non-CPU observers from writing to our semaphore, then it's just atomicity with respect to interrupts that we care about.

This code takes the easy way out and disables interrupts. That's not necessary (or at least it wouldn't be if we were writing in asm).

An interrupt is handled between two instructions, never in the middle of an instruction. The architectural state of the machine either includes the memory-decrement or it doesn't, because dec [mem] either ran or it didn't. We don't actually need lock dec [mem] for this.
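
As a minimal sketch of that idea (an illustration in GNU C inline asm, not how Pintos actually does it), a memory-destination dec with no lock prefix is still atomic with respect to interrupt handlers on the same core:

/* Sketch: uniprocessor-only decrement.  A single memory-destination decl
   retires as a whole before any interrupt can be taken, so an interrupt
   handler on this core sees *p either before or after the decrement,
   never in between.  No lock prefix, so NOT atomic wrt. other cores or DMA. */
static inline void up_dec(int *p)
{
    asm volatile ("decl %0" : "+m" (*p));
}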

BTW, this is the use-case for cmpxchg without a lock prefix. I always used to wonder why they didn't just make lock implicit in cmpxchg, and the reason is that UP systems often don't need lock prefixes.
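
A hedged sketch of that use-case (a hypothetical helper, not a real Pintos or libc function): cmpxchg without lock is still a single instruction, so an interrupt on the same core can't land in the middle of it:

/* Sketch: compare-and-exchange for a uniprocessor, no lock prefix.
   cmpxchg compares *p with `expected` (held in EAX); if they're equal it
   stores `desired` into *p, otherwise it loads *p into EAX.  Returns the
   value that was in *p, so the caller checks (return value == expected). */
static inline int up_cmpxchg(int *p, int expected, int desired)
{
    int old = expected;
    asm volatile ("cmpxchgl %2, %1"
                  : "+a" (old), "+m" (*p)
                  : "r" (desired)
                  : "cc");
    return old;
}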

The exceptions to this rule are interruptible instructions that can record partial progress, like rep movsb or vpgather / vpscatter. See Interrupting instruction in the middle of execution. These won't be atomic wrt. interrupts even when the only observer is other code on the same core: only a single iteration of rep whatever, or a single element of a gather or scatter, will have happened or not.

Most SIMD instructions can't record partial progress, so for example vmovdqu ymm0, [rdi] either fully happens or not at all from the PoV of the core it runs on. (It is of course not guaranteed to be atomic wrt. other observers in the system, like DMA or MMIO, or other cores; that's when the normal load/store atomicity guarantees matter.)


There's no reliable way to make sure the compiler emits dec [value] instead of something like this:

mov   eax, [value]
                           ;; interrupt here = bad
dec   eax
                           ;; interrupt here = bad
mov   [value], eax

ISO C11 / C++11 don't provide a way to request atomicity with respect to signal handlers / interrupts but not other threads. They do provide atomic_signal_fence as a compiler barrier (vs. atomic_thread_fence as a barrier wrt. other threads/cores), but barriers can't create atomicity; they only control ordering wrt. other operations.

C11/C++11 volatile sig_atomic_t does have this idea in mind, but it only provides atomicity for separate loads/stores, not RMW. It's a typedef for int on x86 Linux. See that question for some quotes from the standard.
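
For example (a standard-C sketch, not related to Pintos): volatile sig_atomic_t is enough for a handler to set a flag that the main program polls, because each side only does a single load or store; but a flag-- in either place would be a non-atomic read-modify-write.

#include <signal.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig)
{
    (void)sig;
    got_signal = 1;        /* a single store: atomic wrt. the main program */
}

int main(void)
{
    signal(SIGINT, handler);
    while (!got_signal)    /* a single load per iteration: also atomic */
        ;                  /* got_signal-- here would be a separate load,
                              modify, and store -- not atomic wrt. the handler */
    return 0;
}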

On specific implementations, gcc -Wa,-momit-lock-prefix=yes will omit all lock prefixes (GAS 2.28 docs). This is safe for single-threaded code, or for a uniprocessor machine, as long as your code doesn't include device-driver hardware access that needs to do an atomic RMW on an MMIO location, or that uses a dummy lock add as a faster mfence.

But this is unusable in a multi-threaded program that needs to run on SMP machines, if you have some atomic RMWs between threads as well as some between a thread and a signal handler.

Energy answered 6/9/2016 at 22:50 – Comments (19):
What do you mean by "For multi-threaded observers, that only applies to separate loads or stores, not read-modify-write operations" exactly? After all, in https://mcmap.net/q/14631/-atomicity-of-loads-and-stores-on-x86 you said that RMWs are atomic. Probably the difference is the lock prefix. – Laryngo
"In a uniprocessor (UP) system, the only asynchronous observers are other system devices (e.g. DMA) and interrupt handlers. If we can exclude non-CPU observers from writing to our semaphore, then it's just atomicity with respect to interrupts that we care about." Does that mean the implementation assumes there are no DMA devices? If there were, would it be a not-always-working implementation? – Laryngo
"This code takes the easy way out and disables interrupts. That's not necessary (or at least it wouldn't be if we were writing in asm)." Why is it not necessary? In particular, why does writing in asm matter? After all, C is compiled to asm. – Laryngo
"I don't think C11 / C++11 provide a way to request atomicity with respect to signal handlers / interrupts, but not other threads. They do provide atomic_signal_fence as a compiler barrier, but I don't recall a type that on x86 would avoid lock prefixes while still having the other semantics of atomic types." I don't grasp it. After all, you've said that an interrupt can happen only between two (macro-)instructions. Perhaps I'm missing something else. – Laryngo
@Gilgamesz: Yes, of course you need lock if you want an RMW to be atomic with respect to other threads on an SMP machine. That (and atomic MMIO bus cycles) is why it exists. The part about DMA observers means: don't program your network card to look at your semaphore. Of course there are DMA devices in the system; they're just not looking at the RAM you're using for the semaphore. It's only synchronizing code running on the CPU, not synchronizing kernel code with a device that is also modifying a shared data structure. Then you might need lock dec even on UP. – Energy
re: the last two comments: think about what happens if an interrupt handler runs at one of the "interrupt here = bad" places. The interrupt handler decrements the old value, since this code hasn't stored the decremented value yet. Of course C compiles to asm, but it matters what asm it compiles to. – Energy
Thanks a lot for your patience :). – Laryngo
"but it matters what asm it compiles to" – of course. Before, I didn't grasp what you meant; now I get it. Everything is clear besides one thing: "I don't think C11 / C++11 provide a way to request atomicity (...)". Of course, by atomicity here you mean a piece of code executed without an interrupt/signal handler mixed in. It seems impossible to achieve something like that. For example, Linux sends our process a signal and it will be handled by the signal handler we registered, because the OS decided to pass control to the signal handler. Yeah? – Laryngo
@Gilgamesz: dec [mem] either runs before the signal handler or it doesn't. It's the same situation as for interrupts in a UP kernel: asynchronous effects only happen at points of consistent architectural state. – Energy
OK, so if the handler doesn't write to the semaphore (or anything else shared), there is no problem. Thanks! :) – Laryngo
@Gilgamesz: that's not the point. dec [mem] is atomic with respect to any other code in an interrupt handler. If no signal/interrupt handler could touch the semaphore, there would be no need for it in the first place! (Unless you needed to sleep while holding the lock or something; then synchronizing between software threads makes sense.) – Energy
But that's not what happens in Pintos: instead it makes the check-and-modify atomic by doing it (and the modification of the wait list) with interrupts disabled, so no handler can run. – Energy
If you were writing it in asm, you could use dec [mem] / jge we_got_the_lock, and otherwise fall through to inc [mem] (undo our failed attempt to acquire) / call a function to wake whichever thread is holding the lock. – Energy
Yes, I understand it. Perhaps I didn't express what I meant clearly before. Thanks :) – Laryngo
@Gilgamesz: FYI, I interpreted your previous comment as saying that you thought dec [mem] was only safe if you used it on semaphores that won't be written to by any interrupt/signal handlers or anything else, i.e. that otherwise there would be a problem. I hope that helps your English-learning efforts; I think most other native English speakers would read that comment the same way. – Energy
@PeterCordes do you know if the single-core atomicity holds for SIMD instructions? (i.e. is something like vmovdqa atomic with respect to interrupt handlers / preemption?) Page 264 of the Intel Architecture Manual seems to suggest that this is not guaranteed to be the case. (I am especially wary that with a ymm register the store might tear between the lanes.) – Bullring
@Noah: Interrupts can only arrive between instructions, so yes, vmovdqa either fully happens or not at all from the PoV of the core it runs on. Only a few instructions are "interruptible", i.e. have any mechanism to partially complete and record progress: e.g. rep movsb, vpgatherdd. The vmovdqa 16- or 32-byte store is of course not guaranteed to be atomic wrt. other observers in the system, like DMA or MMIO, or other cores. That's where the normal load/store atomicity guarantees (up to 8 bytes) matter. – Energy
@PeterCordes I see, thanks. Looking at your answer relating to vpgatherdd, wouldn't something like vmovdqu also face the same issue, in that it could page-fault partially completed? Or, since it's going through cache, is that not an issue? – Bullring
@Noah: After handling a page fault, it resumes executing from scratch. It has nowhere to store partial progress across a fault, even if it was split across a page boundary so there could be some valid data from one page while still faulting on the other page. Cache has nothing to do with anything; cache is physical, not virtual, and unrelated to page faults. – Energy
