Should interlocked implementations based on CompareExchange use SpinWait?
Asked Answered
R

2

10

Below is an implementation of an interlocked method based on Interlocked.CompareExchange.

Is it advisable for this code to use a SpinWait spin before reiterating?

public static bool AddIfLessThan(ref int location, int value, int comparison)
{
    int currentValue;
    do
    {
        currentValue = location; // Read the current value
        if (currentValue >= comparison) return false; // If "less than comparison" is NOT satisfied, return false
    }
    // Set to currentValue+value, iff still on currentValue; reiterate if not assigned
    while (Interlocked.CompareExchange(ref location, currentValue + value, currentValue) != currentValue);
    return true; // Assigned, so return true
}

I have seen SpinWait used in this scenario, but my theory is that it should be unnecessary. After all, the loop only contains a handful of instructions, and there is always one thread making progress.

Say that two threads are racing to perform this method, and the first thread succeeds right away, whereas the second thread initially makes no change and has to reiterate. With no other contenders, is it at all possible for the second thread to fail on its second attempt?

If the example's second thread cannot fail on the second attempt, then what might we gain with a SpinWait? Shaving off a few cycles in the unlikely event that a hundred threads are racing to perform the method?

Rackrent answered 4/11, 2019 at 15:6 Comment(16)
When there is only 1 Thread, you don't need any code handling synchronisation. But when you have multiple Threads you do need it. No matter how small the code is you are executing. Even a simple increment (3 Assembly instructions) can get race conditioned and return a false valueRadferd
@Radferd OP isn't asking whether it's necessary to use Interlocked; rather, should he use SpinWait instead of just the empty loop body?Johst
You should probably just use SpinOnce to prevent a single-threaded OS from potentially being starved. See #37799881Johst
@MatthewWatson I know, hence I left a comment, not an answer. It was a comment to the sentence "After all, the loop only contains a handful of instructions, and there is always one thread making progress" pointing out that no matter how few instructions, there is always a chance for a race conditionRadferd
@Radferd The race condition is already handled correctly by virtue of using Interlocked. For clarification, I am only interested in whether or not a SpinWait is meaningful, for example to meaningfully save CPU cycles or (thanks @MatthewWatson!) prevent a single-threaded OS from being starved.Rackrent
Then I misunderstood you. Saying "[...] but my theory is that it should be unnecessary. After all, the loop only contains a handful of instructions [...]" makes it seem like you think it's unnecessary to use Interlock because there is only such a small set of instructions, or it does at least to meRadferd
Spinwait bascially calls the pause CPU instruction which shuts down parts of the current core. This saves some electricity although Task Manager would show 100% CPU utlization. Skylake has changed the latency of this instruction from 14ns to 140ns which allows for longer pause times. Apart from power consumption you should see no difference as long as you are not mixing threads with different scheduler priorities.Allare
@AloisKraus I understand that in theory, but my reasoning is that this loop succeeds on the next attempt, regardless of whether or not we wait. That's why I expect that SpinWait will not even reduce power consumption, as the same number of attempts will be performed anyway! (With two threads, that is one attempt for the first and two for the second thread.)Rackrent
@Timo, I think you may benefit from a spin lock on platforms where Interlocked is not supported by hardware but has to be emulated.Stockish
You tagged spinwait "A spin-wait loop is a technique used in multithreaded applications whereby one thread waits for other threads for protecting a critical section, for barriers, or for other synchronizations." That doesn't sound at all like what you are asking.Gareth
@MatthewWatson You mean with green threads? 1) How could there be a problem w/ RMW asm instr w/ green threads? 2) How would spinWait.SpinOnce() solve it?Gareth
@Stockish Which modern arch (where you find impl for modern PL) doesn't have RMW asm instr?Gareth
@Gareth I believe that one thread waiting for other threads to access a critical section is exactly what I'm asking about: All threads want to assign ref location. Interlocked.CompareExchange lets only one thread do that at a time. "Losing" threads loop vigorously, and SpinWait can make that less vigorously. The question is whether that adds practical value in this scenario, as the operation is so simple and fast (i.e. threads succeed in such rapid succession that spinning may be overkill).Rackrent
@Rackrent What critical section? Why not loop vigorously?Gareth
@Gareth Critical section ref location (the contended place in memory), which must only be written to by one thread at a time. As the question indicates, my theory is that looping vigorously is fine, and I'm trying to make sure that theory is correct.Rackrent
@Rackrent A location is not a "critical section".Gareth
G
3

My non-expert opinion is that in this particular case, where two threads occasionally call AddIfLessThan, a SpinWait is unneeded. It could be beneficial in case the two threads were both calling AddIfLessThan in a tight loop, so that each thread could make progress uninterrupted for some μsec.

Actually I made an experiment and measured the performance of one thread calling AddIfLessThan in a tight loop versus two threads. The two threads need almost four times more to make the same number of loops (cumulatively). Adding a SpinWait to the mix makes the two threads only slightly slower than the single thread.

Guanaco answered 4/11, 2019 at 17:27 Comment(5)
What effect does SpinWait have with one thread, eg is there a reason not to use it?Rayleigh
@IanRingrose it makes it 15% slower. I declare a variable of type SpinWait inside the AddIfLessThan method, and although it is a value type and its SpinOnce method is never called, it still adds some overhead.Guanaco
@TheodorZoulias: You could declare in your IL body to skip init of locals which should give your nearly the perf back. You need to emit the IL code for this because there is no language support for it yet. See github.com/dotnet/csharplang/blob/master/proposals/… When you want to spin once you can call Reset on it and then SpinOnce.Allare
Thanks @TheodorZoulias! FYI, the two threads was just an example to illustrate my point. This is a class library and can be used as the consumer sees fit. The best we can do is make assumptions about the likelihood of race conditions with many threads. Given how little time the operation takes, I deem it unlikely that many threads will perform this operation overlapping with each other.Rackrent
If you are updating a tight loop, you should probably consider making the value private and update once at the end of loop. If the shared value needs to kept more or less actual and the loop is extremely long, consider updating the value like you advance a progress bar, once every X ms.Gareth
E
2

Two threads is just not a subject for SpinWait discussion. But this code doesn't tell us how many threads can actually compete for the resource and with relatively high number of threads using of the SpinWait can become beneficial. In particular with higher number of threads the virtual queue of threads, which are trying to successfully acquire the resource, gets longer and those threads which happen to be served in the end have good chances to exceed their time slice allocated by scheduler which in turn can lead to higher CPU consumption and may affect execution of other scheduled threads even with higher priority. The SpinWait has good answer to this situation by setting some upper limit of allowed spins after which the context switching is going to be executed. So it's a reasonable trade-off between necessity to make an expensive system call in order to trigger a context switching and uncontrolled user mode CPU consumption which is under the risk to impact the other threads execution in certain situations.

Elaterite answered 4/11, 2019 at 20:50 Comment(13)
The two threads was just an example to illustrate my point. This is a class library. We should be able to deal with any realistic scenario, taking its likelihood into account. Given how little time the operation takes, I deem it unlikely that many threads will perform this operation overlapping with each other.Rackrent
You make a fair point. I agree that, with a lot of threads, some threads would fail many before succeeding. However, part of me thinks that the operation is so tiny that, say, performing it 100 times is still peanuts. With this in mind, would you expect it to make the SpinWait overkill, or not?Rackrent
@Timo, the only answer is measurement, especially considering that this is going to be a library code which should be able to deal with various load and hardware scenarios I presume. For better fairness and thus for better throughput using the SpinWait is better strategy (with relatively small cost of dozens of microseconds) than not using that unless you exclude in advance a possibility of high low load scenarios, so finding a load point at which the throughput can be affected would give you a hint whether you need it or not (note that it also may depend on number of cores).Elaterite
With that many threads w/ need of a common resource for such long time so frequently that one is going to consume its time slice, I would either consider making private copies of the resource (to be synced later) or even abandon threads.Gareth
@Gareth As stated 4 comments up, this is a class library. Of course the consuming code could use a case-specific approach. The point of the class library, however, is to be resilient to various use cases. That's why the question remains in which situations the SpinWait adds value. So far, those seem to be (A) the single-core case, and (B) possibly the very-high-contention case.Rackrent
@Rackrent I still have no idea what is spinning and what is high contention case.Gareth
@Gareth the private copy approach has very limited applicability (in particular, it's reasonable only when expected number of writes is much higher than expected number of reads, which is difficult to predict for a library usage) with high complexity of the read scenario. As for the abandoning threads, in cases when the computational scalability is needed (which is supposed to be true in this specific case) this is rather harmful suggestion as vast majority of modern devices use more than one core.Elaterite
@DmytroMukalov The "vast majority of modern devices" runs more than one program and not shipping a MT program doesn't mean that other cores won't be used. Suggesting that because there are many cores all programs should try to use as much as possible is rather harmful suggestion, as many programs could end up using much more CPU time and slowing down computers.Gareth
@curiousguy, you seem to talk about overall device utilization which has nothing to do with a specific program scalability. The computers are for computations and not using the all cores when I you potentially could do that means limiting the scalability of your program (please note, I stated it before when scalability is required). As for slowing down this seems to be a speculation there because it's mainly related to misusing of concurrency (but not with multi-threading itself) which can lead to oversubscription, unfairness, high contention and other negative things which impact performance.Elaterite
@curiousguy, if you are using multi-threading improperly it most likely will tend to affect the performance but this is rather subject of the implementation than question of whether to do that or not because using same logic the best suggestion would be not to develop any program at all as you can develop it improperly and it may slow down a computer or even worse get it down.Elaterite
@DmytroMukalov 1) High contention is what we were debating in the 1st place. 2) A good OS will limit the "damages" possibly done by a program. I'm just saying that not all programs need MT even if they are CPU intensive and running a MT program on more cores can turn the perf from good enough to catastrophic in a difficult to predict way.Gareth
@curiousguy, if you read my comments carefully you should notice that I repeated word scalability couple of times. Of course not all programs even those which imply heavy computations need multi-threading (even more the multi threading isn't always possible to apply) but those which need computational (not I/O) scalability require multi-threading.Elaterite
@curiousguy, As for the items: 1) yes, contention is somewhat related to subject and there receipts how to address such problems without falling back to a single threading (i.e without affecting required quality attributes) and 2) this is very broad statement regarding the OS - there are plenty of ways how a program can be harmful with no chance that a damage can be prevented by OS and as for the second part you seem to confuse reason of the potential problems - not MT by itself leads to the problems but misusing of it can be source of the problems.Elaterite

© 2022 - 2024 — McMap. All rights reserved.