Interlocked.CompareExchange<Int> using GreaterThan or LessThan instead of equality

The System.Threading.Interlocked class allows addition (and subtraction) and compare-and-exchange as atomic operations. It seems that a CompareExchange that tests not just equality but also GreaterThan/LessThan as the atomic comparison would be quite valuable.

Would a hypothetical Interlocked.GreaterThan be a feature of the IL, or would it be a CPU-level feature? Or both?

Lacking any other option, is it possible to create such a feature in C++ or direct IL code and expose that functionality to C#?
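
To make the desired semantics concrete, here is a non-atomic sketch of what a hypothetical Interlocked.ExchangeIfGreaterThan would have to do in one indivisible step (the method name is made up for illustration):

// Illustration only: the desired semantics, NOT atomic as written.
// The whole body would have to execute as a single indivisible operation.
static int ExchangeIfGreaterThan(ref int location, int value)
{
    int original = location;   // read
    if (value > original)      // compare using greater-than instead of equality
        location = value;      // conditionally write
    return original;           // return the previous value, like CompareExchange does
}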

Melodiemelodion answered 24/10, 2012 at 2:10 Comment(2)
Do you really want to read a value, modify it and then write it back even if some other thread also modified it in the meantime? What would be the use case of that?Lichter
@Lichter if <int> is like a DateTime stamp, I'm doing logic that should only proceed for "new" data. Also for Max() Min() functionsMelodiemelodion

You can build other atomic operations out of InterlockedCompareExchange.

public static bool InterlockedExchangeIfGreaterThan(ref int location, int comparison, int newValue)
{
    int initialValue;
    do
    {
        // take a snapshot of the current value
        initialValue = location;
        // if the stored value is already >= comparison, there is nothing to exchange
        if (initialValue >= comparison) return false;
    }
    // publish newValue only if location still holds the snapshot; otherwise another
    // thread changed it in the meantime, so re-read and try again
    while (System.Threading.Interlocked.CompareExchange(ref location, newValue, initialValue) != initialValue);
    return true;
}
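
A possible usage sketch, e.g. for tracking a running maximum across threads (the highWaterMark field is illustrative and not part of the original answer):

private static int highWaterMark;

public static void Record(int sample)
{
    // store the sample only if it exceeds the current maximum
    InterlockedExchangeIfGreaterThan(ref highWaterMark, sample, sample);
}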
Miltonmilty answered 24/10, 2012 at 20:9 Comment(16)
This is exactly what I was looking for... recomposing what's available into new constructsMelodiemelodion
I'm not sure if it's just me, but surely this code does not provide an atomic operation? The Interlocked API provides functionality that is atomic, but you've sidestepped that in this code by stringing several atomic statements together with a while loop etc., which is essentially no longer atomic in nature.Stupid
@Nnoel None of the steps have external visibility until the CompareExchange, which is atomic. If you look at most processors, the only atomic core operation is CompareExchange; everything else is built out of it.Miltonmilty
@RaymondChen I've read the linked blog article, but I don't understand how it's permissible to return false inside the loop. I thought you would have to store the result of the comparison, then return it outside the loop?Clavichord
@RB If the function returns false, then no operation occurred.Miltonmilty
Just a note: if using long instead of int, it would be good to change initialValue = location; into initialValue = Interlocked.Read(ref location);.Weyermann
The first read of initialValue is not thread safe. If initialValue has a cached value that is > comparison, when the actual current value is < comparison, this code fails by returning immediately without reading the initialValue in a thread safe way via Interlocked.Devora
@Devora But that's the same race condition if the InterlockedExchangeIfGreaterThan function simply executed before the other thread modified the value at all. In other words, any race condition here would already have been a race condition. (I'm assuming relaxed ordering. I can't find documentation that specifies how strong a barrier these functions guarantee.)Miltonmilty
The race condition is already there indeed, but I assume the purpose of the method is to remove it. If the thread running your function is using an old value of location, it might not exchange it, even if comparison is greater than the most recent value (which such thread would be unaware of).Devora
The race is unremovable (assuming relaxed order). If the calling thread is running a tiny bit faster, it will beat the mutating thread, and doing nothing is correct. If the calling thread expects the value to be less due to some other variable changing, then that assumes a stronger order (like sequential consistency), and if that's what you want, you can add a barrier.Miltonmilty
This extension method is not atomic. The value for the location property can change between the do block and the call to CompareExchange, if two or more threads are calling the method at the same time.Norvall
@Devora You are correct, this method is not atomic.Norvall
@Norvall If it changes, then the CompareExchange will fail and we loop back and try again.Miltonmilty
@RaymondChen A while loop is not atomic...you're just reinventing a lock statement by blocking the thread. Interlocked methods exist to eliminate the need for a lockNorvall
@RaymondChen Plus, with this method you lose the serialized nature of Interlocked methods: if thread A calls this method before thread B, there is a chance thread A may return from this method AFTER thread B. This is never something you need to worry about with truly atomic operationsNorvall
@Norvall the point of the exercise is to avoid serialization. It is a feature that the operations can complete out of order. This avoids priority inversions, for example. (Also, guess what: locks are built out of interlocked operations.)Miltonmilty

With these helper methods you can not only exchange the value but also detect whether it was replaced or not.

Usage looks like this:

int currentMin = 10; // can be changed from another thread at any moment

int potentialNewMin = 8;
if (InterlockedExtension.AssignIfNewValueSmaller(ref currentMin, potentialNewMin))
{
    Console.WriteLine("New minimum: " + potentialNewMin);
}

And here are methods:

public static class InterlockedExtension
{
    public static bool AssignIfNewValueSmaller(ref int target, int newValue)
    {
        int snapshot;
        bool stillLess;
        do
        {
            // take a snapshot and test the condition against that snapshot
            snapshot = target;
            stillLess = newValue < snapshot;
            // retry only if the condition still holds but another thread changed target since the snapshot
        } while (stillLess && Interlocked.CompareExchange(ref target, newValue, snapshot) != snapshot);

        return stillLess;
    }

    public static bool AssignIfNewValueBigger(ref int target, int newValue)
    {
        int snapshot;
        bool stillMore;
        do
        {
            // take a snapshot and test the condition against that snapshot
            snapshot = target;
            stillMore = newValue > snapshot;
            // retry only if the condition still holds but another thread changed target since the snapshot
        } while (stillMore && Interlocked.CompareExchange(ref target, newValue, snapshot) != snapshot);

        return stillMore;
    }
}
Poisoning answered 9/2, 2017 at 13:27 Comment(0)

This isn't actually true, but it is useful to think of concurrency as coming in 2 forms:

  1. Lock free concurrency
  2. Lock based concurrency

It's not true because software lock based concurrency ends up being implemented using lock free atomic instructions somewhere on the stack (often in the Kernel). Lock free atomic instructions, however, all ultimately end up acquiring a hardware lock on the memory bus. So, in reality, lock free concurrency and lock based concurrency are the same.

But conceptually, at the level of a user application, they are 2 distinct ways of doing things.

Lock based concurrency is based on the idea of "locking" access to a critical section of code. When one thread has "locked" a critical section, no other thread may have code running inside that same critical section. This is usually done by the use of "mutexes", which interface with the OS scheduler and cause threads to become un-runnable while waiting to enter a locked critical section. The other approach is to use "spin locks", which cause a thread to spin in a loop, doing nothing useful, until the critical section becomes available.
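
As a rough sketch of the spin-lock idea (illustrative only; a real spin lock adds back-off, yielding, and careful memory ordering):

using System.Threading;

public sealed class NaiveSpinLock
{
    private int _taken; // 0 = free, 1 = held

    public void Enter()
    {
        // spin until we atomically flip _taken from 0 to 1
        while (Interlocked.CompareExchange(ref _taken, 1, 0) != 0)
        {
            // busy-wait: burning CPU until the lock is released
        }
    }

    public void Exit()
    {
        // release the lock so a spinning thread can acquire it
        Interlocked.Exchange(ref _taken, 0);
    }
}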

Lock free concurrency is based on the idea of using atomic instructions (specially supported by the CPU), that are guaranteed by hardware to run atomically. Interlocked.Increment is a good example of lock free concurrency. It just calls special CPU instructions that do an atomic increment.

Lock free concurrency is hard. It gets particularly hard as the length and complexity of critical sections increase. Any step in a critical section can be simultaneously executed by any number of threads at once, and they can move at wildly different speeds. You have to make sure that despite that, the results of the system as a whole remain correct. For something like an increment, it can be simple (the critical section is just one instruction). For more complex critical sections, things can get very complex very quickly.

Lock based concurrency is also hard, but not quite as hard as lock free concurrency. It allows you to create arbitrarily complex regions of code and know that only one thread is executing them at any time.

Lock free concurrency has one big advantage, however: speed. When used correctly it can be orders of magnitude faster than lock based concurrency. Spin loops are bad for long running critical sections because they waste CPU resources doing nothing. Mutexes can be bad for small critical sections because they introduce a lot of overhead. They involve a mode switch at a minimum, and multiple context switches in the worst case.

Consider implementing the managed heap. Calling into the OS every time "new" is called would be horrible. It would destroy the performance of your app. However, using lock free concurrency it's possible to implement gen 0 memory allocations using an interlocked increment (I don't know for sure if that's what the CLR does, but I'd be surprised if it wasn't). That can be a HUGE savings.
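
A toy sketch of that bump-pointer idea (purely illustrative; this is not how the CLR allocator actually works, and it ignores overflow handling and GC triggering):

using System.Threading;

public sealed class BumpAllocator
{
    private readonly byte[] _segment = new byte[1 << 20]; // stand-in for a gen-0 segment
    private int _next; // offset of the next free byte

    // Reserves 'size' bytes with a single atomic add; returns the start offset, or -1 if full.
    public int Allocate(int size)
    {
        int end = Interlocked.Add(ref _next, size);
        return end <= _segment.Length ? end - size : -1;
    }
}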

There are other uses, such as in lock free data structures, like persistent stacks and AVL trees. They usually use "CAS" (compare and swap).
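
For example, the push operation of such a lock-free stack is just a CAS loop on the head reference (a minimal sketch; pop and its ABA concerns are omitted):

using System.Threading;

public sealed class LockFreeStack<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Next;
    }

    private Node _head;

    public void Push(T value)
    {
        var node = new Node { Value = value };
        Node oldHead;
        do
        {
            // link the new node in front of the current head
            oldHead = _head;
            node.Next = oldHead;
        }
        // publish it only if no other thread replaced the head in the meantime
        while (Interlocked.CompareExchange(ref _head, node, oldHead) != oldHead);
    }
}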

The reason, however, that locked based concurrency and lock free concurrency are really equivalent is because of the implementation details of each.

Spin locks usually use atomic instructions (typically cas) in their loop condition. Mutexes need to use either spin locks or atomic updates of internal kernel structures in their implementation.

The atomic instructions are in turn implemented using hardware locks.

In any case, they both have their sets of trade offs, usually centering on perf vs complexity. Mutexes can be both faster and slower than lock free code. Lock free code can be both more and less complex than a mutex. The appropriate mechanism to use depends on specific circumstances.

Now, to answer your question:

A method that did an interlocked compare exchange if less than would imply to callers that it is not using locks. You can't implement it with a single instruction the same way increment or compare exchange can be done. You could simulate it by doing a subtraction (to compute less than), with an interlocked compare exchange in a loop. You could also do it with a mutex (but that would imply a lock, so using "interlocked" in the name would be misleading). Is it appropriate to build the "simulated interlocked via CAS" version? That depends. If the code is called very frequently and has very little thread contention, then the answer is yes. If not, you can turn an O(1) operation with moderately high constant factors into an infinite (or very long) loop, in which case it would be better to use a mutex.

Most of the time it's not worth it.

Illstarred answered 24/10, 2012 at 3:28 Comment(2)
FYI, on many platforms, it's possible for an application to request a certain amount of thread-local data which can be accessed very quickly; .NET does not expose this ability to user code, but from what I understand some versions of its memory allocator subdivide the gen0 heap into chunks which are then given to different cores; each core gets its own next-allocation pointer, allowing it to allocate memory from its chunk without having to synchronize with anything else.Idiophone
Also, I see nothing wrong with interlocked methods that include a read-modify-CompareExchange loop. That's what some implementations of Interlocked.Increment(ref long) do. Provided that the computation of a new value from an old one is fast, there's not really any danger of code spending any significant time in the loop; unless other threads are being deliberately malicious, the worst-case number of unsuccessful attempts would equal the number of cores.Idiophone

What do you think about this implementation:

// this is an Interlocked.ExchangeIfGreaterThan implementation
private static void ExchangeIfGreaterThan(ref long location, long value)
{
    // read
    long current = Interlocked.Read(ref location);
    // compare
    while (current < value)
    {
        // set
        var previous = Interlocked.CompareExchange(ref location, value, current);
        // if another thread has set a greater value, we can break
        // or if the previous value equals the current one, then no other thread changed it in between
        if (previous == current || previous >= value) // note: most common case first
            break;
        // for all other cases, we need another run (read value, compare, set)
        current = Interlocked.Read(ref location);
    }
}
Purify answered 10/11, 2012 at 15:5 Comment(0)

Update to my earlier post here: we found a better way to do the greater-than comparison by using an additional lock object. We wrote many unit tests to validate that a lock and Interlocked can be used together, but only for some cases.

How the code works: Interlocked uses memory barriers so that a read or write is atomic. The sync lock is needed to make the greater-than comparison an atomic operation. So the rule is that inside this class no other operation writes the value without taking this sync lock.

What we get with this class is an interlocked value which can be read very fast, while writes take a little longer. Reads are about 2-4 times faster in our application.

The code as an image: http://files.thekieners.com/blogcontent/2012/ExchangeIfGreaterThan2.png

And here it is as code to copy & paste:

public sealed class InterlockedValue
{
    private long _myValue;
    private readonly object _syncObj = new object();

    public long ReadValue()
    {
        // reading the value (the 99.9% case in our app) does not use the lock object,
        // since that would be too much overhead in our highly multithreaded app.
        return Interlocked.Read(ref _myValue);
    }

    public bool SetValueIfGreaterThan(long value)
    {
        // synchronize writes to _myValue, since the greater-than comparison and the exchange must happen atomically
        lock (_syncObj)
        {
            // greater-than condition
            if (value > Interlocked.Read(ref _myValue))
            {
                // now we can safely set _myValue to value.
                Interlocked.Exchange(ref _myValue, value);
                return true;
            }
            return false;
        }
    }
}
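
A possible usage sketch (the timestamp scenario mirrors the question; the names are made up):

// hypothetical "latest timestamp seen", shared between threads
private static readonly InterlockedValue _latestTimestamp = new InterlockedValue();

public static void OnSample(long unixSeconds)
{
    // only proceed for "new" data, i.e. a timestamp newer than any seen so far
    if (_latestTimestamp.SetValueIfGreaterThan(unixSeconds))
    {
        // handle the newer sample here
    }
}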
Purify answered 15/11, 2012 at 6:46 Comment(4)
Thanks for the update. I wonder how much faster your approaches would be if you replaced lock with a spinlock. Spinlock is the foundation for many of the new ConcurrentDictionary and other objects in .NET 4.xMelodiemelodion
I don't think you really need to use Interlocked inside the lock block, as you've already locked the entire class instance by locking around _syncObjGiliane
On a 32-bit environment a long read is not guaranteed to be atomic. So says this blogPennyweight
@veljkoz: Interlocked.Read is not necessary inside the lock, but Interlocked.Exchange is important because another thread might call the ReadValue() method simultaneously (and long is not atomic when running on x86).Weyermann

All interlocked operations have direct support in the hardware.

Interlocked operations and atomic data types are different things. An atomic type is a library-level feature. On some platforms and for some data types, atomics are implemented using interlocked instructions. In that case they are very efficient.

In other cases, when the platform does not have interlocked operations at all, or they are not available for some particular data type, the library implements these operations using appropriate synchronization (critical section, mutex, etc.).

I am not sure that Interlocked.GreaterThan is really needed; otherwise it might already be implemented. If you know a good example where it would be useful, I am sure everybody here will be happy to hear it.

Solana answered 24/10, 2012 at 2:17 Comment(3)
I am implementing a lightweight (Complex Data Processing/CEP)-like analysis engine. I read data with date time stamps that are out of order. Depending on the comparison result of the date then perhaps I can use CompareExchangeGreaterThan(ref int, int, int) to make things quicker and faster without any Spinlocks.Melodiemelodion
PS - Yes, I know DateTime is a Double, but I'm really only doing "seconds from 1970" so I can fit my data into a smaller memory allocation like int.Melodiemelodion
I see. This might be useful. On the other hand, since this is in the hardware, you know, there is little chance that Intel will ever add it to its platform. Intel needs noticeable performance gain to justify investment in this feature. Changes in the hardware are not cheap.Solana

Greater-than/less-than and equality comparisons are already atomic operations. That doesn't address the safe concurrent behavior of your application, though.

There is no point in making them part of the Interlocked family, so the question is: what are you actually trying to achieve?

Snapp answered 24/10, 2012 at 2:17 Comment(1)
I want to store the larger value after an atomic comparison.Melodiemelodion
