C#/CLR: MemoryBarrier and torn reads
Asked Answered
S

2

5

Just playing around with concurrency in my spare time, and wanted to try preventing torn reads without using locks on the reader side so concurrent readers don't interfere with each other.

The idea is to serialize writes via a lock, but use only a memory barrier on the read side. Here's a reusable abstraction that encapsulate the approach I came up with:

public struct Sync<T>
    where T : struct
{
    object write;
    T value;
    int version; // incremented with each write

    public static Sync<T> Create()
    {
        return new Sync<T> { write = new object() };
    }

    public T Read()
    {
        // if version after read == version before read, no concurrent write
        T x;
        int old;
        do
        {
            // loop until version number is even = no write in progress
            do
            {
                old = version;
                if (0 == (old & 0x01)) break;
                Thread.MemoryBarrier();
            } while (true);
            x = value;
            // barrier ensures read of 'version' avoids cached value
            Thread.MemoryBarrier();
        } while (version != old);
        return x;
    }

    public void Write(T value)
    {
        // locks are full barriers
        lock (write)
        {
            ++version;             // ++version odd: write in progress
            this.value = value;
            // ensure writes complete before last increment
            Thread.MemoryBarrier();
            ++version;             // ++version even: write complete
        }
    }
}

Don't worry about overflow on the version variable, I avoid that another way. So is my understanding and application of Thread.MemoryBarrier correct in the above? Are any of the barriers unnecessary?

Strikebound answered 21/9, 2013 at 18:14 Comment(1)
Your comments and code (info/version) are out of sync.Abdicate
F
3

I took a long hard look at your code and it does appear correct to me. One thing that immediately jumped out at me was that you used an established pattern for performing the low-lock operation. I can see that you are using version as a kind of virtual lock. Even numbers are released and odd numbers are acquired. And since you are using a monotonically increasing value for the virtual lock you are also avoiding the ABA problem. The most important thing, however, is that you continue to loop while attempting to read until the virtual lock value is observed to be the same before the read starts as compared to after it completes. Otherwise, you consider this a failed read and try it all over again. So yeah, job well done on the core logic.

So what about the placement of the memory barrier generators? Well, this all looks pretty good as well. All of the Thread.MemoryBarrier calls are required. If I had to nit-pick I would say you need one additional one in the Write method so that it looks like this.

public void Write(T value)
{
    // locks are full barriers
    lock (write)
    {
        ++version;             // ++version odd: write in progress
        Thread.MemoryBarrier();
        this.value = value;
        Thread.MemoryBarrier();
        ++version;             // ++version even: write complete
    }
}

The added call here ensures that ++version and this.value = value do not get swapped. Now, the ECMA specification technically allows that kind of instruction reordering. However, Microsoft's implementation of the CLI and the x86 hardware both already have volatile semantics on writes so it would not really be needed in most cases. But, who knows, maybe it would be necessary on the Mono runtime targeting the ARM cpu.

On the Read side of things I can find no faults. In fact, the placement of the calls you have is exactly where I would have put them. Some people may wonder why you do not need one before the initial read of version. The reason is because the outer loop will catch the case when the first read was cached because of the Thread.MemoryBarrier further down.

So this brings me to a discussion about performance. Is this really faster than taking a hard lock in the Read method? Well, I did some pretty extensive testing of your code to help answer that. The answer is a definitive yes! This is quite a bit faster than taking a hard lock. I tested using a Guid as the value type because it is 128 bits and so that is bigger than the native word size of my machine (64 bits). I also used several different variations on the number of writers and readers. Your low lock technique consistently and significantly outperformed the hard lock technique. I even tried a few variations using Interlocked.CompareExchange to do the guarded read and they were all slower as well. In fact, in some situations it was actually slower than taking the hard lock. I have to be honest. I was not at all surprised by this.

I also did some pretty significant validity testing. I created tests that would run for quite some time and not once did I see a torn read. And then as a control test I would tweak the Read method in such a manner that I knew it would be incorrect and I ran the test again. This time, as expected, torn reads started to appear randomly. I switched the code back to what you have and the torn reads disappeared; again, as expected. This seemed to confirm what I already expected. That is, your code looks correct. I do not have a wide variety of runtime and hardware environments to test with (nor do I have the time) so I am not willing to give it a 100% seal of approval, but I do think I can give your implementation two thumbs up for now.

Finally, with all that said, I would still avoid putting this in production. Yeah, it may be correct, but the next guy who has to maintain the code is probably not going to understand it. Someone may change the code and break it because they do not understand the consequences of their changes. You have to admit that this code is pretty brittle. Even the slightest change could break it.

Forby answered 30/10, 2013 at 2:24 Comment(3)
Thanks for the detailed review. I actually posted about this on my blog: higherlogics.blogspot.ca/2013/09/…, and these functions and some others built on them are in my open source Sasa library: sourceforge.net/p/sasa/code/ci/default/tree/Sasa/…. The latest versions were switched to use VolatileRead/VolatileWrite for clarity. You can actually use these to implement load-linked/store-conditional primitives as well.Strikebound
@naasking: See if you can create a LL/SC operation and post it to your blog. I'd like to see what you come up with. If I get time I may try it myself.Forby
LL/SC is already in the code I linked to above (bottom of the page). I also factored some of the code into a reusable struct to simplify the API: sourceforge.net/p/sasa/code/ci/default/tree/Sasa.Concurrency/…Strikebound
P
-1

It seems like you're interesting in lock-free /wait-free implementation. Let start from this discussion, for example: Lock-free multi-threading is for real threading experts

Phosphorescence answered 21/9, 2013 at 18:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.