Reordering of operations around volatile

I'm currently looking at a copy-on-write set implementation and want to confirm it's thread safe. I'm fairly sure the only way it might not be is if the compiler is allowed to reorder statements within certain methods. For example, the Remove method looks like:

public bool Remove(T item)
{
    var newHashSet = new HashSet<T>(hashSet);
    var removed = newHashSet.Remove(item);
    hashSet = newHashSet;
    return removed;
}

Where hashSet is defined as

private volatile HashSet<T> hashSet;

So my question is, given that hashSet is volatile does it mean that the Remove on the new set happens before the write to the member variable? If not, then other threads may see the set before the remove has occurred.

I haven't actually seen any issues with this in production, but I just want to confirm it is guaranteed to be safe.

UPDATE

To be more specific, there's another method to get an IEnumerator:

public IEnumerator<T> GetEnumerator()
{
    return hashSet.GetEnumerator();
}

So the more specific question is: is there a guarantee that the returned IEnumerator will never throw a ConcurrentModificationException (or rather its .NET equivalent, an InvalidOperationException saying the collection was modified) because of a remove?

UPDATE 2

Sorry, the answers are all addressing the thread safety from multiple writers. Good points are raised, but that's not what I'm trying to find out here. I'd like to know if the compiler is allowed to re-order the operations in Remove to something like this:

    var newHashSet = new HashSet<T>(hashSet);
    hashSet = newHashSet;                  // swapped
    var removed = newHashSet.Remove(item); // swapped
    return removed;

If this were possible, it would mean that a thread could call GetEnumerator after hashSet had been assigned but before item was removed, and the set backing the returned enumerator could then be modified during enumeration.
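For concreteness, here is a throwaway stress-test sketch of my own (not from the original post; the type and field names are illustrative) showing what the failure would look like if that reordering could happen: a reader enumerating a snapshot that the writer is still mutating would eventually get an InvalidOperationException ("Collection was modified"). A passing run proves nothing by itself, especially on x86/x64 where stores are not reordered anyway, but it illustrates the failure mode being asked about.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class CopyOnWriteStressTest
{
    // Single writer publishes new snapshots; the reader enumerates whatever
    // snapshot it last saw through the volatile field.
    private static volatile HashSet<int> hashSet = new HashSet<int>(Enumerable.Range(0, 1000));
    private static volatile bool stop;

    static void Main()
    {
        var writer = new Thread(() =>
        {
            var rnd = new Random();
            while (!stop)
            {
                var newHashSet = new HashSet<int>(hashSet);
                newHashSet.Remove(rnd.Next(1000));
                newHashSet.Add(rnd.Next(1000));
                hashSet = newHashSet; // volatile write; the feared reorder would publish before Remove
            }
        });

        var reader = new Thread(() =>
        {
            while (!stop)
            {
                // Throws "Collection was modified" only if a snapshot is
                // published before the writer has finished mutating it.
                foreach (var item in hashSet) { }
            }
        });

        writer.Start();
        reader.Start();
        Thread.Sleep(5000);
        stop = true;
        writer.Join();
        reader.Join();
        Console.WriteLine("No enumeration exceptions observed.");
    }
}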

Joe Duffy has a blog article that states:

Volatile on loads means ACQUIRE, no more, no less. (There are additional compiler optimization restrictions, of course, like not allowing hoisting outside of loops, but let’s focus on the MM aspects for now.) The standard definition of ACQUIRE is that subsequent memory operations may not move before the ACQUIRE instruction; e.g. given { ld.acq X, ld Y }, the ld Y cannot occur before ld.acq X. However, previous memory operations can certainly move after it; e.g. given { ld X, ld.acq Y }, the ld.acq Y can indeed occur before the ld X. The only processor Microsoft .NET code currently runs on for which this actually occurs is IA64, but this is a notable area where CLR’s MM is weaker than most machines. Next, all stores on .NET are RELEASE (regardless of volatile, i.e. volatile is a no-op in terms of jitted code). The standard definition of RELEASE is that previous memory operations may not move after a RELEASE operation; e.g. given { st X, st.rel Y }, the st.rel Y cannot occur before st X. However, subsequent memory operations can indeed move before it; e.g. given { st.rel X, ld Y }, the ld Y can move before st.rel X.

The way I read this is that the call to newHashSet.Remove requires a ld newHashSet, and the write to hashSet requires a st.rel newHashSet. From the above definition of RELEASE, no preceding memory operation (including that load and the writes performed by Remove) can move after the store-release, so the statements cannot be reordered. Could someone please confirm that my interpretation is correct?
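If you would rather not depend on reasoning through the volatile field's acquire/release semantics, the ordering can be made explicit. Below is a sketch of mine (not the original class; the type name is made up) using Volatile.Read/Volatile.Write, which were added in .NET 4.5; on earlier frameworks a Thread.MemoryBarrier() before the assignment achieves the same ordering. The field is deliberately not declared volatile here, to avoid the CS0420 warning about passing a volatile field by ref.

using System.Collections.Generic;
using System.Threading;

public class CopyOnWriteSet<T>
{
    // All access goes through Volatile.Read/Volatile.Write instead of a
    // volatile field declaration.
    private HashSet<T> hashSet = new HashSet<T>();

    public bool Remove(T item)
    {
        var newHashSet = new HashSet<T>(Volatile.Read(ref hashSet));
        var removed = newHashSet.Remove(item);

        // Release: the mutation of newHashSet above cannot move past this
        // store, so readers can never observe a half-mutated set.
        Volatile.Write(ref hashSet, newHashSet);
        return removed;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Acquire: reads of the set's contents cannot move before this load.
        return Volatile.Read(ref hashSet).GetEnumerator();
    }
}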

Woodsum answered 6/7, 2012 at 7:7 Comment(11)
volatile will allow less reordering. On that point your code is safe. But this solution can never be considered thread-safe, and it doesn't look like copy-on-write.Circumcision
Could you elaborate on why you think it's not copy-on-write?Woodsum
It's incomplete but since hashSet is private it looks like replace-on-write.Circumcision
Sorry, when I say thread safe, I should have said it should be safe to read from the collection while it's being written to. Writes to the collection still need to be synchronised. It's named after the Java CopyOnWriteArrayList and CopyOnWriteArraySet classes which provide similar functionality, so blame Sun if you disagree with the use of the term :)Woodsum
I guess I can kind of see how it's intended. But you only need to call this Remove() from 2 threads to have a race-condition.Circumcision
Yes, I agree. I did mention in the comment that writes to the collection need to be serialised.Woodsum
Re the update: a reference assignment is atomic anyway so except for caching issues (memory model) you don't seem to need anything. No volatile, no interlocked.Circumcision
@HenkHolterman If two invocations of Remove are simultaneously called with different items, what will the result be? The operation needs to be performed, and then checked to see if any other thread might have changed the source of the operation before it completed, in which case the operation needs to be redone with the new source. This is best done using Interlocked.CompareExchange.Antrim
To clarify, I'm not concerned about multiple writes to the collection, these are being serialised. I am concerned about non-serialised reads at the same time as a write.Woodsum
@Woodsum I believe you are correct in your understanding of RELEASE. But I don't understand how this implementation of Remove can be considered thread safe when two simultaneous calls to Remove (with different items) can result (after both calls are completed) in a collection with only one of the items removed. Even if the order of statements is never changed.Antrim
@Monroe, I've mentioned this a few times, but I'll repeat: all calls to methods that mutate the collection are serialised. To be more explicit, I am ensuring (externally to the collection) that Remove will never be called simultaneously.Woodsum

EDITED:

Thanks for clarifying the presence of an external lock for calls to Remove (and other collection mutations).

Because of RELEASE semantics, the new reference will not be stored to hashSet until the earlier memory operations have completed, including the mutation performed by newHashSet.Remove(item) and the assignment to removed (an st removed can't be moved after the st.rel hashSet).

Therefore, the 'snapshot' behaviour of GetEnumerator will work as intended, at least with respect to Remove and other mutators implemented in a similar fashion.
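To make the answer's assumption concrete, here is a minimal sketch (mine, not the asker's actual class) of the pattern being discussed: every mutator copies the current set, mutates the copy, and publishes it with a single volatile write; readers never take a lock. The mutators are serialised with a private lock here, whereas the question serialises them externally; either way, only one writer runs at a time.

using System.Collections;
using System.Collections.Generic;

public class CopyOnWriteHashSet<T> : IEnumerable<T>
{
    private readonly object writeLock = new object();
    private volatile HashSet<T> hashSet = new HashSet<T>();

    public bool Add(T item)
    {
        lock (writeLock)
        {
            var newHashSet = new HashSet<T>(hashSet);
            var added = newHashSet.Add(item);
            hashSet = newHashSet; // release: publishes a fully-built snapshot
            return added;
        }
    }

    public bool Remove(T item)
    {
        lock (writeLock)
        {
            var newHashSet = new HashSet<T>(hashSet);
            var removed = newHashSet.Remove(item);
            hashSet = newHashSet;
            return removed;
        }
    }

    // Readers are lock-free: they enumerate whichever snapshot was most
    // recently published, and that snapshot is never mutated again.
    public IEnumerator<T> GetEnumerator() { return hashSet.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}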

Antrim answered 10/7, 2012 at 19:39 Comment(2)
The 'snapshot' behaviour is by design. The aim of the collection is for it to be safe to read from without serialising access. Writes do need to be serialised however.Woodsum
@Woodsum Thanks for the clarification.Antrim

Consider using Interlocked.Exchange (it will guarantee ordering) or Interlocked.CompareExchange, which as a side benefit will let you detect (and potentially recover from) simultaneous writes to the collection. Clearly it adds an additional level of synchronization, so it is different from your current code, but it is easier to reason about.

public bool Remove(T item) 
{ 
    var old = hashSet;
    var newHashSet = new HashSet<T>(old); 
    var removed = newHashSet.Remove(item); 
    var current = Interlocked.CompareExchange(ref hashSet, newHashSet, old);
    if (current != old)
    {
       // failed... throw or retry...
    }
    return removed; 
} 

And I think you'll still need volatile hashSet in this case.
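Filling in the "throw or retry" branch, here is a retry-loop variation of my own (not part of the answer above) built around the same Interlocked.CompareExchange call. It tolerates concurrent writers without an external lock, at the cost of redoing the copy when another writer wins the race:

public bool Remove(T item)
{
    while (true)
    {
        var old = hashSet;
        var newHashSet = new HashSet<T>(old);
        var removed = newHashSet.Remove(item);

        // Publish only if no other writer has replaced the set in the
        // meantime; otherwise loop and redo the copy against the latest
        // snapshot. Passing a volatile field by ref triggers warning CS0420,
        // which is harmless here because CompareExchange is a full fence.
        if (Interlocked.CompareExchange(ref hashSet, newHashSet, old) == old)
        {
            return removed;
        }
    }
}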

Colvin answered 6/7, 2012 at 7:53 Comment(3)
Just to clarify, are you saying just flagging hashSet as volatile won't guarantee the ordering I need, or that it's just cleaner using Interlocked.CompareExchange? I'm just trying to understand the CLR memory model better.Woodsum
@Woodsum For me it's "just cleaner using Interlocked*": I'm nowhere near an expert in lock-free code, so I would not try to rely on volatile for safety in code I write. I can reason about Interlocked operations being safe, but you need someone who understands the CLR/C# memory model well to answer whether volatile is enough or not.Colvin
@SimonC, Here is an article linked from one of the related questions that you may find useful: drdobbs.com/go-parallel/parallel/volatile-vs-volatile/… .Colvin

I can't speak for C#, but in C, volatile in principle indicates only that the content of the variable could change at any time. It offers no ordering constraints with respect to other memory operations, whether from the compiler or the CPU. All you get is that the compiler will always re-read the value from memory rather than trusting a copy cached in a register.

I believe, however, that in recent MSVC (and so quite possibly C#), reading a volatile acts as a memory barrier for loads and writing acts as a memory barrier for stores. That is, the CPU will stall until all loads have completed, and no loads may escape this by being re-ordered below the volatile read (although later independent loads may still move up above the barrier); likewise, the CPU will stall until all stores have completed, and no stores may escape this by being re-ordered below the volatile write (although later independent writes may still move up above the barrier).

When only a single thread writes a given variable (but many threads read it), memory barriers alone are enough for correct operation. When multiple threads may write the same variable, atomic operations have to be used, because at the CPU level the read-modify-write is a race and a write can simply be lost.
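To illustrate that last point in C# terms (a throwaway example of mine, unrelated to the question's collection): with more than one writer, a plain read-modify-write on a shared counter loses updates, while an atomic operation does not. Memory barriers alone would not fix the first counter, because the increment is not a single operation.

using System;
using System.Threading;
using System.Threading.Tasks;

class LostUpdateDemo
{
    private static int unsafeCounter; // volatile would not help: ++ is a read-modify-write
    private static int safeCounter;

    static void Main()
    {
        Parallel.For(0, 1000000, i =>
        {
            unsafeCounter++;                        // racy: concurrent increments can be lost
            Interlocked.Increment(ref safeCounter); // atomic: never loses an increment
        });

        // Typically prints an unsafe count well below 1000000.
        Console.WriteLine("unsafe: {0}, safe: {1}", unsafeCounter, safeCounter);
    }
}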

Tenrec answered 6/7, 2012 at 9:16 Comment(0)
