I'm currently looking at a copy-on-write set implementation and want to confirm it's thread safe. I'm fairly sure the only way it might not be is if the compiler is allowed to reorder statements within certain methods. For example, the Remove
method looks like:
public bool Remove(T item)
{
var newHashSet = new HashSet<T>(hashSet);
var removed = newHashSet.Remove(item);
hashSet = newHashSet;
return removed;
}
Where hashSet is defined as
private volatile HashSet<T> hashSet;
So my question is, given that hashSet is volatile
does it mean that the Remove
on the new set happens before the write to the member variable? If not, then other threads may see the set before the remove has occurred.
I haven't actually seen any issues with this in production, but I just want to confirm it is guaranteed to be safe.
UPDATE
To be more specific, there's another method to get an IEnumerator
:
public IEnumerator<T> GetEnumerator()
{
return hashSet.GetEnumerator();
}
So the more specific question is: is there a guarantee that the returned IEnumerator
will never throw a ConcurrentModificationException
from a remove?
UPDATE 2
Sorry, the answers are all addressing the thread safety from multiple writers. Good points are raised, but that's not what I'm trying to find out here. I'd like to know if the compiler is allowed to re-order the operations in Remove
to something like this:
var newHashSet = new HashSet<T>(hashSet);
hashSet = newHashSet; // swapped
var removed = newHashSet.Remove(item); // swapped
return removed;
If this was possible, it would mean that a thread could call GetEnumerator
after hashSet
had been assigned, but before item
was removed, which could lead to the collection being modified during enumeration.
Joe Duffy has a blog article that states:
Volatile on loads means ACQUIRE, no more, no less. (There are additional compiler optimization restrictions, of course, like not allowing hoisting outside of loops, but let’s focus on the MM aspects for now.) The standard definition of ACQUIRE is that subsequent memory operations may not move before the ACQUIRE instruction; e.g. given { ld.acq X, ld Y }, the ld Y cannot occur before ld.acq X. However, previous memory operations can certainly move after it; e.g. given { ld X, ld.acq Y }, the ld.acq Y can indeed occur before the ld X. The only processor Microsoft .NET code currently runs on for which this actually occurs is IA64, but this is a notable area where CLR’s MM is weaker than most machines. Next, all stores on .NET are RELEASE (regardless of volatile, i.e. volatile is a no-op in terms of jitted code). The standard definition of RELEASE is that previous memory operations may not move after a RELEASE operation; e.g. given { st X, st.rel Y }, the st.rel Y cannot occur before st X. However, subsequent memory operations can indeed move before it; e.g. given { st.rel X, ld Y }, the ld Y can move before st.rel X.
The way I read this is that the call to newHashSet.Remove
requires a ld newHashSet
and the write to hashSet
requires a st.rel newHashSet
. From the above definition of RELEASE no loads can move after the store RELEASE, so the statements cannot be reordered! Could someone confirm please confirm my interpretation is correct?
volatile
will allow less reordering. On that point your code is safe. But this solution can never be considered thread-safe, and it doesn't look like copy-on-write. – CircumcisionCopyOnWriteArrayList
andCopyOnWriteArraySet
classes which provide similar functionality, so blame Sun if you disagree with the use of the term :) – WoodsumRemove
will never be called simultaneously. – Woodsum