Lock free & Thread-Safe IList<T> for .NET
Asked Answered
C

3

14

Is there a lock-free & thread-safe data structure that implements IList?

Naturally by lock-free I mean an implementation that makes no use of locking primitives in .NET but rather uses interlocked operations / atomic operations to achieve thread safety... There isn't one, apparently under the concurrent data structures...

Has anyone seen one floating around?

I've seen a java one implemented in amino-cbbs, called LockFreeVector but nothing for .NET so far. Any ideas?

Coquito answered 21/2, 2011 at 19:46 Comment(7)
I presume you mean "lock-free" and thread-safe, since List<T> is quite lock-free. You should also clarify what you mean by "lock-free".Lettielettish
@Coquito Another note: They (amino) are implementing a LINKED list, not a list. A LinkedList in C# doesn't implement IList/IList<T>. They have a LockFreeVector... But I don't think it's "fully" lock free.Orientate
How "full" an implementation are you looking for? I highly doubt you'll be able to find, e.g., a type that supports a random Insert without locking (unless you allow spinning, I guess, since that isn't really the same as "locking"). But then, what do I know?Asperges
@xantos: I was referring to the LockFreeVector<T> of amino: amino-cbbs.sourceforge.net/java_apidocs/org/amino/ds/lockfree/…Coquito
@damageboy: Note that the lock-free structure described in the paper on which LockFreeVector<E> is based does not have a random Insert operation (which is part of the IList<T> interface).Asperges
@Dan Tao: You are absolutely correct, if you read my comment below where I replied to Valentin Kuzub you'll see that it perfectly fits into the only use-case where a LockFreeVector<T> still makes sense at all...Coquito
"Lock free" means at least one thread makes progress at one time, (for example, the other threads would have to loop and retry compare/swap operation). "Wait free" means all threads always make progress. A lock free version will get you most of the speed increase you're going to get. It is exceedingly difficult to come up with wait free algorithms that don't have severe restrictions in one sense or another.Quemoy
A
7

Well, I couldn't find such a class anywhere; so I gave it a shot.

The source code for my ConcurrentList<T> class is available on GitHub.

It is lock-free, thread-safe (I think, based on my unit tests), and implements IList<T>.

It does not support Insert, RemoveAt/Remove, or Clear.

I was pleased to discover that my implementation (which I came up with independently) is very similar to that of a data structure published by some well-respected minds within the world of software.

For a fairly brief discussion of the implementation itself, see my recent blog post about it.

At the moment, it is not documented at all, which is kind of bad considering how "tricky" some of the code is :(

And by all means, rip me a new one if you take a look and find bugs or other issues.

Anyway, it might be worth your time to check it out. If you do, let me know what you think.

Asperges answered 25/2, 2011 at 21:46 Comment(20)
+1 for going ahead and implementing it on your own. I'm also in the process of porting the java version into c#... I read your blog post, and I'm still going to run your tests on my machine, but the main benefit is for multiple reader threads with a single writer/appender thread. That is where you should expect to see a performance boost as the plain dumb "use lock()" solution would still lock for every read...Coquito
BTW: The amino-cbbs version of LockFreeVector<T> is also based on the same paper you referenced: amino-cbbs.svn.sourceforge.net/viewvc/amino-cbbs/trunk/amino/…Coquito
I like you implementation a lot. I've made some improvements to it, which I will upload to github so you could also take a look at them as well. most of the changes I've made are around having more logical initial array sizes, e.g. 8,16,32... instead of 1,2,4 (a bit wasteful IMO), and around improving your LOG2 function into something completely horrific that should run in less cycles...Coquito
Is there a volatile keyword in C#? the target of a CAS needs to be volatile in C, because it's a run-time decision whether or not the CAS succeeds.Vaasta
Similarly, how do you ensure alignment? CAS on x86/x64 requires word aligned data; non-word aligned causes an exception.Vaasta
@Blank: In .NET, CAS is achieved using the Interlocked.CompareExchange method, which only provides overloads for types on which it is legal: 32- and 64-bit primitives (int, long, double, etc.) and reference types (which are either 32- or 64-bit references).Asperges
@Blank: There actually is a volatile keyword in C#, but it doesn't mean the same thing as in C.Asperges
@Blank: regarding volatile/compare-exchange: If you take a look @: github.com/damageboy/ConcurrentList/blob/master/ConcurrentList/… (which is my version of @Dan Tao's code, you will see that I've added comments with the actual x64 assembly created for the use of Interlocked.Increment / Interlocked.CompareExchange, it is clear that MS directly generates cmpxchg and lock xadd instructions that take a memory address. If you read the Intel literature you will see that intel promises these instructions to my synchronous across the memory bus and other CPU caches.Coquito
@Blank: regarding alginment, this is actually a very good point. On modern processors you don't have too many alignment requirements. (I answered this elsewhere: #882320). The .NET framework seems to ensure (still looking for official info on this) pointer alignment. This means that on x86 you'll get 4 byte alignment and on x64 8 byte alignment, The code in question (the original one by Dan Tao, and my updated version) both require up to pointer size alignment.Coquito
@damageboy: Have you read my more recent blog post on this subject, or the SynchronizedList<T> class I added to GitHub? We should talk on Skype some time!Asperges
@damageboy: re CAS - I'm not sure we understand each other. My point is this; the compiler, when faced with CAS, cannot know the result of the exchange, because it is information only available at run-time. However, the compiler does not know this unless you mark the destination operand with volatile, giving the potential for optimizer induced bugs.Vaasta
@Dan, @damageboy: Concurrent programming really isn't my forté so I might be wrong on this, but it looks to me like the indexer get/set and CopyTo methods in your implementations are susceptible to torn reads/writes if T is a large struct. They're reading/writing the array elements without any kind of synchronisation, which means no guarantee of atomicity if T is large.Subinfeudate
@LukeH: You're correct about atomicity not being guaranteed in for anything basically larger than a 128-bits in the Intel architecture, and even 64-bits in case of .NET, as the JIT does not generate cmpxchg16b instructions AFAIK. If someone were to store a struct in the ConcurrentList/SynchronizedList that is beyond 8 bytes, then he/she are basically b0rked. More over that: if, for some-reason, the memory is not aligned to 8 bytes, i.e. (&(_array[x]) & (8-1) != 0) || (&(_array[x][y]) & (8-1) != 0), then even those writes won't be always atomic. I'll update the code to test for this.Coquito
@LukeH: You're definitely right. I struggled with two different possible approaches to this: 1. Implement a factory method which only constructs instances of ConcurrentList<T> for T where there is a corresponding overload for Interlocked.Exchange (so basically, the primitive types plus all reference types); 2. Instead of a T[][] array internally, use a Slot<T>[][] where Slot<T> is a reference type and can therefore be atomically assigned using Interlocked.Exchange. In the end I basically went with the lazy solution and did neither ;)Asperges
@damageboy: if for single-word CAS memory is not aligned to 8 bytes, the CAS instruction will fault.Vaasta
@Blank: The Interlocked class in .NET provides a CAS-based CompareExchange method with overloads for float, double, int, long, IntPtr, and all reference types, all of which AFAIK will not be susceptible to alignment issues in the CLR. There is no overload that accepts user-defined value types.Asperges
@Blank: Can you please find a source for your alignment claims? The only thing I'm finding that is up to date in intel software architecture is #882320 (I've already answered this on some other thread)Coquito
Source is me. I've written a lock-free library in C - www.liblfds.org. If the CAS operands are not aligned on the system pointer length, the binary crashes. This is mainly a problem when the pointers in question are in structures (and so end up with unaligned addresses).Vaasta
If a method is lock-free, that means that if one thread starts invoking a method and gets waylaid during its execution that will not affect other threads. I haven't looked at your particular implementation, but many methods that use "eventual" consistency may never achieve a consistent state if the execution of an update method gets waylaid. An alternative approach, if there is some value which will never legitimately appear in the list (e.g. items are not allowed to be null), is to have Count hold the largest list slot which is known to have been assigned, and have Add...Chalone
...try to CompareExchange the new element into that slot. If the operation fails, try the next higher slot, etc. Then use a CompareExchange loop to update _Count, but give up if _Count gets to be higher than the value it's trying to store.Chalone
H
7

ConcurrentList implementing IList might be missing in Collections.Concurrent namespace because of whole Parallel.For and Parallel.ForEach class-methods. One can say that they can be used to handle any list as Concurrent, in order to quickly enumerate through the list and perform actions on its items.

Maybe by not providing ConcurrentList they meant or thought that if Parralel.For cannot help one would require to use not a IList but some other kind of collection like a stack or queue or even Bag or even Dictionary

I would agree with this design, because having to deal with indexable collection under multi thread conditions sounds like very error prone and bad design. Whats the purpose of knowing index of an item if collection can be modified at any time and index would be invalidated, in such circumstances when there are multiple readers - writers its pretty clear to me that Queue or Stack will be commonly best fitting collections, or Bag can be good too. Dictionary can be used also because its indexes are not invalidated by adding items to collection, and if you need parallel access to List you got your Parralel.For methods

What I find really weird - http://msdn.microsoft.com/en-us/library/dd381935.aspx here we can read about ConcurrentLinkedList class but I cannot find it in System.dll , only Bag and BlockingCollection are there.

I would also say that there is like 95% chance at least that either of two is true about your problem

  1. Parallel class methods are better than ConcurrentList
  2. Existing Concurrent collections are going to be better collections than ConcurrentList

I would also say that by not providing ConcurrentList they have saved developers who would mistakenly choose ConcurrentList to solve their problems from making many errors and saved them a lot of time forcing developers to use existing Concurrent collections.

Hiawatha answered 22/2, 2011 at 12:27 Comment(2)
the page you site says that the ConcurrentLinkedList is to be removed before the release of VS 2010 and not to use it. I agree with everything you wrote - nice job +1.Masaccio
While there are many points where I definitely do agree with you, I can still see a need for a thread-safe IList<T> much like the LockFreeVector<T> that is provided in amino-cbbs... The main use-case, which is coincidentally the one I'm after, is for when you have an append-only modification of a List<T> from one/more thread(s) while a few other threads may need to go over the list in read-only fashion up to the last-known .Count value when they started the enumeration / processing... IOW, I'm looking for read-only performance that is equivalent to an array, while still being able to appendCoquito
A
7

Well, I couldn't find such a class anywhere; so I gave it a shot.

The source code for my ConcurrentList<T> class is available on GitHub.

It is lock-free, thread-safe (I think, based on my unit tests), and implements IList<T>.

It does not support Insert, RemoveAt/Remove, or Clear.

I was pleased to discover that my implementation (which I came up with independently) is very similar to that of a data structure published by some well-respected minds within the world of software.

For a fairly brief discussion of the implementation itself, see my recent blog post about it.

At the moment, it is not documented at all, which is kind of bad considering how "tricky" some of the code is :(

And by all means, rip me a new one if you take a look and find bugs or other issues.

Anyway, it might be worth your time to check it out. If you do, let me know what you think.

Asperges answered 25/2, 2011 at 21:46 Comment(20)
+1 for going ahead and implementing it on your own. I'm also in the process of porting the java version into c#... I read your blog post, and I'm still going to run your tests on my machine, but the main benefit is for multiple reader threads with a single writer/appender thread. That is where you should expect to see a performance boost as the plain dumb "use lock()" solution would still lock for every read...Coquito
BTW: The amino-cbbs version of LockFreeVector<T> is also based on the same paper you referenced: amino-cbbs.svn.sourceforge.net/viewvc/amino-cbbs/trunk/amino/…Coquito
I like you implementation a lot. I've made some improvements to it, which I will upload to github so you could also take a look at them as well. most of the changes I've made are around having more logical initial array sizes, e.g. 8,16,32... instead of 1,2,4 (a bit wasteful IMO), and around improving your LOG2 function into something completely horrific that should run in less cycles...Coquito
Is there a volatile keyword in C#? the target of a CAS needs to be volatile in C, because it's a run-time decision whether or not the CAS succeeds.Vaasta
Similarly, how do you ensure alignment? CAS on x86/x64 requires word aligned data; non-word aligned causes an exception.Vaasta
@Blank: In .NET, CAS is achieved using the Interlocked.CompareExchange method, which only provides overloads for types on which it is legal: 32- and 64-bit primitives (int, long, double, etc.) and reference types (which are either 32- or 64-bit references).Asperges
@Blank: There actually is a volatile keyword in C#, but it doesn't mean the same thing as in C.Asperges
@Blank: regarding volatile/compare-exchange: If you take a look @: github.com/damageboy/ConcurrentList/blob/master/ConcurrentList/… (which is my version of @Dan Tao's code, you will see that I've added comments with the actual x64 assembly created for the use of Interlocked.Increment / Interlocked.CompareExchange, it is clear that MS directly generates cmpxchg and lock xadd instructions that take a memory address. If you read the Intel literature you will see that intel promises these instructions to my synchronous across the memory bus and other CPU caches.Coquito
@Blank: regarding alginment, this is actually a very good point. On modern processors you don't have too many alignment requirements. (I answered this elsewhere: #882320). The .NET framework seems to ensure (still looking for official info on this) pointer alignment. This means that on x86 you'll get 4 byte alignment and on x64 8 byte alignment, The code in question (the original one by Dan Tao, and my updated version) both require up to pointer size alignment.Coquito
@damageboy: Have you read my more recent blog post on this subject, or the SynchronizedList<T> class I added to GitHub? We should talk on Skype some time!Asperges
@damageboy: re CAS - I'm not sure we understand each other. My point is this; the compiler, when faced with CAS, cannot know the result of the exchange, because it is information only available at run-time. However, the compiler does not know this unless you mark the destination operand with volatile, giving the potential for optimizer induced bugs.Vaasta
@Dan, @damageboy: Concurrent programming really isn't my forté so I might be wrong on this, but it looks to me like the indexer get/set and CopyTo methods in your implementations are susceptible to torn reads/writes if T is a large struct. They're reading/writing the array elements without any kind of synchronisation, which means no guarantee of atomicity if T is large.Subinfeudate
@LukeH: You're correct about atomicity not being guaranteed in for anything basically larger than a 128-bits in the Intel architecture, and even 64-bits in case of .NET, as the JIT does not generate cmpxchg16b instructions AFAIK. If someone were to store a struct in the ConcurrentList/SynchronizedList that is beyond 8 bytes, then he/she are basically b0rked. More over that: if, for some-reason, the memory is not aligned to 8 bytes, i.e. (&(_array[x]) & (8-1) != 0) || (&(_array[x][y]) & (8-1) != 0), then even those writes won't be always atomic. I'll update the code to test for this.Coquito
@LukeH: You're definitely right. I struggled with two different possible approaches to this: 1. Implement a factory method which only constructs instances of ConcurrentList<T> for T where there is a corresponding overload for Interlocked.Exchange (so basically, the primitive types plus all reference types); 2. Instead of a T[][] array internally, use a Slot<T>[][] where Slot<T> is a reference type and can therefore be atomically assigned using Interlocked.Exchange. In the end I basically went with the lazy solution and did neither ;)Asperges
@damageboy: if for single-word CAS memory is not aligned to 8 bytes, the CAS instruction will fault.Vaasta
@Blank: The Interlocked class in .NET provides a CAS-based CompareExchange method with overloads for float, double, int, long, IntPtr, and all reference types, all of which AFAIK will not be susceptible to alignment issues in the CLR. There is no overload that accepts user-defined value types.Asperges
@Blank: Can you please find a source for your alignment claims? The only thing I'm finding that is up to date in intel software architecture is #882320 (I've already answered this on some other thread)Coquito
Source is me. I've written a lock-free library in C - www.liblfds.org. If the CAS operands are not aligned on the system pointer length, the binary crashes. This is mainly a problem when the pointers in question are in structures (and so end up with unaligned addresses).Vaasta
If a method is lock-free, that means that if one thread starts invoking a method and gets waylaid during its execution that will not affect other threads. I haven't looked at your particular implementation, but many methods that use "eventual" consistency may never achieve a consistent state if the execution of an update method gets waylaid. An alternative approach, if there is some value which will never legitimately appear in the list (e.g. items are not allowed to be null), is to have Count hold the largest list slot which is known to have been assigned, and have Add...Chalone
...try to CompareExchange the new element into that slot. If the operation fails, try the next higher slot, etc. Then use a CompareExchange loop to update _Count, but give up if _Count gets to be higher than the value it's trying to store.Chalone
W
0

Think about how random access (implied by IList<T>) could work in a multithreaded environment. You can't actually really do anything without it being immutable since adding and removing, even if they are atomic, don't prevent thread 1 from removing the item at the index that thread 2 was just about to retrieve. That's why the SyncRoot stuff is gone in .NET 2.0+

Wohlen answered 25/2, 2011 at 21:51 Comment(1)
I have though about and I have answered this point when Valentin Kuzub raised it in his comment. The only useful scenario is for a .Add() only IList<T>, where Remove/Insert are not applicable/allowed. This can and DOES in fact improve reader performance greatly while still allowing thread-safe append operations.Coquito

© 2022 - 2024 — McMap. All rights reserved.