I've spent today looking into lockless queues. I have a multiple producer, multiple consumer situation. I implemented, for testing, a system using the Interlocked SList thing under Win32 and it doubled the performance of my heavily threaded task based code. Unfortunately, though, I wish to support multiple platforms. Interlocking on multiple platforms itself is not a problem and I can safely assume I can interlock without problems. However the actual implementation loses me.
The big problem appears to be that you need to guarantee a list push/pop will use only one Interlocked call. Otherwise you are leaving space for another thread to nip in and screw things up. I'm unsure how the microsoft implementation works under-the-hood and would love to know more.
Can anyone point me towards useful information (Platform and language are pretty irrelevant)?
Added to that I'd love to know if its possible to implement a lockless vector. That would have huge amounts of use for me :) Cheers!
Edit: Having read herb's DDJ article I can see a reduced lock queue that is pretty similar to the one I already had. However I notice that there are papers at the end that can do the true lockless queue'ing using a double compare-and-swap (DCAS) operation. Has anyone implemented a queue using cmpxchg8b (or cmpxchg16b for that matter)?
I'm just musing at this point (having not read the papers) but you could use this system to update the head and tail pointer simultaneously and thus avoid any issues with another thread jumping inbetween the 2 atomic operations. However you still need to get the next head pointer to test that against the tail pointer to see if yo have just modified the tail. How do you avoid another thread changing this information while the other thread is preparing to do so itself? How exactly is this implemented in a lockless way? Or am i better off reading the undecipherability that is a research paper? ;)