I am trying to implement a thread-safe lockless container, analogous to std::vector, according to this https://software.intel.com/en-us/blogs/2008/07/24/tbbconcurrent_vector-secrets-of-memory-organization
From what I understood, to prevent re-allocations and invalidating all iterators on all threads, instead of a single contiguous array, they add new contiguous blocks.
Each block they add is with a size of increasing powers of 2, so they can use log(index) to find the proper segment where an item at [index] is supposed to be.
From what I gather, they have a static array of pointers to segments, so they can quickly access them, however they don't know how many segments the user wants, so they made a small initial one and if the amount of segments exceeds the current count, they allocate a huge one and switch to using that one.
The problem is, adding a new segment can't be done in a lockless thread safe manner or at least I haven't figured out how. I can atomically increment the current size, but only that.
And also switching from the small to the large array of segment pointers involves a big allocation and memory copies, so I can't understand how they are doing it.
They have some code posted online, but all the important functions are without available source code, they are in their Thread Building Blocks DLL. Here is some code that demonstrates the issue:
template<typename T>
class concurrent_vector
{
private:
int size = 0;
int lastSegmentIndex = 0;
union
{
T* segmentsSmall[3];
T** segmentsLarge;
};
void switch_to_large()
{
//Bunch of allocations, creates a T* segmentsLarge[32] basically and reassigns all old entries into it
}
public:
concurrent_vector()
{
//The initial array is contiguous just for the sake of cache optimization
T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3
segmentsSmall[0] = initialContiguousBlock;
segmentsSmall[1] = initialContiguousBlock + 2;
segmentsSmall[2] = initialContiguousBlock + 2 + 4;
}
void push_back(T& item)
{
if(size > 2 + 4 + 8)
{
switch_to_large(); //This is the problem part, there is no possible way to make this thread-safe without a mutex lock. I don't understand how Intel does it. It includes a bunch of allocations and memory copies.
}
InterlockedIncrement(&size); //Ok, so size is atomically increased
//afterwards adds the item to the appropriate slot in the appropriate segment
}
};