Implementing concurrent_vector according to intel blog

template<typename T> class concurrent_vector { private: int size = 0; int lastSegmentIndex = 0; union { T* segmentsSmall[3]; T** segmentsLarge; }; void switch_to_large() { //Bunch of allocations, creates a T* segmentsLarge[32] basically and reassigns all old entries into it } public: concurrent_vector() { //The initial array is contiguous just for the sake of cache optimization T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3 segmentsSmall[0] = initialContiguousBlock; segmentsSmall[1] = initialContiguousBlock + 2; segmentsSmall[2] = initialContiguousBlock + 2 + 4; } void push_back(T& item) { if(size > 2 + 4 + 8) { switch_to_large(); //This is the problem part, there is no possible way to make this thread-safe without a mutex lock. I don't understand how Intel does it. It includes a bunch of allocations and memory copies. } InterlockedIncrement(&size); //Ok, so size is atomically increased //afterwards adds the item to the appropriate slot in the appropriate segment } };

I would not try to make the segmentsLarge and segmentsSmall a union. Yes this wastes one more pointer. Then the pointer, lets call it just segments can initially point to segmentsSmall.

On the other hand the other methods can always use the same pointer which makes them simpler.

And switching from small to large can be accomplished by one compare exchange of a pointer.

I am not sure how this could be accomplished safely with a union.

The idea would look something like this (note that I used C++11, which the Intel library predates, so they likely did it with their atomic intrinsics). This probably misses quite a few details which I am sure the Intel people have thought more about, so you will likely have to check this against the implementations of all other methods.

#include <atomic>
#include <array>
#include <cstddef>
#include <climits>

template<typename T>
class concurrent_vector
{
private:
  std::atomic<size_t> size;
  std::atomic<T**> segments;
  std::array<T*, 3> segmentsSmall;
  unsigned lastSegmentIndex = 0;

  void switch_to_large()
  {
    T** segmentsOld = segments;
    if( segmentsOld == segmentsSmall.data()) {
      // not yet switched
      T** segmentsLarge = new T*[sizeof(size_t) * CHAR_BIT];
      // note that we leave the original segment allocations alone and just copy the pointers
      std::copy(segmentsSmall.begin(), segmentsSmall.end(), segmentsLarge);
      for(unsigned i = segmentsSmall.size(); i < numSegments; ++i) {
        segmentsLarge[i] = nullptr;
      }
      // now both the old and the new segments array are valid
      if( segments.compare_exchange_strong(segmentsOld, segmentsLarge)) {
        // success!
        return;
      }  else {
        // already switched, just clean up
        delete[] segmentsLarge;
      }
    }
  }

public:
  concurrent_vector()  : size(0), segments(segmentsSmall.data())
  {
    //The initial array is contiguous just for the sake of cache optimization
    T* initialContiguousBlock = new T[2 + 4 + 8]; //2^1 + 2^2 + 2^3
    segmentsSmall[0] = initialContiguousBlock;
    segmentsSmall[1] = initialContiguousBlock + 2;
    segmentsSmall[2] = initialContiguousBlock + 2 + 4;
  }

  void push_back(T& item)
  {
    if(size > 2 + 4 + 8) {
      switch_to_large();
    }
    // here we may have to allocate more segments atomically
    ++size;

    //afterwards adds the item to the appropriate slot in the appropriate segment
  }
};

Recommended topics

Hot tags