Parallelizing a simple loop with c++17 algorithms

#include <algorithm> #include <vector> struct TKeyObjPtr; class TObj { public: virtual void Calculate(TKeyObjPtr const &) = 0; }; struct TKeyObjPtr { int Key; TObj *ObjPtr; }; void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec) { #pragma omp parallel for for (auto It1= KeyObjPtrVec.begin(); It1!=KeyObjPtrVec.end(); ++It1) for (auto It2= It1+1; It2!=KeyObjPtrVec.end() && It2->Key==It1->Key; ++It2) It1->ObjPtr->Calculate(*It2); }

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec) { std::for_each(std::execution::par_unseq, boost::counting_iterator<std::size_t>(0u), boost::counting_iterator<std::size_t>(KeyObjPtrVec.size()), [&KeyObjPtrVec](auto i) { for (auto j= i+1; j<KeyObjPtrVec.size() && KeyObjPtrVec[j].Key==KeyObjPtrVec[i].Key; ++j) KeyObjPtrVec[i].ObjPtr->Calculate(KeyObjPtrVec[j]); }); }

If I get the algorithm right, we must

iterate over all possible value pairs in the vector of structures, provided KeyObjPtrVec vector,
but without pairing the object with itself
and the order of the pair members is not important
Then, if they Key field of the structure matches, we must call Calculate on the Obj field of our structure.

C++17 has the excellent built-in parallelization support, and this can simply done with the standard libraries:

#include <algorithm>
#include <execution>

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec) {
  std::vector<size_t> indices(KeyObjPtrVec.size());
  for (size_t k = 0; k < indices.size(); k++) {
    indices[k] = k;
  }

  std::for_each(std::execution::par, indices.begin(), indices.end(),
      [KeyObjPtrVec, indices](size_t k) {
        std::for_each(std::execution::par, indices.begin() + k + 1, indices.end(),
            [k, KeyObjPtrVec](size_t l) {
              if (KeyObjPtrVec[k].Key == KeyObjPtrVec[l].Key) {
                KeyObjPtrVec[k].ObjPtr->Calculate(KeyObjPtrVec[l]);
              }
            });
      });
}

If you specify std::execution::par as the first parameter of std::for_each, the parallel version runs. No need to care about executor, thread synchronization or any cleanup - all just works out of box. Just link the library tbb (target_link_libraries(myexecutable tbb) for cmake).

With std::for_each you get access the element you are supposed to process, not the iterator. To get more free access required for setting up the second std::for_each, we create array of indices and iterate over them instead. Having index of the element, we can access the array at any relative position in the two lambdas we have.

This code is tested and confirmed to work as described in the beginning of the answer.

Recommended topics

Hot tags