Parallelizing a simple loop with c++17 algorithms
Asked Answered
K

1

8

I have a parallel code that can be reduced to basically:

#include <algorithm>
#include <vector>

struct TKeyObjPtr;

class TObj
{
public:
  virtual void Calculate(TKeyObjPtr const &) = 0;
};

struct TKeyObjPtr
{
  int Key;
  TObj *ObjPtr;
};

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec)
{
  #pragma omp parallel for
  for (auto It1= KeyObjPtrVec.begin(); It1!=KeyObjPtrVec.end(); ++It1)
    for (auto It2= It1+1; It2!=KeyObjPtrVec.end() && It2->Key==It1->Key; ++It2)
      It1->ObjPtr->Calculate(*It2);
}

I would like to modernize that code by using the parallel algorithms. Unfortunately, I'm having trouble in rewriting such a simple piece of code.

An option would be using boost::counting_iterator:

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec)
{
  std::for_each(std::execution::par_unseq,
    boost::counting_iterator<std::size_t>(0u),
    boost::counting_iterator<std::size_t>(KeyObjPtrVec.size()),
    [&KeyObjPtrVec](auto i)
      {
        for (auto j= i+1; j<KeyObjPtrVec.size() && KeyObjPtrVec[j].Key==KeyObjPtrVec[i].Key; ++j)
          KeyObjPtrVec[i].ObjPtr->Calculate(KeyObjPtrVec[j]);
      });
}

This works but is considerably more verbose and, worse, I don't think it is compliant with the standard because boost::counting_iterator is a stashing iterator and, therefore, does not meet the Cpp17ForwardIterator requirements.

Is it possible to write the above code as concisely as with OpenMP, while satisfying the constraints of the standard on parallel algorithms?

Keary answered 4/2, 2019 at 18:16 Comment(9)
are you sure that counting_iterator is no ForwardIterator? Afaik ForwardIterator is just the minimum that is needed to make for_each work, not moreAborigine
@user463035818 The problem is that ForwardIterators are required to return a reference to some object. counting_iterator returns by value.Keary
@Keary That is not what I see on the documentation you linked. I see the reference typedef as being const Incrementable&, and operator* does return referenceVirgel
I don't think this loop qualifies as "simple". The iterator pairings in It1->ObjPtr->Calculate(*It2); stem from consecutive Key values that compare equal, but only in the container part that It1 hasn't passed yet, plus object pairings behind the iterators will be used multiple times for more than two equal consecutive keys.Amador
@Virgel Exactly ForwardIterators must return a reference and counting_iterator does not. See also point 6 "If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object"Keary
@user463035818 eel.is/c++draft/forward.iterators#1.3Virgel
@Virgel ups, my standardese is really crappy, but simply not reading it is not a good excuse ;)Aborigine
@Amador KeyObjPtrVec is sorted by Key. The loop simply tests pairs of KeyObjPtr with equal Key. I don't think is complicated at all.Keary
The fact that the vector is sorted indeed makes the loop easier to grasp. Anyway, good for you if you get what the loop is doing at first glance ;)Amador
S
0

If I get the algorithm right, we must

  • iterate over all possible value pairs in the vector of structures, provided KeyObjPtrVec vector,
  • but without pairing the object with itself
  • and the order of the pair members is not important
  • Then, if they Key field of the structure matches, we must call Calculate on the Obj field of our structure.

C++17 has the excellent built-in parallelization support, and this can simply done with the standard libraries:

#include <algorithm>
#include <execution>

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec) {
  std::vector<size_t> indices(KeyObjPtrVec.size());
  for (size_t k = 0; k < indices.size(); k++) {
    indices[k] = k;
  }

  std::for_each(std::execution::par, indices.begin(), indices.end(),
      [KeyObjPtrVec, indices](size_t k) {
        std::for_each(std::execution::par, indices.begin() + k + 1, indices.end(),
            [k, KeyObjPtrVec](size_t l) {
              if (KeyObjPtrVec[k].Key == KeyObjPtrVec[l].Key) {
                KeyObjPtrVec[k].ObjPtr->Calculate(KeyObjPtrVec[l]);
              }
            });
      });
}

If you specify std::execution::par as the first parameter of std::for_each, the parallel version runs. No need to care about executor, thread synchronization or any cleanup - all just works out of box. Just link the library tbb (target_link_libraries(myexecutable tbb) for cmake).

With std::for_each you get access the element you are supposed to process, not the iterator. To get more free access required for setting up the second std::for_each, we create array of indices and iterate over them instead. Having index of the element, we can access the array at any relative position in the two lambdas we have.

This code is tested and confirmed to work as described in the beginning of the answer.

Schlep answered 1/10, 2023 at 18:59 Comment(1)
Thanks for your answer. Yes, your solution is C++17-compliant. However, it involves the materialization of the indices into a vector, which should not be necessary (also note that capturing KeyObjPtrVec and indices by value involves copying). Fortunately, the Cpp17ForwardIterator requirements for the parallel algorithms has been relaxed in C++23 by the adoption of P2408R5, so that we will be able to use std::ranges::iota_view once P2408R5 is implemented in the standard library implementations.Keary

© 2022 - 2025 — McMap. All rights reserved.