How to split a vector into n "almost equal" parts
Asked Answered
U

9

21

I have a problem that I would like to merge a large number of images using ImageMagick's convert.exe, but under Windows I have a 8192 byte long command line limit.

My solution to this is to split the task into smaller sub-task, run them, and do a final task which combines them together.

My idea is to write a function, which takes a vector of images and an integer, and splits the vector into n sub-vector all having "almost equal" parts.

So for example if I would like to split 11 into 3 groups it would be 4-4-3.

Can you tell me how can I do it in C++? I mean, to write a function

split_vec( const vector<image> &images, int split )

which does the splitting?

Also, can you tell me what is the most efficient way to do if I don't need to create new vectors, just iterate through the sub-parts? Like the std::substr function with std::string?

Note: I already use Boost in the project, so if there is some nice tool in Boost for this then it's perfect for me.

Ummersen answered 28/7, 2011 at 14:57 Comment(0)
H
14

To get a base number for the size of each part, simply divide the total by the number of parts: 11/3 = 3. Obviously some of the parts will need to be bigger than that to get the proper total, but that's just the remainder: 11 % 3 = 2. So now you know that 2 of the parts will be size 3+1, and whatever's left over will be 3.

Husbandry answered 28/7, 2011 at 15:3 Comment(4)
Thanks, here is what I come up with: double loop = number / parts; for( int i = 0; i < parts; i++ ) { int start = i * loop; int end = ( i + 1 ) * loop - 1; }Ummersen
@zsero, if both number and parts are integers you'll need to convert one to double before doing the division. Also you'll need to worry about roundoff error, there are cases where you might get an off-by-one error when you convert back to integer.Husbandry
Actually I use doubles in the function definition, and a round() function for start and end. Do you think it is possible to have roundoff error when using round() function? (I use stringstream to round)Ummersen
@zsero, if you're using rounding instead of truncation for start and end you should be OK. You left that part off of your previous comment.Husbandry
O
8

Here is my solution:

template<typename T>
std::vector<std::vector<T>> SplitVector(const std::vector<T>& vec, size_t n)
{
    std::vector<std::vector<T>> outVec;

    size_t length = vec.size() / n;
    size_t remain = vec.size() % n;

    size_t begin = 0;
    size_t end = 0;

    for (size_t i = 0; i < std::min(n, vec.size()); ++i)
    {
        end += (remain > 0) ? (length + !!(remain--)) : length;

        outVec.push_back(std::vector<T>(vec.begin() + begin, vec.begin() + end));

        begin = end;
    }

    return outVec;
}
Olivas answered 8/6, 2016 at 16:53 Comment(0)
W
1

CreateProcess has a 32kb limit

Or, if you want to go via the shell,

vec::const_iterator i = vec .begin ();
vec::const_iterator j = i + stride;

while (j < vec .end ()) {
    do_range (i, j);
    i = j;
    j += stride;
}

do_range (i, vec .end ());
Wendelin answered 28/7, 2011 at 15:4 Comment(0)
A
1

Have you thought about using the xargs program. This maybe a high-level solution to the problem.

Apparent answered 28/7, 2011 at 15:11 Comment(3)
I use "unix" utilities on my Windows machines all the time. checkout: unxutils.sf.net and/or www.cygwin.comApparent
Thanks for the tip, although I'm afraid this won't help him run the code on someone else's computer :-PWendelin
Why? xargs is a stand-alone program. Distribute it with his program.Apparent
M
1

You don't have to create new sub-vectors, use something like following:

size_t ProcessSubVec(const vector<Image>& images, size_t begin, size_t end)
{
    // your processing logic
}

void SplitVec(const vector<Image>& images, int cnt)
{
    size_t SubVecLen = images.size() / cnt,
           LeftOvers = images.size() % cnt,
           i = 0;

    // Split into "cnt" partitions
    while(i < images.size())
        i += ProcessSubVec(images, i, i + SubVecLen + (LeftOvers-- == 0 ? 0 : 1));
}

Hope this helps.

Moulding answered 30/7, 2011 at 11:17 Comment(1)
Brandon what should ProcessSubVec return? i didn't understand it.Antananarivo
V
1

You could create a template that returns a std::vector < std::vector > and receives the vector you want split, and the number of divisions. using for and iterator is very easy.

#include <iostream>
#include <iomanip>
#include <vector>
#include <algorithm>
#include <numeric>

template<typename T>
std::vector< std::vector<T> > split(std::vector<T> vec, uint64_t n) {
  std::vector< std::vector<T> > vec_of_vecs(n);

  uint64_t quotient = vec.size() / n;
  uint64_t reminder = vec.size() % n;
  uint64_t first = 0;
  uint64_t last;
  for (uint64_t i = 0; i < n; ++i) {
    if (i < reminder) {
      last = first + quotient + 1;
      vec_of_vecs[i] = std::vector<T>(vec.begin() + first, vec.begin() + last);
      first = last;
  }
    else if (i != n - 1) {
    last = first +  quotient;
    vec_of_vecs[i] = std::vector<T>(vec.begin() + first, vec.begin() + last);
    first = last;
  }
    else
    vec_of_vecs[i] = std::vector<T>(vec.begin() + first, vec.end());
}

return vec_of_vecs;
}

#define ONE_DIMENSION 11
#define SPLITS 3

int main(void)
{
  std::vector<uint64_t> vector(ONE_DIMENSION);
  std::iota(std::begin(vector), std::end(vector), 1);

  std::vector<std::vector<uint64_t>> vecs(SPLITS);
  vecs = split(vector, SPLITS);

  for (uint64_t m = 0; m < vecs.size(); ++m) {
    for (auto i : vecs[m])
      std::cout << std::setw(3) << i << " ";
    std::cout << std::endl;
  }


  return 0;
}
Vidovik answered 12/2, 2018 at 5:24 Comment(0)
Z
1

This is how I do it (I know it's very similar to the answer but that was my actual code lol):

template<typename T>
std::vector<std::vector<T>> splitVector(const std::vector<T>& vec, size_t n)
{
    std::vector<std::vector<T>> out_vec;
    size_t length = vec.size() / n;
    size_t remain = vec.size() % n;
    size_t begin = 0;
    size_t end = 0;

    for (size_t i = 0; i < n; ++i)
    {
        end += length + (i < remain);
        out_vec.emplace_back(vec.begin() + begin, vec.begin() + end);
        begin = end;
    }

    return out_vec;
}

You could also return pair of iterators or such if you don't like copying.

Zsa answered 17/6, 2022 at 19:9 Comment(0)
M
0

You can use iterators to iterate through the sub-parts of the problem. Iterators usage is similar to pointers to elements of the vector

What you want to on the images do could be implemented as a function

using namespace std; 
void do_some_work(vector<image>::iterator begin, vector<image>::iterator end) {
    vector<image>::iterator i = begin ;
    while(i != end) {
        // do something using *i , which will be of type image
        ++i ;
    }
}
Molasses answered 28/7, 2011 at 15:5 Comment(0)
W
0

If someone else is still looking for another solution, you can use boost::iterator_range

template <typename Container>
auto split(const Container& in, std::size_t parts)
{
   using Range = boost::iterator_range<decltype(std::begin(in))>;

   auto size = std::distance(std::begin(in), std::end(in));
   if (parts == 0 || size < parts)
   {
      throw std::runtime_error(boost::str(boost::format("Cannot split %1% element container into %2% parts!") % size % parts));
   }

   const auto min_part_size = size / parts;
   auto begin = std::begin(in);

   std::vector<Range> out(parts);
   std::for_each(out.begin(), out.end() - 1, [&](auto& out_range)
   {
      out_range = boost::make_iterator_range_n(begin, min_part_size);
      begin = std::end(out_range);
   });
   out.back() = boost::make_iterator_range(begin, std::end(in));
   return out;
}

After that, you can easily convert each iterator range chunk into your preferred type.

Woodward answered 30/5, 2024 at 12:41 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.