Choose m elements randomly from a vector containing n elements

Asked 18/2, 2012 at 23:20 Answered 26/7, 2022 at 3:23

I have a vector containing n elements. I need to choose a subset of m elements randomly from the vector without repetition. What is the most efficient way of doing this? I need to do this several thousands of times in my code.

The solution on top of my mind is to use rand() to generate a random number k between 0 and n. Then pick the kth element in the vector and insert it into a std::set. Keep doing this till the set's size becomes equal to m. I am now assured that the set contains m unique elements randomly chosen from the set of n elements.

What are the other possible solutions?

Thanks.

Rigsdaler answered 18/2, 2012 at 23:20 Comment(5)

Do std::random_shuffle() on vector and pull first m elements out of it, perhaps? – Bawbee 18/2, 2012 at 23:25

@jrok: while simple, that is _highly inefficient when m is much smaller than n. – Pursy 19/2, 2012 at 0:52

possible duplicate of Algorithm to select a single, random combination of values? – Bohi 19/2, 2012 at 3:44

@MooingDuck But the fact is that std::random_shuffle() is just a full permutation which also uses Fisher-Yates shuffle. See the first version of possible implementation from cppreference. For those who wants to understand the accepted answer and also std::random_shuffle(), it's worth for a read. – Cloaca 25/10, 2018 at 7:46

@Rick: Sure, it's the same algorithm, but instead of stopping after m elements, it does the entire data set, which is a massive waste of time. Its the difference between std::sort and std::partial_sort. – Pursy 25/10, 2018 at 16:32

You want a Fisher-Yates shuffle (stop after M iterations):

template<class BidiIter >
BidiIter random_unique(BidiIter begin, BidiIter end, size_t num_random) {
    size_t left = std::distance(begin, end);
    while (num_random--) {
        BidiIter r = begin;
        std::advance(r, rand()%left);
        std::swap(*begin, *r);
        ++begin;
        --left;
    }
    return begin;
}

Demo at http://ideone.com/3A3cv. This is significantly faster than std::random_shuffle when you only need a few random numbers out of the set, and should be just about the same speed even if N==M.

Bodhisattva answered 18/2, 2012 at 23:28 Comment(3)

@Burr Thanks! I have a million elements in my vector out of which I need to pick just 100 elements in random. This is exactly what I was looking for. – Rigsdaler 20/2, 2012 at 17:9

rand(): see codereview.stackexchange.com/questions/39001/… – Mattock 20/2, 2016 at 22:40

Ya.. This is a Stop after x selection version of std::random_shuffle.. I am curious why STL not overloading this version. Thanks btw :D. – Cloaca 25/10, 2018 at 7:49

One way you could do this is to create a list of all the indices of the vector, shuffle them, and take the first n to be the indices of the selected objects:

struct rangegenerator {
    rangegenerator(int init) : start(init) { }

    int operator()() {
        return start++;
    }

    int start;
};

vector<T> numbers; // this is filled somewhere else

vector<int> indices(numbers.size());

generate(begin(indices), end(indices), rangegenerator(0));

random_shuffle(begin(indices), end(indices));

// then take the first n elements of indices and use them as indices into numbers

Cynara answered 18/2, 2012 at 23:29 Comment(3)

When m is much smaller than n, this is highly inefficient. It's not hard to come up with an answer that is faster than this for all m (where m is less than n) – Pursy 19/2, 2012 at 0:53

@Seth: Will have to agree with Moo. This is probably one of the worst ways to accomplish the given task - not sure why the OP marked it as an answer. The correct answer is obviously Burr's answer. – Rayfordrayle 19/2, 2012 at 1:55

@JaredKrumsie the OP asked for "other possible solutions" and what I wrote is definitely a possible solution. The only way an answer would be incorrect is if it didn't work at all. – Cynara 19/2, 2012 at 2:3

Since C++20,
We can use std::ranges::sample():

#include <cassert>
#include <iostream>
#include <random>
#include <vector>


void Test() {
  std::mt19937_64 random_engine{std::random_device{}()};
  std::vector<int> in{{1, 2, 3, 4, 5}};
  std::size_t m{2};

  std::vector<int> out{};
  std::ranges::sample(in, std::back_inserter(out), m, random_engine);
  assert(out.size() == m);

  for (const int &elem : out) {
    std::cout << elem << std::endl;
  }
}

Hotel answered 26/7, 2022 at 3:23 Comment(0)

Recommended topics

Hot tags