Choose m elements randomly from a vector containing n elements
Asked Answered
R

3

26

I have a vector containing n elements. I need to choose a subset of m elements randomly from the vector without repetition. What is the most efficient way of doing this? I need to do this several thousands of times in my code.

The solution on top of my mind is to use rand() to generate a random number k between 0 and n. Then pick the kth element in the vector and insert it into a std::set. Keep doing this till the set's size becomes equal to m. I am now assured that the set contains m unique elements randomly chosen from the set of n elements.

What are the other possible solutions?

Thanks.

Rigsdaler answered 18/2, 2012 at 23:20 Comment(5)
Do std::random_shuffle() on vector and pull first m elements out of it, perhaps?Bawbee
@jrok: while simple, that is _highly inefficient when m is much smaller than n.Pursy
possible duplicate of Algorithm to select a single, random combination of values?Bohi
@MooingDuck But the fact is that std::random_shuffle() is just a full permutation which also uses Fisher-Yates shuffle. See the first version of possible implementation from cppreference. For those who wants to understand the accepted answer and also std::random_shuffle(), it's worth for a read.Cloaca
@Rick: Sure, it's the same algorithm, but instead of stopping after m elements, it does the entire data set, which is a massive waste of time. Its the difference between std::sort and std::partial_sort.Pursy
B
43

You want a Fisher-Yates shuffle (stop after M iterations):

template<class BidiIter >
BidiIter random_unique(BidiIter begin, BidiIter end, size_t num_random) {
    size_t left = std::distance(begin, end);
    while (num_random--) {
        BidiIter r = begin;
        std::advance(r, rand()%left);
        std::swap(*begin, *r);
        ++begin;
        --left;
    }
    return begin;
}

Demo at http://ideone.com/3A3cv. This is significantly faster than std::random_shuffle when you only need a few random numbers out of the set, and should be just about the same speed even if N==M.

Bodhisattva answered 18/2, 2012 at 23:28 Comment(3)
@Burr Thanks! I have a million elements in my vector out of which I need to pick just 100 elements in random. This is exactly what I was looking for.Rigsdaler
rand(): see codereview.stackexchange.com/questions/39001/…Mattock
Ya.. This is a Stop after x selection version of std::random_shuffle.. I am curious why STL not overloading this version. Thanks btw :D.Cloaca
C
4

One way you could do this is to create a list of all the indices of the vector, shuffle them, and take the first n to be the indices of the selected objects:

struct rangegenerator {
    rangegenerator(int init) : start(init) { }

    int operator()() {
        return start++;
    }

    int start;
};

vector<T> numbers; // this is filled somewhere else

vector<int> indices(numbers.size());

generate(begin(indices), end(indices), rangegenerator(0));

random_shuffle(begin(indices), end(indices));

// then take the first n elements of indices and use them as indices into numbers
Cynara answered 18/2, 2012 at 23:29 Comment(3)
When m is much smaller than n, this is highly inefficient. It's not hard to come up with an answer that is faster than this for all m (where m is less than n)Pursy
@Seth: Will have to agree with Moo. This is probably one of the worst ways to accomplish the given task - not sure why the OP marked it as an answer. The correct answer is obviously Burr's answer.Rayfordrayle
@JaredKrumsie the OP asked for "other possible solutions" and what I wrote is definitely a possible solution. The only way an answer would be incorrect is if it didn't work at all.Cynara
H
2

Since C++20,
We can use std::ranges::sample():

#include <cassert>
#include <iostream>
#include <random>
#include <vector>


void Test() {
  std::mt19937_64 random_engine{std::random_device{}()};
  std::vector<int> in{{1, 2, 3, 4, 5}};
  std::size_t m{2};

  std::vector<int> out{};
  std::ranges::sample(in, std::back_inserter(out), m, random_engine);
  assert(out.size() == m);

  for (const int &elem : out) {
    std::cout << elem << std::endl;
  }
}
Hotel answered 26/7, 2022 at 3:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.