How to select a random element in std::set?

How can I select a random element in an std::set?

I naively tried this:

int GetSample(const std::set<int>& s) {
  double r = rand() % s.size();
  return *(s.begin() + r); // compile error
}

But the operator+ is not allowed in this way.

Baxter answered 16/6, 2010 at 11:26 Comment(2)
Be careful when using modulus (%) in random number generation: the distribution may not be exactly even (the last elements are less likely than the others).Lemire
Modulo bias is something you should consider when s.size() is large compared with RAND_MAXAgace
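
To make the modulo-bias remarks in these comments concrete, here is a minimal sketch, assuming C++11 `<random>` is available. It draws the index with `std::uniform_int_distribution` instead of `rand() % s.size()`. The helper name `uniform_index` is made up for illustration and does not appear in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper (not from the question): returns an index in [0, n-1],
// drawn without the bias that rand() % n can introduce.
std::size_t uniform_index(std::size_t n, std::mt19937& gen) {
    assert(n > 0);  // an empty container has no valid index
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);  // bounds are inclusive
    return dist(gen);
}
```

The result can be passed to std::advance or std::next in place of the `rand() % s.size()` value; the generator would typically be seeded once, for example from `std::random_device`.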
A
59

You could use the std::advance function.

#include <cstddef>   // size_t
#include <cstdlib>   // rand
#include <iterator>  // std::begin, std::advance
#include <set>

int main() {
  using namespace std;
  // generate a set...
  set<int> s;
  for( int i = 0; i != 10; ++i ) s.insert(i);
  auto r = rand() % s.size(); // not _really_ random
  auto n = *select_random(s, r);
}

where select_random (which must be defined before main, and needs C++14 for the deduced return type) is:

template<typename S>
auto select_random(const S &s, size_t n) {
  auto it = std::begin(s);
  // 'advance' the iterator n times
  std::advance(it,n);
  return it;
}
Attend answered 16/6, 2010 at 11:27 Comment(10)
Oh, I forgot about that method. Thanks, that's exactly what I need.Baxter
Any solution will be O(N). Proof is left as an exercise, hint: how many elements of a std::set can be reached in constant time?Phycomycete
Could be O(logN). std::set is stored in some kind of tree, there could potentially be a solution that just goes down on one of the branches, and is done.Lavonnelaw
The method in my answer using a sorted vector is O(1).Katherine
@Katherine True, but in that case insertion and removal are O(N). That seems like cheating, but it depends on the use-case, of course. When you fill the vector once, that's fine.Invulnerable
@Jonas: sure, but the question here was only about random selection ... so I didn't care about the others :-)Katherine
@Lavonnelaw You're right that with balanced search trees you can have O(log(N)) for insertion, removal, and random access. However, the latter requires that nodes store how many children they have to either their left or right. This needs to be updated during insertion, removal, and rebalancing. Since std::set and std::map hide the tree-internals from the user, they cannot be used to achieve this. I ended up implementing my own search tree. It's definitely possible to get O(log(N)) lookup.Invulnerable
this answer would be much better if it answered the question "how can I select a random element from a set", currently as far as I can tell it shows neither randomness nor element selection, though it gives you the tools to piece it togetherKingcraft
@Timofey that's right. The 'problem' OP had was not the question he asked :). Updated my answer accordingly.Attend
@Attend it was a good answer, but it's that much better if the whole solution is given! really appreciate it, thank youKingcraft
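
The comment above about trees whose nodes store subtree sizes can be tried without writing a tree by hand, assuming GCC's libstdc++ is in use: its policy-based data structures extension (not standard C++) provides an order-statistics tree with `find_by_order`. This is only a sketch under that assumption; the names `ordered_set` and `random_element` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <ext/pb_ds/assoc_container.hpp>  // GCC/libstdc++ extension only
#include <ext/pb_ds/tree_policy.hpp>

// Red-black tree whose nodes also track subtree sizes (order statistics).
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::rb_tree_tag,
                         __gnu_pbds::tree_order_statistics_node_update>
    ordered_set;

// Hypothetical helper: uniform pick in O(log N) via find_by_order.
int random_element(const ordered_set& s, std::mt19937& gen) {
    assert(!s.empty());
    std::uniform_int_distribution<std::size_t> dist(0, s.size() - 1);
    return *s.find_by_order(dist(gen));  // iterator to the k-th smallest key (0-based)
}
```

Insertion and removal stay O(log N) here, which is the trade-off the comment describes; the cost is a dependency on a compiler-specific header.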
Z
3

First Solution: O(log n) in time / O(1) in space (not uniform!)

As hypothesized in a comment above, it can be done in O(log(n)) (vs O(n) for std::advance) without a vector (which would use O(n) more space) by using the method I describe here.

Essentially, you:

  • check if the set is empty (if it is, there is no hope)
  • generate a random value
  • if it is already in the set, return it; otherwise insert it
  • get an iterator it to the inserted value
  • take the random element as *(++it), or *(set.begin()) if ++it reaches the end
  • erase the element you inserted before returning the result

n.b.: As pointed out by Aaron, the element is not chosen uniformly at random. You need to draw the random value from the same distribution as the elements in the set to approach uniform sampling.
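
The steps of this first solution can be sketched as follows for a `std::set<int>`. This is an illustration under assumptions, not the code behind the link: the helper name `pick_by_probe` and the `[lo, hi]` range from which the probe value is drawn are both made up here.

```cpp
#include <cassert>
#include <iterator>
#include <random>
#include <set>

// Hypothetical helper: O(log n) pick that is NOT uniform.
// [lo, hi] is an assumed range from which the probe value is drawn.
int pick_by_probe(std::set<int>& s, std::mt19937& gen, int lo, int hi) {
    assert(!s.empty());                          // no hope for an empty set
    std::uniform_int_distribution<int> dist(lo, hi);
    const int probe = dist(gen);
    const auto inserted = s.insert(probe);       // pair of (iterator, was_inserted)
    if (!inserted.second) return probe;          // probe was already an element
    const auto next = std::next(inserted.first); // successor of the probe
    const int result = (next == s.end()) ? *s.begin() : *next;  // wrap around at the end
    s.erase(inserted.first);                     // restore the original set
    return result;
}
```

With the set {10, 20, 30} and probes drawn from [0, 40], the element 10 is returned for every probe up to 10 and for every probe above 30, so it is picked about twice as often as 20 or 30. That is the non-uniformity the note above refers to.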

Second Solution : O(1) in time / O(n) in space (uniform)

davidhigh already gave the solution with a vector, but there is a problem: when you remove an element, you either have to perform a linear search in the vector, which is O(n), or rebuild the vector each time you want to retrieve a random element, which is O(n) too.

To avoid this problem and keep the insert/delete to O(log n), you can keep an std::unordered_set and use a similar method to the first solution to get a random element in O(1).

p.s : If your elements are large you can use an unordered set of pointers (with a modified hasher) to spare some memory.

Ziguard answered 20/7, 2015 at 17:48 Comment(2)
That is random yes, but it is not uniformly at random from the current elements of the set. And we can assume the questioner wants uniformity. Although maybe this is not entirely necessaryBrowder
Indeed, though if you generate your element with a distribution that looks like that of the set, that would approach it. We don't have this issue with the unordered_set (see the link in the answer). Need to think about it...Ziguard
E
3

C++17 std::sample

This will be a convenient, although not very efficient (O(n)) method:

#include <algorithm>
#include <iostream>
#include <iterator>  // std::back_inserter
#include <random>
#include <set>
#include <vector>

int main() {
    std::set<int> in{1, 2, 3, 5, 7};
    std::vector<int> out;
    std::sample(in.begin(), in.end(), std::back_inserter(out),
                3, std::mt19937{std::random_device{}()});
    for (auto i : out)
        std::cout << i << std::endl;
}

But I think that for efficiency you just need to copy to another type of structure: How to select a random element in std::set in less than O(n) time?
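
Since the question asks for a single element, the same call can be reduced to a sample size of 1. A minimal sketch, assuming a C++17 compiler; the helper name `sample_one` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <set>

// Hypothetical helper: one uniformly chosen element via std::sample (C++17).
// Still O(n), because std::set only offers bidirectional iterators.
int sample_one(const std::set<int>& s, std::mt19937& gen) {
    assert(!s.empty());
    int out = 0;
    std::sample(s.begin(), s.end(), &out, 1, gen);  // writes exactly one element to out
    return out;
}
```

Passing the generator by reference keeps its state between calls, so repeated calls do not reseed from `std::random_device` each time.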

Exocrine answered 27/2, 2017 at 10:56 Comment(0)
K
2

If the random access is important and you can live with O(N) average effort for the insertion, then the workaround given in this paper might be convenient.

The main idea there is to use a sorted vector, and then use the function std::lower_bound for lookup. Thus, the lookup takes O(log N), just as in a normal set. Further, (random) insertion takes O(N), as all following elements must be shifted just like in a normal vector (and possibly a reallocation is performed). Insertion at the back, however, is constant, except for the reallocation, which you can avoid by calling reserve() with a large enough capacity.

Finally, the main point of the question: Random access is O(1). Just draw a random number i from a uniform distribution in [0, V.size()-1], and return the corresponding element V[i].

Here is the code basis out of the paper, which implements this sorted vector. Extend it as needed:

#include <algorithm>  // std::lower_bound, std::sort
#include <functional> // std::less
#include <vector>

template <class T, class Compare = std::less<T> >
struct sorted_vector {
 std::vector<T> V;
 Compare cmp;
 typedef typename std::vector<T>::iterator iterator;
 typedef typename std::vector<T>::const_iterator const_iterator;
 iterator begin() { return V.begin(); }
 iterator end() { return V.end(); }
 const_iterator begin() const { return V.begin(); }
 const_iterator end() const { return V.end(); }

 //...if needed, implement more by yourself

 sorted_vector(const Compare& c = Compare()) : V(), cmp(c) {}
 template <class InputIterator>
 sorted_vector(InputIterator first, InputIterator last, const Compare& c = Compare())
 : V(first, last), cmp(c)
 {
  std::sort(begin(), end(), cmp);
 }

 //...

 iterator insert(const T& t) {
     iterator i = std::lower_bound(begin(), end(), t, cmp);
     if (i == end() || cmp(t, *i))
        i = V.insert(i, t); // insert may reallocate, so keep the iterator it returns
     return i;
 }
 const_iterator find(const T& t) const {
     const_iterator i = std::lower_bound(begin(), end(), t, cmp);
     return i == end() || cmp(t, *i) ? end() : i;
 }
};

For a more sophisticated implementation, you might also consider this page.

EDIT: or even better, use boost::container::flat_set, which implements the set using the idea above, i.e. as a sorted vector.

Katherine answered 2/7, 2014 at 14:19 Comment(1)
If you know the set is not going to change after you start taking random samples, or it changes very infrequently, you could also cache it in a vector when it changes and just pick from there. You could wrap that cached set up any way you please to make it transparent (writes invalidate cache, cache rebuilt if invalid on read).Precinct
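
The caching idea in the comment above could look roughly like this. It is a sketch only, assuming a set of int; the class name `SampledSet` and its members are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical wrapper: a std::set plus a vector cache that is rebuilt lazily.
class SampledSet {
public:
    // Writes mark the cache as stale only if the set actually changed.
    void insert(int v) { if (set_.insert(v).second) dirty_ = true; }
    void erase(int v)  { if (set_.erase(v) > 0) dirty_ = true; }
    std::size_t size() const { return set_.size(); }

    // O(1) while the cache is valid, O(N) on the first read after a write.
    int sample(std::mt19937& gen) {
        assert(!set_.empty());
        if (dirty_) {
            cache_.assign(set_.begin(), set_.end());  // rebuild the cache from the set
            dirty_ = false;
        }
        std::uniform_int_distribution<std::size_t> dist(0, cache_.size() - 1);
        return cache_[dist(gen)];
    }

private:
    std::set<int> set_;
    std::vector<int> cache_;
    bool dirty_ = false;
};
```

This pays off when reads far outnumber writes; if every sample is preceded by a modification, each call degrades to the O(N) rebuild.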
R
2

To get a random element from a set, first take a random number using the rand() function, then take it modulo (%) the set size so that the iterator will not go out of bounds. Then advance an iterator from the beginning idx = rand() % s.size() times to reach the random element. With this method each element has approximately the same probability of occurring (exactly the same only if the set size evenly divides RAND_MAX + 1).

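#include <cstdlib>        // rand
#include <unordered_set>
using namespace std;

int getRandomElement() {
    // making set
    unordered_set<int> s;
    s.insert(1);
    s.insert(2);
    s.insert(3);
    s.insert(4);

    // logic
    int idx = rand() % s.size();
    auto it = s.begin();
    for (int i = 0; i < idx; i++)
    {
        it++;
    }
    return *it;
}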
Reverberatory answered 6/6, 2020 at 6:50 Comment(0)
C
1
int GetSample(const std::set<int>& s) {
  double r = rand() % s.size();
  std::set<int>::iterator it = s.begin();
  for (; r != 0; r--) it++;
  return *it;
}

would be one way of doing it, although not a pretty one.

Capping answered 16/6, 2010 at 11:29 Comment(1)
This code is incorrect, you cannot simply check double for equality. And why double here?Negligence
R
1
You can simply pass a random offset as the second argument of std::next to get a random iterator, and then dereference it to get the corresponding element.

#include <cstdlib>        // rand
#include <iostream>
#include <iterator>       // std::next
#include <unordered_set>

int main() {
    std::unordered_set<int> mySet;

    // Insert 10 elements into the set
    for (int i = 1; i <= 10; ++i) {
        mySet.insert(i);
    }

    // Display the set
    std::cout << "Set elements: ";
    for (int element : mySet) {
        std::cout << element << " ";
    }
    std::cout << std::endl;

    // Get a random element
    int randomElement = *(std::next(mySet.begin(), rand() % mySet.size()));

    std::cout << "Random Element: " << randomElement << std::endl;

    return 0;
}
Recommend answered 19/12, 2023 at 12:11 Comment(0)
