Picking a random element from a set
Asked Answered
M

34

216

How do I pick a random element from a set? I'm particularly interested in picking a random element from a HashSet or a LinkedHashSet, in Java.

Merriemerrielle answered 24/9, 2008 at 0:12 Comment(2)
You should specify some conditions to see if this is really what you want. - How may times are you going to be selecting a random element? - Does the data need to be stored in a HashSet or LinkedHashSet, neither are not randomly accessable. - Is the hash set large? Are the keys small?Icily
"Solutions for other languages are also welcome" makes this too broad. I've removed it--it's devolving into an unsearchable mess of random snippets that aren't useful for future visitors. Each language should have its own thread for selecting a random item from a set. OP hasn't visited the site in 14 years at the time of writing, and the question is rather legacy, so I think this is a reasonable edit, even if it invalidated some existing answers.Corry
W
102
int size = myHashSet.size();
int item = new Random().nextInt(size); // In real life, the Random object should be rather more shared than this
int i = 0;
for(Object obj : myhashSet)
{
    if (i == item)
        return obj;
    i++;
}
Wyattwyche answered 24/9, 2008 at 0:17 Comment(8)
If myHashSet is large, then this will be a rather slow solution since on average, (n / 2) iterations will be needed to find the random object.Mannerheim
if your data is in a hash set, you need O(n) time. There's no way around it if you are just picking a single element and the data is stored in a HashSet.Icily
@David Nehme: This is a drawback in the specification of HashSet in Java. In C++, it's typical to be able to directly access the buckets that make up the hashset, which allows us to more efficiently select a random elements. If random elements are necessary in Java, it might be worthwhile to define a custom hash set that allows the user to look under the hood. See [boost's docs][1] for a little more in this. [1] boost.org/doc/libs/1_43_0/doc/html/unordered/buckets.htmlDia
If the set is not mutated over multiple accesses, you can copy it into an array and then access O(1). Just use myHashSet.toArray()Citystate
@AaronMcDaid Even then, you would have to take empty buckets into account.Shanaeshanahan
@Citystate wouldn't this make matters worse, since the set would have to be copied to a new array? docs.oracle.com/javase/7/docs/api/java/util/… "this method must allocate a new array even if this collection is backed by an array"Curium
@Curium See discussion under Dan Dyer's answer https://mcmap.net/q/75991/-picking-a-random-element-from-a-setCitystate
This solution will work only for Integer type elements, i.e. HashSet<Integer>. For other types see another answer (down below): https://mcmap.net/q/75991/-picking-a-random-element-from-a-setFaso
C
77

A somewhat related Did You Know:

There are useful methods in java.util.Collections for shuffling whole collections: Collections.shuffle(List<?>) and Collections.shuffle(List<?> list, Random rnd).

Condemnatory answered 24/9, 2008 at 0:43 Comment(4)
Awesome! This is not crossreferenced anywhere in the java doc! Like Python's random.shuffle()Steroid
But this only works with Lists, i.e., structures that have a .get() function.Experimentalism
@bourbaki4481472 is absolutely correct. This only works for those collections extending the List interface, not the Set interface discussed by the OP.Coitus
However, OP wishes to pick a (one I assume) element. Shuffling the whole list (and also transferring all elements in the set to a list) is very expensive for that one item... If you have a list, you use List.get(new Random().nextInt(size)).Intersidereal
E
46

In Java 8:

static <E> E getRandomSetElement(Set<E> set) {
    return set.stream().skip(new Random().nextInt(set.size())).findFirst().orElse(null);
}
Erupt answered 19/7, 2018 at 1:15 Comment(4)
Why not findFirst().get() instead of orElse()?Antibiosis
If get() is called without a check, it might throw a NoSuchElementException. This could be ignored if it is granted, that the underlying set of elements is not changed in a concurrent operation (Streams are not automatically atomic).Pugilist
Don't use set.size(), but set.size() -1.Blockbuster
@Olivier Faucheux, the max element in Random().nextInt() is not included. So set.size() should be Ok.Faso
R
41

Fast solution for Java using an ArrayList and a HashMap: [element -> index].

Motivation: I needed a set of items with RandomAccess properties, especially to pick a random item from the set (see pollRandom method). Random navigation in a binary tree is not accurate: trees are not perfectly balanced, which would not lead to a uniform distribution.

public class RandomSet<E> extends AbstractSet<E> {

    List<E> dta = new ArrayList<E>();
    Map<E, Integer> idx = new HashMap<E, Integer>();

    public RandomSet() {
    }

    public RandomSet(Collection<E> items) {
        for (E item : items) {
            idx.put(item, dta.size());
            dta.add(item);
        }
    }

    @Override
    public boolean add(E item) {
        if (idx.containsKey(item)) {
            return false;
        }
        idx.put(item, dta.size());
        dta.add(item);
        return true;
    }

    /**
     * Override element at position <code>id</code> with last element.
     * @param id
     */
    public E removeAt(int id) {
        if (id >= dta.size()) {
            return null;
        }
        E res = dta.get(id);
        idx.remove(res);
        E last = dta.remove(dta.size() - 1);
        // skip filling the hole if last is removed
        if (id < dta.size()) {
            idx.put(last, id);
            dta.set(id, last);
        }
        return res;
    }

    @Override
    public boolean remove(Object item) {
        @SuppressWarnings(value = "element-type-mismatch")
        Integer id = idx.get(item);
        if (id == null) {
            return false;
        }
        removeAt(id);
        return true;
    }

    public E get(int i) {
        return dta.get(i);
    }

    public E pollRandom(Random rnd) {
        if (dta.isEmpty()) {
            return null;
        }
        int id = rnd.nextInt(dta.size());
        return removeAt(id);
    }

    @Override
    public int size() {
        return dta.size();
    }

    @Override
    public Iterator<E> iterator() {
        return dta.iterator();
    }
}
Roughage answered 14/4, 2011 at 20:6 Comment(5)
Well that would work but the question was about the Set interface. This solution forces users to have concrete type references of the RandomSet.Eatmon
I really like this solution, but it's not thread safe, inaccuracies between the Map and the List may occur, so I would add some synchronized blocksCoranto
@KonstantinosChalkias the built-in collections aren't thread safe either. Only the ones with the name Concurrent are really safe, the ones wrapped with Collections.synchronized() are semi-safe. Also the OP didn't say anything about concurrency so this is a valid, and good answer.Milwaukee
The iterator returned here should not be able to remove elements from dta (this can be achieved via guava's Iterators.unmodifiableIterator for example). Otherwise the default implementations of e.g. removeAll and retainAll in AbstractSet and its parents working with that iterator will mess up your RandomSet!Ellon
Nice solution. You actually can use a tree if each node contains the number of nodes in the subtree it roots. Then compute a random real in 0..1 and make a weighted 3-way decision (select current node or descend into left or right subtree) at each node based on the node counts. But imo your solution is much nicer.Overburdensome
W
35

This is faster than the for-each loop in the accepted answer:

int index = rand.nextInt(set.size());
Iterator<Object> iter = set.iterator();
for (int i = 0; i < index; i++) {
    iter.next();
}
return iter.next();

The for-each construct calls Iterator.hasNext() on every loop, but since index < set.size(), that check is unnecessary overhead. I saw a 10-20% boost in speed, but YMMV. (Also, this compiles without having to add an extra return statement.)

Note that this code (and most other answers) can be applied to any Collection, not just Set. In generic method form:

public static <E> E choice(Collection<? extends E> coll, Random rand) {
    if (coll.size() == 0) {
        return null; // or throw IAE, if you prefer
    }

    int index = rand.nextInt(coll.size());
    if (coll instanceof List) { // optimization
        return ((List<? extends E>) coll).get(index);
    } else {
        Iterator<? extends E> iter = coll.iterator();
        for (int i = 0; i < index; i++) {
            iter.next();
        }
        return iter.next();
    }
}
Wertheimer answered 20/8, 2014 at 17:3 Comment(0)
R
16

If you want to do it in Java, you should consider copying the elements into some kind of random-access collection (such as an ArrayList). Because, unless your set is small, accessing the selected element will be expensive (O(n) instead of O(1)). [ed: list copy is also O(n)]

Alternatively, you could look for another Set implementation that more closely matches your requirements. The ListOrderedSet from Commons Collections looks promising.

Remembrancer answered 24/9, 2008 at 19:38 Comment(3)
Copying to a list will cost O(n) in time and also use O(n) memory, so why would that be a better choice than fetching from the map directly?Transference
It depends on how many times you want to pick from the set. The copy is a one time operation and then you can pick from the set as many times as you need. If you're only picking one element, then yes the copy doesn't make things any faster.Remembrancer
It's only a one time operation if you want to be able to pick with repetition. If you want the chosen item to be removed from the set, then you would be back to O(n).Cohere
O
9

In Java:

Set<Integer> set = new LinkedHashSet<Integer>(3);
set.add(1);
set.add(2);
set.add(3);

Random rand = new Random(System.currentTimeMillis());
int[] setArray = (int[]) set.toArray();
for (int i = 0; i < 10; ++i) {
    System.out.println(setArray[rand.nextInt(set.size())]);
}
Olney answered 24/9, 2008 at 0:16 Comment(2)
Your answer works, but it's not very efficient because of the set.toArray( ) part.Merriemerrielle
you should move the toArray to outside the loop.Icily
P
7
List asList = new ArrayList(mySet);
Collections.shuffle(asList);
return asList.get(0);
Plank answered 4/12, 2012 at 22:18 Comment(2)
This is abysmally inefficient. Your ArrayList constructor calls .toArray() on the supplied set. ToArray (in most if not all standard collection implementations) iterates over the entire collection, filling an array as it goes. Then you shuffle the list, which swaps each element with a random element. You'd be much better off simply iterating over the set to a random element.Pyongyang
This is O(n log n), even worse than O(n) in most other answers.Corry
C
6

This is identical to accepted answer (Khoth), but with the unnecessary size and i variables removed.

    int random = new Random().nextInt(myhashSet.size());
    for(Object obj : myhashSet) {
        if (random-- == 0) {
            return obj;
        }
    }

Though doing away with the two aforementioned variables, the above solution still remains random because we are relying upon random (starting at a randomly selected index) to decrement itself toward 0 over each iteration.

Circumscribe answered 8/12, 2014 at 6:53 Comment(1)
Third line could also be if (--random < 0) {, where random reaches -1.Earreach
P
5

Java 8+ Stream:

    static <E> Optional<E> getRandomElement(Collection<E> collection) {
        return collection
                .stream()
                .skip(ThreadLocalRandom.current()
                .nextInt(collection.size()))
                .findAny();
    }

Based on the answer of Joshua Bone but with slight changes:

  • Ignores the Streams element order for a slight performance increase in parallel operations
  • Uses the current thread's ThreadLocalRandom
  • Accepts any Collection type as input
  • Returns the provided Optional instead of null
Pugilist answered 3/8, 2021 at 17:3 Comment(0)
B
3

Clojure solution:

(defn pick-random [set] (let [sq (seq set)] (nth sq (rand-int (count sq)))))
Brooklime answered 23/12, 2008 at 3:51 Comment(2)
This solution is also linear, because to get the nth element you must traverse the seq as well.Carlie
It's linear also as it fits nicely in one line :DDirection
C
3

With Guava we can do a little better than Khoth's answer:

public static E random(Set<E> set) {
  int index = random.nextInt(set.size();
  if (set instanceof ImmutableSet) {
    // ImmutableSet.asList() is O(1), as is .get() on the returned list
    return set.asList().get(index);
  }
  return Iterables.get(set, index);
}
Cottontail answered 28/4, 2016 at 15:22 Comment(0)
F
2

Solution above speak in terms of latency but doesn't guarantee equal probability of each index being selected.
If that needs to be considered, try reservoir sampling. http://en.wikipedia.org/wiki/Reservoir_sampling.
Collections.shuffle() (as suggested by few) uses one such algorithm.

Forseti answered 8/12, 2014 at 7:3 Comment(0)
I
1

Can't you just get the size/length of the set/array, generate a random number between 0 and the size/length, then call the element whose index matches that number? HashSet has a .size() method, I'm pretty sure.

In psuedocode -

function randFromSet(target){
 var targetLength:uint = target.length()
 var randomIndex:uint = random(0,targetLength);
 return target[randomIndex];
}
Impatience answered 24/9, 2008 at 0:18 Comment(1)
This only works if the container in question supports random index lookup. Many container implementations don't (e.g., hash tables, binary trees, linked lists).Goulet
C
1

Perl 5

@hash_keys = (keys %hash);
$rand = int(rand(@hash_keys));
print $hash{$hash_keys[$rand]};

Here is one way to do it.

Copycat answered 24/9, 2008 at 0:51 Comment(0)
D
1

C++. This should be reasonably quick, as it doesn't require iterating over the whole set, or sorting it. This should work out of the box with most modern compilers, assuming they support tr1. If not, you may need to use Boost.

The Boost docs are helpful here to explain this, even if you don't use Boost.

The trick is to make use of the fact that the data has been divided into buckets, and to quickly identify a randomly chosen bucket (with the appropriate probability).

//#include <boost/unordered_set.hpp>  
//using namespace boost;
#include <tr1/unordered_set>
using namespace std::tr1;
#include <iostream>
#include <stdlib.h>
#include <assert.h>
using namespace std;

int main() {
  unordered_set<int> u;
  u.max_load_factor(40);
  for (int i=0; i<40; i++) {
    u.insert(i);
    cout << ' ' << i;
  }
  cout << endl;
  cout << "Number of buckets: " << u.bucket_count() << endl;

  for(size_t b=0; b<u.bucket_count(); b++)
    cout << "Bucket " << b << " has " << u.bucket_size(b) << " elements. " << endl;

  for(size_t i=0; i<20; i++) {
    size_t x = rand() % u.size();
    cout << "we'll quickly get the " << x << "th item in the unordered set. ";
    size_t b;
    for(b=0; b<u.bucket_count(); b++) {
      if(x < u.bucket_size(b)) {
        break;
      } else
        x -= u.bucket_size(b);
    }
    cout << "it'll be in the " << b << "th bucket at offset " << x << ". ";
    unordered_set<int>::const_local_iterator l = u.begin(b);
    while(x>0) {
      l++;
      assert(l!=u.end(b));
      x--;
    }
    cout << "random item is " << *l << ". ";
    cout << endl;
  }
}
Dia answered 20/7, 2010 at 13:45 Comment(0)
K
1

How about just

public static <A> A getRandomElement(Collection<A> c, Random r) {
  return new ArrayList<A>(c).get(r.nextInt(c.size()));
}
Kidderminster answered 18/12, 2012 at 21:6 Comment(0)
D
1

For fun I wrote a RandomHashSet based on rejection sampling. It's a bit hacky, since HashMap doesn't let us access it's table directly, but it should work just fine.

It doesn't use any extra memory, and lookup time is O(1) amortized. (Because java HashTable is dense).

class RandomHashSet<V> extends AbstractSet<V> {
    private Map<Object,V> map = new HashMap<>();
    public boolean add(V v) {
        return map.put(new WrapKey<V>(v),v) == null;
    }
    @Override
    public Iterator<V> iterator() {
        return new Iterator<V>() {
            RandKey key = new RandKey();
            @Override public boolean hasNext() {
                return true;
            }
            @Override public V next() {
                while (true) {
                    key.next();
                    V v = map.get(key);
                    if (v != null)
                        return v;
                }
            }
            @Override public void remove() {
                throw new NotImplementedException();
            }
        };
    }
    @Override
    public int size() {
        return map.size();
    }
    static class WrapKey<V> {
        private V v;
        WrapKey(V v) {
            this.v = v;
        }
        @Override public int hashCode() {
            return v.hashCode();
        }
        @Override public boolean equals(Object o) {
            if (o instanceof RandKey)
                return true;
            return v.equals(o);
        }
    }
    static class RandKey {
        private Random rand = new Random();
        int key = rand.nextInt();
        public void next() {
            key = rand.nextInt();
        }
        @Override public int hashCode() {
            return key;
        }
        @Override public boolean equals(Object o) {
            return true;
        }
    }
}
Dunigan answered 22/12, 2013 at 14:5 Comment(1)
Actually, coming back to it, I guess this isn't quite uniform, if the hashmap has many collisions and we do many queries. That is because the java hashmap uses buckets/chaining and this code will always return the first element in the particular bucket. We're still uniform over the randomness of the hash function though.Dunigan
D
1

If you really just want to pick "any" object from the Set, without any guarantees on the randomness, the easiest is taking the first returned by the iterator.

    Set<Integer> s = ...
    Iterator<Integer> it = s.iterator();
    if(it.hasNext()){
        Integer i = it.next();
        // i is a "random" object from set
    }
Dunno answered 4/2, 2015 at 15:50 Comment(1)
This will not be a random pick though. Imagine performing the same operation over the same set multiple times. I think the order will be the same.Burnsides
E
1

The easiest with Java 8 is:

outbound.stream().skip(n % outbound.size()).findFirst().get()

where n is a random integer. Of course it is of less performance than that with the for(elem: Col)

Erleena answered 24/2, 2015 at 20:20 Comment(0)
N
0

Since you said "Solutions for other languages are also welcome", here's the version for Python:

>>> import random
>>> random.choice([1,2,3,4,5,6])
3
>>> random.choice([1,2,3,4,5,6])
4
Nichellenichol answered 24/9, 2008 at 0:15 Comment(3)
Only, [1,2,3,4,5,6] is not a set, but a list, since it doesn't support things like fast lookups.Dunigan
You can still do: >>> random.choice(list(set(range(5)))) >>> 4 Not ideal but it'll do if you absolutely need to.Etem
One thread with solutions in every conceivable language is unmaintainable and unsearchable. I've removed that sentence from the question, because it makes the question off topic. random.choice from set? is the correct place for this.Corry
S
0

PHP, assuming "set" is an array:

$foo = array("alpha", "bravo", "charlie");
$index = array_rand($foo);
$val = $foo[$index];

The Mersenne Twister functions are better but there's no MT equivalent of array_rand in PHP.

Survivor answered 24/9, 2008 at 0:19 Comment(1)
Most set implementations dont have a get(i) or indexing operator, so id assume thats why OP specified its a setAkel
A
0

Javascript solution ;)

function choose (set) {
    return set[Math.floor(Math.random() * set.length)];
}

var set  = [1, 2, 3, 4], rand = choose (set);

Or alternatively:

Array.prototype.choose = function () {
    return this[Math.floor(Math.random() * this.length)];
};

[1, 2, 3, 4].choose();
Arctic answered 24/9, 2008 at 0:35 Comment(0)
L
0

Icon has a set type and a random-element operator, unary "?", so the expression

? set( [1, 2, 3, 4, 5] )

will produce a random number between 1 and 5.

The random seed is initialized to 0 when a program is run, so to produce different results on each run use randomize()

Luthanen answered 24/9, 2008 at 1:46 Comment(0)
C
0

In C#

        Random random = new Random((int)DateTime.Now.Ticks);

        OrderedDictionary od = new OrderedDictionary();

        od.Add("abc", 1);
        od.Add("def", 2);
        od.Add("ghi", 3);
        od.Add("jkl", 4);


        int randomIndex = random.Next(od.Count);

        Console.WriteLine(od[randomIndex]);

        // Can access via index or key value:
        Console.WriteLine(od[1]);
        Console.WriteLine(od["def"]);
Crittenden answered 24/9, 2008 at 3:32 Comment(0)
T
0

In lisp

(defun pick-random (set)
       (nth (random (length set)) set))
Taxidermy answered 24/9, 2008 at 4:28 Comment(1)
This only works for lists, right? With ELT it could work for any sequence.Whodunit
D
0

Unfortunately, this cannot be done efficiently (better than O(n)) in any of the Standard Library set containers.

This is odd, since it is very easy to add a randomized pick function to hash sets as well as binary sets. In a not to sparse hash set, you can try random entries, until you get a hit. For a binary tree, you can choose randomly between the left or right subtree, with a maximum of O(log2) steps. I've implemented a demo of the later below:

import random

class Node:
    def __init__(self, object):
        self.object = object
        self.value = hash(object)
        self.size = 1
        self.a = self.b = None

class RandomSet:
    def __init__(self):
        self.top = None

    def add(self, object):
        """ Add any hashable object to the set.
            Notice: In this simple implementation you shouldn't add two
                    identical items. """
        new = Node(object)
        if not self.top: self.top = new
        else: self._recursiveAdd(self.top, new)
    def _recursiveAdd(self, top, new):
        top.size += 1
        if new.value < top.value:
            if not top.a: top.a = new
            else: self._recursiveAdd(top.a, new)
        else:
            if not top.b: top.b = new
            else: self._recursiveAdd(top.b, new)

    def pickRandom(self):
        """ Pick a random item in O(log2) time.
            Does a maximum of O(log2) calls to random as well. """
        return self._recursivePickRandom(self.top)
    def _recursivePickRandom(self, top):
        r = random.randrange(top.size)
        if r == 0: return top.object
        elif top.a and r <= top.a.size: return self._recursivePickRandom(top.a)
        return self._recursivePickRandom(top.b)

if __name__ == '__main__':
    s = RandomSet()
    for i in [5,3,7,1,4,6,9,2,8,0]:
        s.add(i)

    dists = [0]*10
    for i in xrange(10000):
        dists[s.pickRandom()] += 1
    print dists

I got [995, 975, 971, 995, 1057, 1004, 966, 1052, 984, 1001] as output, so the distribution seams good.

I've struggled with the same problem for myself, and I haven't yet decided weather the performance gain of this more efficient pick is worth the overhead of using a python based collection. I could of course refine it and translate it to C, but that is too much work for me today :)

Dunigan answered 27/12, 2009 at 0:13 Comment(2)
A reason I think this is not implemented in a binary tree is that such a method wouldn't pick items uniformly. Since their are nodes without left/right children, a situation may occur where the left child contains more items than the right child (or vice versa), this would make picking an item at the right (or left) child, more probable.Hexad
@CommuSoft: That's why I store the size of each subtree, so I can choose my probabilities based on those.Dunigan
C
0

In Mathematica:

a = {1, 2, 3, 4, 5}

a[[ ⌈ Length[a] Random[] ⌉ ]]

Or, in recent versions, simply:

RandomChoice[a]

Random[] generates a pseudorandom float between 0 and 1. This is multiplied by the length of the list and then the ceiling function is used to round up to the next integer. This index is then extracted from a.

Since hash table functionality is frequently done with rules in Mathematica, and rules are stored in lists, one might use:

a = {"Badger" -> 5, "Bird" -> 1, "Fox" -> 3, "Frog" -> 2, "Wolf" -> 4};
Corbitt answered 15/4, 2011 at 23:14 Comment(0)
O
0

you can also transfer the set to array use array it will probably work on small scale i see the for loop in the most voted answer is O(n) anyway

Object[] arr = set.toArray();

int v = (int) arr[rnd.nextInt(arr.length)];
Obscuration answered 11/11, 2014 at 19:43 Comment(0)
G
0

A generic solution using Khoth's answer as a starting point.

/**
 * @param set a Set in which to look for a random element
 * @param <T> generic type of the Set elements
 * @return a random element in the Set or null if the set is empty
 */
public <T> T randomElement(Set<T> set) {
    int size = set.size();
    int item = random.nextInt(size);
    int i = 0;
    for (T obj : set) {
        if (i == item) {
            return obj;
        }
        i++;
    }
    return null;
}
Glomerule answered 27/3, 2015 at 10:52 Comment(0)
R
0

If you don't mind a 3rd party library, the Utils library has a IterableUtils that has a randomFrom(Iterable iterable) method that will take a Set and return a random element from it

Set<Object> set = new HashSet<>();
set.add(...);
...
Object random = IterableUtils.randomFrom(set);

It is in the Maven Central Repository at:

<dependency>
  <groupId>com.github.rkumsher</groupId>
  <artifactId>utils</artifactId>
  <version>1.3</version>
</dependency>
Repartee answered 7/8, 2017 at 23:5 Comment(0)
L
0
List<Integer> list=new ArrayList<Integer>(set);
int r=(int)(Math.random()*set.size());
return list.get(r);
Lugsail answered 20/11, 2023 at 11:41 Comment(1)
The question asked about Set, i.e. not java.util.List.Climber
G
-1

PHP, using MT:

$items_array = array("alpha", "bravo", "charlie");
$last_pos = count($items_array) - 1;
$random_pos = mt_rand(0, $last_pos);
$random_item = $items_array[$random_pos];
Gruel answered 24/9, 2008 at 0:28 Comment(0)
A
-1

If set size is not large then by using Arrays this can be done.

int random;
HashSet someSet;
<Type>[] randData;
random = new Random(System.currentTimeMillis).nextInt(someSet.size());
randData = someSet.toArray();
<Type> sResult = randData[random];
Aglimmer answered 4/1, 2016 at 15:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.