How to get random objects from a stream
Asked Answered
U

12

27

Lets say I have a list of words and i want to create a method which takes the size of the new list as a parameter and returns the new list. How can i get random words from my original sourceList?

public List<String> createList(int listSize) {
   Random rand = new Random();
   List<String> wordList = sourceWords.
      stream().
      limit(listSize).
      collect(Collectors.toList()); 

   return wordList;
}

So how and where can I use my Random?

Unblushing answered 29/7, 2015 at 14:20 Comment(5)
Is there any reason to use a Stream ? Can't you shuffle the original list, and then return a copy with subList ?Synchro
Do you know how many words are in the source stream?Til
What @AlexisC. said. Collections.shuffle and list.subList(0,size) should be enough.Labialized
See also this answer, if there is a large number of elements which might make shuffling expensive: https://mcmap.net/q/505930/-perform-operation-on-n-random-distinct-elements-from-collection-using-streams-apiSheya
This question seems to be a duplicate of Perform operation on n random distinct elements from Collection using Streams API.Banda
U
28

I've found a proper solution. Random provides a few methods to return a stream. For example ints(size) which creates a stream of random integers.

public List<String> createList(int listSize)
{
   Random rand = new Random();
   List<String> wordList = rand.
      ints(listSize, 0, sourceWords.size()).
      mapToObj(i -> sourceWords.get(i)).
      collect(Collectors.toList());

   return wordList;
}
Unblushing answered 29/7, 2015 at 15:55 Comment(4)
Using this solution you may get repeating words in the result which might be undesirable. If it's ok in your case, please edit the question to explicitly state this.Coif
Good point. But i could resolve that by adding .distinct right?Unblushing
This way your resulting list will be shorter than requested. Probably rand.ints(0, sourceWords.size()).distinct().limit(listSize).mapToObj(sourceWords::get) would be better.Coif
Unfortunately that doesn't work. Limit() is just ignored in this case.Unblushing
E
16

I think the most elegant way is to have a special collector.

I am pretty sure the only way you can guarantee that each item has an equal chance of being picked, is to collect, shuffle and re-stream. This can be easily done using built-in Collectors.collectingAndThen(...) helper.

Sorting by a random comparator or using randomized reducer, like suggested on some other answers, will result in very biased randomness.

List<String> wordList = sourceWords.stream()
  .collect(Collectors.collectingAndThen(Collectors.toList(), collected -> {
      Collections.shuffle(collected);
      return collected.stream();
  }))
  .limit(listSize)
  .collect(Collectors.toList());

You can move that shuffling collector to a helper function:

public class CollectorUtils {

    public static <T> Collector<T, ?, Stream<T>> toShuffledStream() {
        return Collectors.collectingAndThen(Collectors.toList(), collected -> {
            Collections.shuffle(collected);
            return collected.stream();
        });
    }

}

I assume that you are looking for a way to nicely integrate with other stream processing functions. So following straightforward solution is not what you are looking for :)

Collections.shuffle(wordList)
return wordList.subList(0, limitSize)
Evelinevelina answered 19/2, 2016 at 18:49 Comment(0)
O
4

This is my one line solution:

 List<String> st = Arrays.asList("aaaa","bbbb","cccc");
 st.stream().sorted((o1, o2) -> RandomUtils.nextInt(0, 2)-1).findFirst().get();

RandomUtils are from commons lang 3

Occultism answered 16/1, 2018 at 7:37 Comment(2)
Improved version: st.stream().min((o1, o2) -> o1 == o2 ? 0 : (ThreadLocalRandom.current().nextBoolean() ? -1 : 1)).orElseThrow(); 1. Used java built-in class ThreadLocalRandom 2. nextInt generates one from [-1, 0, 1] but 0 means equals for the elements and first element will be taken more cases than the element at farther positions.Robbierobbin
Feels like a lot of computation is required just to get a random item, since JVM has to fist sort the whole list according to the the randomised method, followed by picking up the first item.Skinny
A
3

Here's a solution I came up with which seems to differ from all the other ones, so I figured why not add it to the pile.

Basically it works by using the same kind of trick as one iteration of Collections.shuffle each time you ask for the next element - pick a random element, swap that element with the first one in the list, move the pointer forwards. Could also do it with the pointer counting back from the end.

The caveat is that it does mutate the list you passed in, but I guess you could just take a copy as the first thing if you didn't like that. We were more interested in reducing redundant copies.

private static <T> Stream<T> randomStream(List<T> list)
{
    int characteristics = Spliterator.SIZED;
    // If you know your list is also unique / immutable / non-null
    //int characteristics = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.SIZED;
    Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(list.size(), characteristics)
    {
        private final Random random = new SecureRandom();
        private final int size = list.size();
        private int frontPointer = 0;

        @Override
        public boolean tryAdvance(Consumer<? super T> action)
        {
            if (frontPointer == size)
            {
                return false;
            }

            // Same logic as one iteration of Collections.shuffle, so people talking about it not being
            // fair randomness can take that up with the JDK project.
            int nextIndex = random.nextInt(size - frontPointer) + frontPointer;
            T nextItem = list.get(nextIndex);
            // Technically the value we end up putting into frontPointer
            // is never used again, but using swap anyway, for clarity.
            Collections.swap(list, nextIndex, frontPointer);

            frontPointer++;
            // All items from frontPointer onwards have not yet been chosen.

            action.accept(nextItem);
            return true;
        }
    };

    return StreamSupport.stream(spliterator, false);
}
Anemoscope answered 24/2, 2017 at 2:30 Comment(2)
@Holger fair enough, it was pulled out of a private method we had where there was actually a guarantee of both those things. :) First time hearing about Collections.swap too.Anemoscope
@Holger I just figured out also why the bug doesn't matter.Anemoscope
P
0

Try something like that:

List<String> getSomeRandom(int size, List<String> sourceList) {
    List<String> copy = new ArrayList<String>(sourceList);
    Collections.shuffle(copy);
    List<String> result = new ArrayList<String>();
    for (int i = 0; i < size; i++) {
        result.add(copy.get(i));
    }

    return result;
}
Pneumatic answered 29/7, 2015 at 14:36 Comment(3)
It doesn't really use java 8 stream APIKlansman
@Toilal: still, it’s a solution. Not everything gets better with streams. However, if a rather small result list is required, I’d go into this direction instead of shuffling the entire list…Slighting
I agree. When input list is considered big, shuffling it all is not efficient.Pneumatic
P
0

The answer is very simple(with stream):

List<String> a = src.stream().sorted((o1, o2) -> {
        if (o1.equals(o2)) return 0;
        return (r.nextBoolean()) ? 1 : -1;
    }).limit(10).collect(Collectors.toList());

You can test it:

List<String> src = new ArrayList<String>();
for (int i = 0; i < 20; i++) {
    src.add(String.valueOf(i*10));
}
Random r = new Random();
List<String> a = src.stream().sorted((o1, o2) -> {
        if (o1.equals(o2)) return 0;
        return (r.nextBoolean()) ? 1 : -1;
    }).limit(10).collect(Collectors.toList());
System.out.println(a);
Paramatta answered 29/7, 2015 at 14:38 Comment(13)
r.nextBoolean() would be easierKlansman
Yes, I just didn't think about it :) But the point still the sameParamatta
This uses a comparator which doesn't impose any definite ordering at all, thus breaking its general contract in the extremest sense. Even if it happens to work with some sorting implementation, it does so by pure accident.Yearlong
This is what @Swiss_Codeaholic wants — random order. I thought this is kind of a standart that callback should return positive number if one argument 'more than' another, and negative if the opposite. Am I wrong?Paramatta
The comparator should impose some definite random order. Your comparator doesn't impose any order at all. Actually, it doesn't even define a general relation because it returns a different value each time, even when repeatedly asked to compare the same two objects. An order must additionally be reflexive, symmetric, and transitive.Yearlong
But is it really possible to follow these rules with random order? If the comparator would return the same number each time when asked to compare the same objects, wouldn't it be random?Paramatta
Of course it is possible. A random order is just another order. A comparator which returns a random value doesn't specify any order at all. Take a list processed by Collections.shuffle(): for any two elements in that list, one comes before the other every time you check. And surely, an element compared to itself yields equality. Your comparator never returns zero.Yearlong
I have edited the answer, so now it can return 0, thanks. But I just checked Collections.shuffle() for List<String> src from my example, and for any two elements every time they are in different order. Maybe I misunderstand you?Paramatta
Yes, you do. You called shuffle() again each time. I mentioned a list which was processed by shuffle once and then inspected. That list has a random, but definite order, and the equivalent of that is what OP needs. Your solution is like shuffling the whole list each time you call iterator.next()Yearlong
Now I got it. Thank you!Paramatta
Your solution is actually broken. It may work for small lists (less than 32 elements), because in this case JDK switches to simpler sorting algorithm. But for big arrays the sort will randomly crash because of contract violation. Try this source: List<String> src = IntStream.range(0, 100000).mapToObj(String::valueOf).collect(Collectors.toList());. Use, for example, 0 seed (Random r = new Random(0);) and it will crash.Coif
didn't know that, thanks! Where can I find more info on stream api (beside oracle documentatiob)?Paramatta
@Timofey: This has nothing to do with the stream API. It’s the Comparator contract. You can expect every method using a comparator to dislike broken contracts. I.e. Collections.sort behaves the same, even under Java 7.Slighting
E
0

If you want non repeated items in the result list and your initial list is immutable:

  • There isn't a direct way to get it from the current Streams API.
  • It's not possible to use a random Comparator because it's going to break the compare contract.

You can try something like:

public List<String> getStringList(final List<String> strings, final int size) {
    if (size < 1 || size > strings.size()) {
        throw new IllegalArgumentException("Out of range size.");
    }

    final List<String> stringList = new ArrayList<>(size);

    for (int i = 0; i < size; i++) {
        getRandomString(strings, stringList)
                .ifPresent(stringList::add);
    }

    return stringList;
}

private Optional<String> getRandomString(final List<String> stringList, final List<String> excludeStringList) {
    final List<String> filteredStringList = stringList.stream()
            .filter(c -> !excludeStringList.contains(c))
            .collect(toList());

    if (filteredStringList.isEmpty()) {
        return Optional.empty();
    }

    final int randomIndex = new Random().nextInt(filteredStringList.size());
    return Optional.of(filteredStringList.get(randomIndex));
}
Exceed answered 13/7, 2020 at 20:55 Comment(0)
R
0

@kozla13 improved version:

List<String> st = Arrays.asList("aaaa","bbbb","cccc");
st.stream().min((o1, o2) -> o1 == o2 ? 0 : (ThreadLocalRandom.current().nextBoolean() ? -1 : 1)).orElseThrow();
  1. Used java built-in class ThreadLocalRandom
  2. nextInt generates one from sequence [-1, 0, 1], but return 0 in compare func means equals for the elements and side effect of this - first element (o1) will be always taken in this case.
  3. properly handle object equals case
Robbierobbin answered 14/9, 2021 at 12:38 Comment(0)
L
0

If the source list is generally much larger than the new list, you might gain some efficiencies by using a BitSet to get random indices:

List<String> createList3(int listSize, List<String> sourceList) {
  if (listSize > sourceList.size()) {
    throw new IllegalArgumentException("Not enough words in the source list.");
  }

  List<String> newWords = randomWords(listSize, sourceList);
  Collections.shuffle(newWords); // optional, for random order
  return newWords;
}

private List<String> randomWords(int listSize, List<String> sourceList) {
  int endExclusive = sourceList.size();
  BitSet indices = new BitSet(endExclusive);
  Random rand = new Random();
  while (indices.cardinality() < listSize) {
    indices.set(rand.nextInt(endExclusive));
  }
  
  return indices.stream().mapToObj(i -> sourceList.get(i))
    .collect(Collectors.toList());
}
Limassol answered 28/9, 2022 at 17:45 Comment(0)
L
0

A stream is probably overkill. Copy the source list so you're not creating side-effects, then give back a sublist of the shuffled copy.

public static List<String> createList(int listSize, List<String> sourceList) {
  if (listSize > sourceList.size()) {
    throw IllegalArgumentException("Not enough words for new list.");
  }
  List<String> copy = new ArrayList<>(sourceList);
  Collections.shuffle(copy);
  return copy.subList(0, listSize);
}
Limassol answered 28/9, 2022 at 19:30 Comment(0)
D
0

One liner to randomize a stream:

Stream.of(1, 2, 3, 4, 5).sorted(Comparator.comparingDouble(x -> Math.random()))
Dubuffet answered 17/2, 2023 at 17:35 Comment(0)
C
0

I realise this is a very old question, but it came up as one of the first results I found when I was trying to do a similar thing.

I wasn't keen on any of the solutions I found, and in the end I went with this kind of approach:

    public List<String> createList(int listSize) {
    
        return sourceWords.stream()
            .map(word -> new SimpleEntry<>(word, rand.nextDouble()))
            .sorted(comparingDouble(SimpleEntry::getValue))
            .limit(listSize)
            .map(SimpleEntry::getKey)
            .toList();
    
    }

This first maps each entry of the stream to pair it with a random number, it then sorts by that random number, then limits the list, then retrieves the original word from the pair.

I use AbstractMap.SimpleEntry as a convenient class for holding a pair, available in the standard Java library, but you may have a nicer Pair or Tuple2 class available, depending which libraries you use.

Cimabue answered 2/6 at 16:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.