How to get a specific number of random elements from a collection in Smalltalk?
Asked Answered
C

2

6

How can I elegantly get a specific number (>1) of distinct, random elements from a collection?

Chicky answered 26/2, 2016 at 10:49 Comment(0)
G
3

This is something that I think looks more or less nice, but is not as efficient as it can be:

yourCollection asSet asOrderedCollection shuffled first: numberOfElements
Gradin answered 26/2, 2016 at 10:57 Comment(1)
Very elegant. The only way I could see to improve efficiency (which typically wouldn't be an issue, but some people get hung up on it...) would be to shuffle the indices instead of the collection (because there'd be fewer indices), and then use something like #atAll: to pick those randomised indices out of a set (you'd still need #asSet if you want them distinct).Dunstan
H
5

Consider the following code snippet

sample: anInteger from: aCollection using: aGenerator
  | sample |
  sample := Set new: anInteger.
  [sample size = anInteger]
    whileFalse: [ | element |
      element := aCollection atRandom: aGenerator.
      sample add: element].
  ^sample asArray

Some remarks

  • Explicit Generator: It explicitly uses a given generator, i.e., an instance of Random, which I've called aGenerator. For mathematical reasons, if you are getting samples for your application, all of them should use the very same generator across your program. Also this will give you an additional advantage: save and later restore the seed and you will be able to reproduce a previous "random" behavior of your system, which is good for testing.

  • No check for availability: The code doesn't check that it is possible to get the desired sample, which would be the case if aCollection doesn't have at least anInteger different elements.

  • Classless code: The method should go to some class.

For example:

 Random >> sample: anInteger from: aCollection
   | sample |
   sample := Set new: anInteger.
   [sample size = anInteger]
     whileFalse: [ | element |
       element := aCollection atRandom: self.
       sample add: element].
   ^sample asArray

UPDATE

Here is another approach:

Random >> remove: anInteger from: aCollection
  | sample |
  sample := OrderedCollection new: anInteger.
  anInteger timesRepeat: [| index element |
    index := aCollection size atRandom: self.
    element := aCollection removeAt: index.
    sample add: element].
  ^sample

Comment

It usually happens that when we want to sample without repetition we also want to remove the elements from the collection as we randomly pick them. In these cases what often happens is that the collection is known to have no repetitions.

Hydrogenate answered 26/2, 2016 at 14:39 Comment(4)
Hmm, I like the "byo generator" approach, but if the collection and the sample size are large, you may be stuck towards the end waiting quite a while until the generator finally picks a "free" number.Dunstan
@AmosM.Carpenter good point. What I use is closer to the second approach.Hydrogenate
Sorry to be nitpicky, but you shouldn't use #removeIndex: - it's meant to be private. Using the public method #removeAt: instead would have the added benefit of answering the removed element, meaning you can get rid of the extra element temporary variable (i.e. just do sample add: (aCollection removeAt: index)). Both those "remove" methods are on OrderedCollection, so this won't work with other types of collections.Dunstan
@AmosM.Carpenter Thanks for the review. I'm not a Pharo programmer so I hadn't realized that removeIndex: was private. I've modified the method according to your remark. And yes, the method is only meant to work with OrderCollection (which should be fine with most practical applications.)Hydrogenate
G
3

This is something that I think looks more or less nice, but is not as efficient as it can be:

yourCollection asSet asOrderedCollection shuffled first: numberOfElements
Gradin answered 26/2, 2016 at 10:57 Comment(1)
Very elegant. The only way I could see to improve efficiency (which typically wouldn't be an issue, but some people get hung up on it...) would be to shuffle the indices instead of the collection (because there'd be fewer indices), and then use something like #atAll: to pick those randomised indices out of a set (you'd still need #asSet if you want them distinct).Dunstan

© 2022 - 2024 — McMap. All rights reserved.