How can I make a random choice according to probabilities stored in a list (weighted random distribution)?
Asked Answered
C

6

38

Given a list of probabilities like:

P = [0.10, 0.25, 0.60, 0.05]

(I can ensure that the sum of all the variables in P is always 1)

How can I write a function that randomly returns a valid index, according to the values in the list? In other words, for this specific input, I want it to return 0 10% of the time, 1 25% of the time, 2 60% of the time and 3 the remainind 5% of the time.

Cutlery answered 14/12, 2010 at 8:34 Comment(5)
Actually, starting from Python 3.6 there is random.choices (note the 's' at the end) which allows submitting relative weights.Starflower
@NickstandswithUkraine would you please add an answer about this?Iseult
See also #353170. I think this question is probably the better canonical.Iseult
There is also stackoverflow.com/questions/2140787 to consider, as the specific case of repeated sampling without replacement is somewhat trickier.Iseult
Maybe it's better to edit the random.choices information into the top answer, since the interface is substantially the same.Iseult
F
63

You can easily achieve this with numpy. It has a choice function which accepts the parameter of probabilities.

np.random.choice(
  ['pooh', 'rabbit', 'piglet', 'Christopher'], 
  5,
  p=[0.5, 0.1, 0.1, 0.3]
)
Finch answered 12/11, 2016 at 11:22 Comment(1)
Concise, although I think numpy is overkill here, especially if the script had no dependencies beyond the standard libraryAbstinence
N
15

Basically, make a cumulative probability distribution (CDF) array. Basically, the value of the CDF for a given index is equal to the sum of all values in P equal to or less than that index. Then you generate a random number between 0 and 1 and do a binary search (or linear search if you want). Here's some simple code for it.

from bisect import bisect
from random import random

P = [0.10,0.25,0.60,0.05]

cdf = [P[0]]
for i in xrange(1, len(P)):
    cdf.append(cdf[-1] + P[i])

random_ind = bisect(cdf,random())

of course you can generate a bunch of random indices with something like

rs = [bisect(cdf, random()) for i in xrange(20)]

yielding

[2, 2, 3, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2]

(results will, and should vary). Of course, binary search is rather unnecessary for so few of possible indices, but definitely recommended for distributions with more possible indices.

Nanine answered 14/12, 2010 at 18:21 Comment(0)
A
12

Hmm interesting, how about...

  1. Generate a number between 0 and 1.

  2. Walk the list substracting the probability of each item from your number.

  3. Pick the item that, after substraction, took your number down to 0 or below.

That's simple, O(n) and should work :)

Abstinence answered 14/12, 2010 at 8:39 Comment(1)
It will help if the probabilities are pre-sorted in a descending way - the iteration is likely to terminate faster.Starflower
A
6

This problem is equivalent to sampling from a categorical distribution. This distribution is commonly conflated with the multinomial distribution which models the result of multiple samples from a categorical distribution.

In numpy, it is easy to sample from the multinomial distribution using numpy.random.multinomial, but a specific categorical version of this does not exist. However, it can be accomplished by sampling from the multinomial distribution with a single trial and then returning the non-zero element in the output.

import numpy as np
pvals = [0.10,0.25,0.60,0.05]
ind = np.where(np.random.multinomial(1,pvals))[0][0]
Alpenhorn answered 16/5, 2014 at 20:28 Comment(1)
I think using argmax() instead of where()[0][0] is simpler and does the same.Sulcate
B
3
import random

probs = [0.1, 0.25, 0.6, 0.05]
r = random.random()
index = 0
while(r >= 0 and index < len(probs)):
  r -= probs[index]
  index += 1
print index - 1
Bim answered 14/12, 2010 at 8:39 Comment(3)
Haha and here I thought ~2 seconds before you posted that I was being originalAbstinence
This always takes O(n) time where n is the len(probs). Can we do better?Sprayberry
@Sprayberry Yes: we can sort the probs and do a binary search. This would reduce to O(log n).Starflower
S
2

Starting from Python 3.6 there is the choices method (note the 's' at the end) in random

Quoting from the documentation:

random.choices(population, weights=None, *, cum_weights=None, k=1) Return a k sized list of elements chosen from the population with replacement

So the solution would look like this:

>> choices(['option1', 'option2', 'option3', 'option4'], [0.10, 0.25, 0.60, 0.05])
Starflower answered 7/7, 2022 at 22:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.