How can I make a random choice according to probabilities stored in a list (weighted random distribution)?

Asked 14/12, 2010 at 8:34 Answered 7/7, 2022 at 22:22

Given a list of probabilities like:

P = [0.10, 0.25, 0.60, 0.05]

(I can ensure that the sum of all the variables in P is always 1)

How can I write a function that randomly returns a valid index, according to the values in the list? In other words, for this specific input, I want it to return 0 10% of the time, 1 25% of the time, 2 60% of the time and 3 the remainind 5% of the time.

Cutlery answered 14/12, 2010 at 8:34 Comment(5)

Actually, starting from Python 3.6 there is random.choices (note the 's' at the end) which allows submitting relative weights. – Starflower 3/8, 2020 at 3:44

@NickstandswithUkraine would you please add an answer about this? – Iseult 5/7, 2022 at 8:4

See also #353170. I think this question is probably the better canonical. – Iseult 5/7, 2022 at 8:6

There is also stackoverflow.com/questions/2140787 to consider, as the specific case of repeated sampling without replacement is somewhat trickier. – Iseult 5/7, 2022 at 21:2

Maybe it's better to edit the random.choices information into the top answer, since the interface is substantially the same. – Iseult 5/7, 2022 at 21:8

You can easily achieve this with numpy. It has a choice function which accepts the parameter of probabilities.

np.random.choice(
  ['pooh', 'rabbit', 'piglet', 'Christopher'], 
  5,
  p=[0.5, 0.1, 0.1, 0.3]
)

Finch answered 12/11, 2016 at 11:22 Comment(1)

Concise, although I think numpy is overkill here, especially if the script had no dependencies beyond the standard library – Abstinence 16/5, 2020 at 23:29

Basically, make a cumulative probability distribution (CDF) array. Basically, the value of the CDF for a given index is equal to the sum of all values in P equal to or less than that index. Then you generate a random number between 0 and 1 and do a binary search (or linear search if you want). Here's some simple code for it.

from bisect import bisect
from random import random

P = [0.10,0.25,0.60,0.05]

cdf = [P[0]]
for i in xrange(1, len(P)):
    cdf.append(cdf[-1] + P[i])

random_ind = bisect(cdf,random())

of course you can generate a bunch of random indices with something like

rs = [bisect(cdf, random()) for i in xrange(20)]

yielding

[2, 2, 3, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2]

(results will, and should vary). Of course, binary search is rather unnecessary for so few of possible indices, but definitely recommended for distributions with more possible indices.

Nanine answered 14/12, 2010 at 18:21 Comment(0)

Hmm interesting, how about...

Generate a number between 0 and 1.
Walk the list substracting the probability of each item from your number.
Pick the item that, after substraction, took your number down to 0 or below.

That's simple, O(n) and should work :)

Abstinence answered 14/12, 2010 at 8:39 Comment(1)

It will help if the probabilities are pre-sorted in a descending way - the iteration is likely to terminate faster. – Starflower 4/8, 2020 at 0:2

This problem is equivalent to sampling from a categorical distribution. This distribution is commonly conflated with the multinomial distribution which models the result of multiple samples from a categorical distribution.

In numpy, it is easy to sample from the multinomial distribution using numpy.random.multinomial, but a specific categorical version of this does not exist. However, it can be accomplished by sampling from the multinomial distribution with a single trial and then returning the non-zero element in the output.

import numpy as np
pvals = [0.10,0.25,0.60,0.05]
ind = np.where(np.random.multinomial(1,pvals))[0][0]

Alpenhorn answered 16/5, 2014 at 20:28 Comment(1)

I think using argmax() instead of where()[0][0] is simpler and does the same. – Sulcate 16/10, 2018 at 19:9

import random

probs = [0.1, 0.25, 0.6, 0.05]
r = random.random()
index = 0
while(r >= 0 and index < len(probs)):
  r -= probs[index]
  index += 1
print index - 1

Bim answered 14/12, 2010 at 8:39 Comment(3)

Haha and here I thought ~2 seconds before you posted that I was being original – Abstinence 14/12, 2010 at 8:41

This always takes O(n) time where n is the len(probs). Can we do better? – Sprayberry 28/8, 2019 at 20:31

@Sprayberry Yes: we can sort the probs and do a binary search. This would reduce to O(log n). – Starflower 4/8, 2020 at 0:4

Starting from Python 3.6 there is the choices method (note the 's' at the end) in random

Quoting from the documentation:

random.choices(population, weights=None, *, cum_weights=None, k=1) Return a k sized list of elements chosen from the population with replacement

So the solution would look like this:

>> choices(['option1', 'option2', 'option3', 'option4'], [0.10, 0.25, 0.60, 0.05])

Starflower answered 7/7, 2022 at 22:22 Comment(0)

Recommended topics

Hot tags