Python Weighted Random [duplicate]

Asked 21/2, 2013 at 0:26 Answered 21/2, 2013 at 2:35

I need to return different values based on a weighted round-robin such that 1 in 20 gets A, 1 in 20 gets B, and the rest go to C.

So:

A => 5%
B => 5%
C => 90%

Here's a basic version that appears to work:

import random

x = random.randint(1, 100)

if x <= 5:
    return 'A'
elif x > 5 and x <= 10:
    return 'B'
else:
    return 'C'

Is this algorithm correct? If so, can it be improved?

Extenuate answered 21/2, 2013 at 0:26 Comment(7)

You could use random.randint(1,20) for your case. – Epidaurus 21/2, 2013 at 0:31

@Epidaurus - How so? (1,20) would only return only let me eval if only A or B fell into a 5% range, but not both - right? – Extenuate 21/2, 2013 at 0:39

Your random integer can take value 1 to 20, if random integer is 1, you return A (5% chance), if random integer is 2, you return B (5% chance), if random integer is anything else you return C (90% likelihood). Am I missing something? – Epidaurus 21/2, 2013 at 0:42

6 in 1, half a dozen in the other. Your logic works now that I think about it. – Extenuate 21/2, 2013 at 0:44

In case you wanted to research this in the more general case, the concept you are referring to is "generating random variates using the inverse cdf method." – Shilashilha 21/2, 2013 at 0:50

I'd like to point out an elaborate article that covers this very well: eli.thegreenplace.net/2010/01/22/… – Rectilinear 18/2, 2015 at 9:21

I actually prefer the solution you provided in your question, which saves memory space comparing with the below answers that leverage an extra list of characters/strings. Thanks! – Alvis 17/9, 2020 at 19:26

Your algorithm is correct, how about something more elegant:

import random
my_list = ['A'] * 5 + ['B'] * 5 + ['C'] * 90
random.choice(my_list)

Justitia answered 21/2, 2013 at 0:42 Comment(9)

Why not just my_list = ['A'] * 5 + ['B'] * 5 + ['C'] * 90? – Shilashilha 21/2, 2013 at 0:47

you are right, updated. – Justitia 21/2, 2013 at 0:48

As an aside, I'm going to be implementing this in Lua, not Python - but still - very cool! – Extenuate 21/2, 2013 at 0:49

+1, but as an aside, it's not necessary to generate 100 list items, as long as the number of items remains proportional. In this instance, you can do ['A', 'B'] + ['C'] * 18. – Shilashilha 21/2, 2013 at 0:53

@JoelCornett Thought about that, but I think it is more readable this way. Thanks for the correction. – Justitia 21/2, 2013 at 0:56

Doesn't need to be a list: eg. random.choice('A' * 5 + 'B' * 5 + 'C' * 90) – Degradable 12/2, 2014 at 0:17

In this limited case, it may not matter, but this approach is inefficient in general, both in terms of time and space. A better solution is to assign ranges in [0,1) corresponding to the proportion of the weight, e.g., [0, 5/100) for A, [5/100, 10/100) for B and [10/100, 1) for C, with the appropriate approximations/rounding when dealing with such things like repeating decimals, and then generating a random number with random.random or random.uniform. – Bathesda 30/1, 2017 at 16:56

As of Python 3.6, you can use random.choices. And if you care about speed, the next two answers are faster. – Filide 23/2, 2018 at 6:4

One of the most elegant answers I have seen on Python stack overflow. This made me happy. – Westsouthwest 1/8, 2019 at 13:41

that's fine. more generally, you can define something like:

from collections import Counter
from random import randint

def weighted_random(pairs):
    total = sum(pair[0] for pair in pairs)
    r = randint(1, total)
    for (weight, value) in pairs:
        r -= weight
        if r <= 0: return value

results = Counter(weighted_random([(1,'a'),(1,'b'),(18,'c')])
                  for _ in range(20000))
print(results)

which gives

Counter({'c': 17954, 'b': 1039, 'a': 1007})

which is as close to 18:1:1 as you can expect.

Bogoch answered 21/2, 2013 at 0:38 Comment(4)

You are assuming the input weights are in ascending order. Sort the input weights first if you want to be safe. – Epsom 12/9, 2014 at 18:11

It is not neccesary - such algorithm decrements the counter for each entry, so the result is the same (statistically) – Newsletter 19/9, 2014 at 18:45

Adding to what Luis Masuelli said, for any given probability the range of values which correspond is: the sum of all previous elements in the array to that number plus the probability value. e.g. [.05, .05, .5, .4] correspond approx to .00-.05 | .051 - .10 | .101 - .6 | .601 - 1 ranges – Doubletalk 4/6, 2017 at 11:26

This is better than the accepted answer in my opinion. I get that Python code should strive for readability, but just throwing away memory like it is is crazy. Especially if you're using this in simulations where the weights are constantly changing. Interpreter will be building and cleaning up sooo many lists – Disguise 29/4, 2018 at 12:47

If you want to use weighted random and not percentile random, you can make your own Randomizer class:

import random

class WeightedRandomizer:
    def __init__ (self, weights):
        self.__max = .0
        self.__weights = []
        for value, weight in weights.items ():
            self.__max += weight
            self.__weights.append ( (self.__max, value) )

    def random (self):
        r = random.random () * self.__max
        for ceil, value in self.__weights:
            if ceil > r: return value

w = {'A': 1.0, 'B': 1.0, 'C': 18.0}
#or w = {'A': 5, 'B': 5, 'C': 90}
#or w = {'A': 1.0/18, 'B': 1.0/18, 'C': 1.0}
#or or or

wr = WeightedRandomizer (w)

results = {'A': 0, 'B': 0, 'C': 0}
for i in range (10000):
    results [wr.random () ] += 1

print ('After 10000 rounds the distribution is:')
print (results)

Adamina answered 21/2, 2013 at 2:35 Comment(0)

It seems correct since you are using a uniform random variable with independent draws the probability for each number will be 1/n (n=100).

You can easily verify your algorithm by running it say 1000 time and see the frequency for each letter.

Another algorithm you might consider is to generate an array with your letters given the frequency you want for each letter and only generate a single random number which is the index in the array

It will be less efficient in memory but should perform better

Edit:

To respond to @Joel Cornett comment, an example will be very similar to @jurgenreza but more memory efficient

import random
data_list = ['A'] + ['B'] + ['C'] * 18
random.choice(data_list )

Calpe answered 21/2, 2013 at 0:38 Comment(0)

Recommended topics

Hot tags