Python Weighted Random [duplicate]
Asked Answered
E

4

33

I need to return different values based on a weighted round-robin such that 1 in 20 gets A, 1 in 20 gets B, and the rest go to C.

So:

A => 5%
B => 5%
C => 90%

Here's a basic version that appears to work:

import random

x = random.randint(1, 100)

if x <= 5:
    return 'A'
elif x > 5 and x <= 10:
    return 'B'
else:
    return 'C'

Is this algorithm correct? If so, can it be improved?

Extenuate answered 21/2, 2013 at 0:26 Comment(7)
You could use random.randint(1,20) for your case.Epidaurus
@Epidaurus - How so? (1,20) would only return only let me eval if only A or B fell into a 5% range, but not both - right?Extenuate
Your random integer can take value 1 to 20, if random integer is 1, you return A (5% chance), if random integer is 2, you return B (5% chance), if random integer is anything else you return C (90% likelihood). Am I missing something?Epidaurus
6 in 1, half a dozen in the other. Your logic works now that I think about it.Extenuate
In case you wanted to research this in the more general case, the concept you are referring to is "generating random variates using the inverse cdf method."Shilashilha
I'd like to point out an elaborate article that covers this very well: eli.thegreenplace.net/2010/01/22/…Rectilinear
I actually prefer the solution you provided in your question, which saves memory space comparing with the below answers that leverage an extra list of characters/strings. Thanks!Alvis
J
64

Your algorithm is correct, how about something more elegant:

import random
my_list = ['A'] * 5 + ['B'] * 5 + ['C'] * 90
random.choice(my_list)
Justitia answered 21/2, 2013 at 0:42 Comment(9)
Why not just my_list = ['A'] * 5 + ['B'] * 5 + ['C'] * 90?Shilashilha
you are right, updated.Justitia
As an aside, I'm going to be implementing this in Lua, not Python - but still - very cool!Extenuate
+1, but as an aside, it's not necessary to generate 100 list items, as long as the number of items remains proportional. In this instance, you can do ['A', 'B'] + ['C'] * 18.Shilashilha
@JoelCornett Thought about that, but I think it is more readable this way. Thanks for the correction.Justitia
Doesn't need to be a list: eg. random.choice('A' * 5 + 'B' * 5 + 'C' * 90)Degradable
In this limited case, it may not matter, but this approach is inefficient in general, both in terms of time and space. A better solution is to assign ranges in [0,1) corresponding to the proportion of the weight, e.g., [0, 5/100) for A, [5/100, 10/100) for B and [10/100, 1) for C, with the appropriate approximations/rounding when dealing with such things like repeating decimals, and then generating a random number with random.random or random.uniform.Bathesda
As of Python 3.6, you can use random.choices. And if you care about speed, the next two answers are faster.Filide
One of the most elegant answers I have seen on Python stack overflow. This made me happy.Westsouthwest
B
39

that's fine. more generally, you can define something like:

from collections import Counter
from random import randint

def weighted_random(pairs):
    total = sum(pair[0] for pair in pairs)
    r = randint(1, total)
    for (weight, value) in pairs:
        r -= weight
        if r <= 0: return value

results = Counter(weighted_random([(1,'a'),(1,'b'),(18,'c')])
                  for _ in range(20000))
print(results)

which gives

Counter({'c': 17954, 'b': 1039, 'a': 1007})

which is as close to 18:1:1 as you can expect.

Bogoch answered 21/2, 2013 at 0:38 Comment(4)
You are assuming the input weights are in ascending order. Sort the input weights first if you want to be safe.Epsom
It is not neccesary - such algorithm decrements the counter for each entry, so the result is the same (statistically)Newsletter
Adding to what Luis Masuelli said, for any given probability the range of values which correspond is: the sum of all previous elements in the array to that number plus the probability value. e.g. [.05, .05, .5, .4] correspond approx to .00-.05 | .051 - .10 | .101 - .6 | .601 - 1 rangesDoubletalk
This is better than the accepted answer in my opinion. I get that Python code should strive for readability, but just throwing away memory like it is is crazy. Especially if you're using this in simulations where the weights are constantly changing. Interpreter will be building and cleaning up sooo many listsDisguise
A
9

If you want to use weighted random and not percentile random, you can make your own Randomizer class:

import random

class WeightedRandomizer:
    def __init__ (self, weights):
        self.__max = .0
        self.__weights = []
        for value, weight in weights.items ():
            self.__max += weight
            self.__weights.append ( (self.__max, value) )

    def random (self):
        r = random.random () * self.__max
        for ceil, value in self.__weights:
            if ceil > r: return value

w = {'A': 1.0, 'B': 1.0, 'C': 18.0}
#or w = {'A': 5, 'B': 5, 'C': 90}
#or w = {'A': 1.0/18, 'B': 1.0/18, 'C': 1.0}
#or or or

wr = WeightedRandomizer (w)

results = {'A': 0, 'B': 0, 'C': 0}
for i in range (10000):
    results [wr.random () ] += 1

print ('After 10000 rounds the distribution is:')
print (results)
Adamina answered 21/2, 2013 at 2:35 Comment(0)
C
0

It seems correct since you are using a uniform random variable with independent draws the probability for each number will be 1/n (n=100).

You can easily verify your algorithm by running it say 1000 time and see the frequency for each letter.

Another algorithm you might consider is to generate an array with your letters given the frequency you want for each letter and only generate a single random number which is the index in the array

It will be less efficient in memory but should perform better

Edit:

To respond to @Joel Cornett comment, an example will be very similar to @jurgenreza but more memory efficient

import random
data_list = ['A'] + ['B'] + ['C'] * 18
random.choice(data_list )
Calpe answered 21/2, 2013 at 0:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.