Expand random range of integers using probability distribution
Asked Answered
T

1

6

I tried to solve the classic problem of generating a random integer between 1 and 7, given a function that generates a random integer between 1 and 5. My approach was to add the result of 2 calls to rand5(), effectively turning this into a "sum of dice rolls" problem. The probability of the sum dice rolls is fairly easy to calculate, so I used it here. An explanation of this is after the code

My question is: How can I calculate what the values of counter should be? The current values are incorrect, as verified by experiment. Do integer values exist that satisfy the probability? And is there a better way to solve this problem with this approach?

def rand5():
    return random.randint(1,5)

def rand7():
    counter = [1,2,3,4,5,4,3]
    while 0 not in counter:
        sum = rand5() + rand5() - 2
        if sum <= 6:
            counter[sum] -= 1
    return counter.index(0) + 1

For reference, the following code appears to create a random distribution.

test_counter = [0,0,0,0,0,0,0,0,0,0,0,0]
for i in range(500000):
    test_counter[rand5() + rand5() - 2] += 1

test_counter[0] *= 60
test_counter[1] *= 30
test_counter[2] *= 20
test_counter[3] *= 15
test_counter[4] *= 12
test_counter[5] *= 15
test_counter[6] *= 20
test_counter[7] *= 0
test_counter[8] *= 0

print(test_counter)

Explanation of probability: The probability of the dice rolls can be calculated by listing the possible combinations of dice. For this problem, the numbers generated by each die (the rand5 function) would be:

{(1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), ..., (5,5)}

The probability of each sum is the number of ways the sum appears in the list, divided by the total number of items in the list. The list has 5^2 = 25 total elements. For example, a sum of 4 can be achieved by the following combinations {(1,3), (2,2), (3,1)}, so the probability of a sum of 4 is 3/25.

The probability of each result is then:

  1. 1/25
  2. 2/25
  3. 3/25
  4. 4/25
  5. 5/25
  6. 4/25
  7. 3/25
  8. 2/25
  9. 1/25

I tried to use this distribution to generate a uniform distribution by having the more common ones have to be generated multiple times, and this is stored in counter.

Trotta answered 17/12, 2018 at 21:29 Comment(2)
Just to clarify, this is not a homework question. I just wanted to try solving this problem on my own before looking up the 'correct' solution.Trotta
Would be awesome if you could explain a bit the theory behind it, just to make the question a bit more clear for other readers.Locke
T
1

Not sure going via dice distribution is good idea. In general, if you have random bits source, but short sequences, better combine and chop bits to make up longer random bits sequence. Along the lines

import random

def rand5():
    return random.randint(1, 5)

def twobits():
    q = rand5() - 1 # [0...5) range
    while q == 4: # dropping high bit
        q = rand5() - 1
    return q # [0...3) range, two random bits

def onebit():
    return twobits() & 1

def rand7():
    q = onebit() << 2 | twobits() # here we have [0...8) range
    while q == 0:                 # and dropping 0
        q = onebit() << 2 | twobits()
    return q

counter = 8*[0]

for i in range(500000):
    counter[rand7()] += 1

print(counter)

produced uniform in [1...8) sampling

[0, 71592, 71352, 71071, 71543, 71600, 71388, 71454]

Take two bits from one sample, one bit from another sample, combine them, some rejection and voila!

Tomasatomasina answered 18/12, 2018 at 0:36 Comment(6)
Interesting, but your program may never end, because of the while loops that depend on a random condition. I mean there is no formal proof that your program will end, so this is not an algorithm. As said on Wikipedia, « Generally, a program is only an algorithm if it stops eventually » (en.m.wikipedia.org/w/…)Beefwitted
@AlexandreFenyo Huh?!? Are you saying en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm is not an algorithm? Whole Markov chain sampling? Box-Muller for gaussian? I think you're mistakenTomasatomasina
It's a question of vocabulary: I understand you disagree with Knuth's characterization of algorithms ("An algorithm must always terminate after a finite number of steps ... a very finite number, a reasonable number") and with Stone's characterization of algorithms ("we define an algorithm to be a set of rules that precisely defines a sequence of operations [...] such that the sequence terminates in a finite time"). There are many different characterizations of algorithms. This is why Wikipedia wrote "generally". Or do you mean your program/algorithm terminates in a finite time?Beefwitted
@AlexandreFenyo Or do you mean your program/algorithm terminates in a finite time? Probability to not terminate has Lebesgue measure zero.Tomasatomasina
There is no absolute upper limit in the number of iterations of your program (a set that has a null measure may not be empty, and in this specific case, it is not empty). I wonder if your answer is the best one that can be expressed, or if there would exist some other correct answers that would have an absolute upper limit for the number of iterations. I have no idea how to prove that, but I suspect there is no valid answer with an absolute upper limit for the number of iterations. Do you think such an algorithm could exist, or do you think your answer is the best one to the initial question?Beefwitted
@ SeverinPappadeux : The event the program does not terminate is the set of outcomes that correspond each to a run that does not terminate. Therefore, the sample space is not R or R^n, it is simply the sample space of a general probability space. So, the probability measure of this probability space can not be the Lebesgue measure (the Lebesgue measure is defined on R or R^n). Anyway, the probability measure of the event the program does not terminate is zero.Beefwitted

© 2022 - 2024 — McMap. All rights reserved.