SciPy Poisson distribution with an upper limit

I am generating random numbers with scipy.stats, drawn from the Poisson distribution. Below is an example:

import scipy.stats as sct

A = 2.5
Pos = sct.poisson.rvs(A, size=20)

When I print Pos, I got the following numbers:

array([1, 3, 2, 3, 1, 2, 1, 2, 2, 3, 6, 0, 0, 4, 0, 1, 1, 3, 1, 5])

You can see from the array that some larger numbers, such as 6, are generated.

What I want to do is limit the largest number (say, to 5), i.e. any random number generated using sct.poisson.rvs should be less than or equal to 5.

How can I tweak my code to achieve this? By the way, I am using this in a Pandas DataFrame.

Episcopalian answered 19/9, 2018 at 5:45 Comment(1)
You can't control the random number distribution, unless you manually alter the numbers after you get the array, which is trivial. Otherwise you may want to look into other distributions which are bounded, such as beta. - Monandrous
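
For illustration, the "alter the numbers afterwards" approach mentioned in this comment might look like the sketch below. It assumes that capping with numpy.minimum is what is meant, which the comment does not actually state. Note that capping is not the same as truncation: every draw above the limit becomes exactly the limit, so the probability of that value is inflated.

```python
import numpy as np
import scipy.stats as sct

mu = 2.5
max_value = 5

# Draw ordinary Poisson samples, then cap anything above max_value.
# This is censoring, not truncation: the mass of 6, 7, ... is moved onto 5.
raw = sct.poisson.rvs(mu, size=20, random_state=0)
capped = np.minimum(raw, max_value)
print(capped)
```

If the relative frequencies of 0 to 5 should stay in Poisson proportion, a truncated sampler is the better fit.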

What you want could be called a truncated Poisson distribution, except that in the common usage of this term, truncation happens from below rather than from above. The easiest, even if not always the most efficient, way to sample a truncated distribution is rejection sampling: generate twice the requested number of values and keep only those that fall in the desired range; if there are not enough, double the size again, and so on, as shown below:

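import scipy.stats as sct

def truncated_Poisson(mu, max_value, size):
    # mu is a single (scalar) rate; size is the number of samples wanted
    temp_size = size
    while True:
        # oversample, doubling the batch on every retry
        temp_size *= 2
        temp = sct.poisson.rvs(mu, size=temp_size)
        # keep only the draws that do not exceed the upper limit
        truncated = temp[temp <= max_value]
        if len(truncated) >= size:
            return truncated[:size]

mu = 2.5
max_value = 5
print(truncated_Poisson(mu, max_value, 20))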

Typical output: [0 1 4 5 0 2 3 2 2 2 5 2 3 3 3 3 4 1 0 3].

Marivaux answered 19/9, 2018 at 18:45 Comment(2)
Thanks for the advice, and sorry for the late reply. I think this function works and suits my application better, because I am using it in a DataFrame. - Episcopalian
Hi @Welcome to Stack, I was using this function in a Pandas DataFrame and it gave me the following error: ValueError: size does not match the broadcast shape of the parameters. The DataFrame contains 10 rows and 13 columns. I am trying to create a new column using the truncated_Poisson function. How would I do this? Below is the code for the new column: UCL_Fix_Dub['Team1_goals'] = truncated_Poisson(UCL_Fix_Dub.Team1_XG, max_goal, 1) - Episcopalian
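
The ValueError in the last comment appears to arise because truncated_Poisson passes a whole column as mu together with a scalar size, and poisson.rvs cannot broadcast the two. One possible workaround, sketched below, is to call the function once per row with a scalar mu. The DataFrame contents here are made up for illustration; only the names UCL_Fix_Dub, Team1_XG, Team1_goals and max_goal come from the comment.

```python
import pandas as pd
import scipy.stats as sct

def truncated_Poisson(mu, max_value, size):
    # Rejection sampling for a single scalar mu.
    temp_size = size
    while True:
        temp_size *= 2
        temp = sct.poisson.rvs(mu, size=temp_size)
        truncated = temp[temp <= max_value]
        if len(truncated) >= size:
            return truncated[:size]

# Hypothetical data standing in for the commenter's DataFrame.
UCL_Fix_Dub = pd.DataFrame({"Team1_XG": [0.8, 1.4, 2.5, 3.1]})
max_goal = 5

# One call per row: each call gets a scalar mu and returns one sample.
UCL_Fix_Dub["Team1_goals"] = UCL_Fix_Dub["Team1_XG"].apply(
    lambda xg: truncated_Poisson(xg, max_goal, 1)[0]
)
print(UCL_Fix_Dub)
```

A per-row apply is slow for large frames, but for 10 rows it should be more than adequate.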

I think the solution is quite simple (assuming I understood your issue correctly):

# for repeatability:
import numpy as np
np.random.seed(0)

from scipy.stats import poisson, uniform
sample_size = 20
maxval = 5
mu = 2.5

cutoff = poisson.cdf(maxval, mu)
# generate uniform distribution [0, cutoff):
u = uniform.rvs(scale=cutoff, size=sample_size)
# convert to Poisson:
truncated_poisson = poisson.ppf(u, mu)

Then print(truncated_poisson):

[2. 3. 3. 2. 2. 3. 2. 4. 5. 2. 4. 2. 3. 4. 0. 1. 0. 4. 3. 4.]
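
Because poisson.cdf and poisson.ppf accept arrays, the same inverse-CDF idea can be sketched for a different mu in every row, which is the DataFrame use case mentioned in the question. The function name truncated_poisson_ppf, the use of numpy's default_rng, and the final clip are assumptions of this sketch rather than part of the answer above; the clip only guards against edge cases such as a uniform draw of exactly 0, for which ppf returns -1.

```python
import numpy as np
from scipy.stats import poisson

def truncated_poisson_ppf(mu, maxval, rng):
    """Draw one sample per entry of mu from a Poisson truncated at maxval."""
    mu = np.asarray(mu, dtype=float)
    # P(X <= maxval) for every rate: the upper end of the uniform range.
    cutoff = poisson.cdf(maxval, mu)
    # Uniform draws on [0, cutoff), one per rate.
    u = rng.uniform(0.0, 1.0, size=mu.shape) * cutoff
    # Map the uniforms back through the Poisson quantile function.
    draws = poisson.ppf(u, mu)
    # Guard against edge cases (u == 0 gives -1) and return integers.
    return np.clip(draws, 0, maxval).astype(int)

rng = np.random.default_rng(0)       # seeded only for repeatability
mus = np.array([0.8, 1.4, 2.5, 3.1])  # hypothetical per-row rates
goals = truncated_poisson_ppf(mus, 5, rng)
print(goals)
```

With a DataFrame, the column of rates can be passed directly, for example as df["Team1_XG"].to_numpy(), and the result assigned to a new column.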
Sebastian answered 19/9, 2018 at 19:12 Comment(4)
Dear AGN, thanks for the advice, and sorry for my late reply. - Episcopalian
I was wondering why this method gives a similar sequence of random numbers across multiple runs (even without np.random.seed(0))? - Pathos
I commented out the np.random.seed(0) line, re-ran the entire code in my answer, and got a different sequence. I cannot reproduce your issue. Maybe you could provide a more detailed description of exactly how you are running the code? - Sebastian
This one should be the accepted answer. It is more efficient, and it also uses quantiles and a base measure, which will help the programmer understand more of the underlying math. - Bradski
