How to avoid roundoff errors in numpy.random.choice?

Asked 25/2, 2022 at 7:33 Answered 8/3, 2022 at 19:13

Solved python numpy random floating-point floating-accuracy

Say x_1, x_2, ..., x_n are n objects and one wants to pick one of them so that the probability of choosing x_i is proportional to some number u_i. Numpy provides a function for that:

x, u = np.array([x_1, x_2, ..., x_n]), np.array([u_1, ..., u_n])
np.random.choice(x, p = u/np.sum(u))

However, I have observed that this code sometimes throws a ValueError saying "probabilities do not sum to 1.". This is probably due to the round-off errors of finite precision arithmetic. What should one do to make this function work properly?

Enravish answered 25/2, 2022 at 7:33 Comment(6)

What type of error are you worried about? – Cistaceous 25/2, 2022 at 7:46

similar question – Spry 25/2, 2022 at 8:57

@Cistaceous exactly this: "ValueError: probabilities do not sum to 1" – Isaacs 3/3, 2022 at 20:18

And does the solution to the question pointed out by @Spry help? – Cistaceous 8/3, 2022 at 15:57

@Cistaceous https://mcmap.net/q/481094/-np-random-choice-probabilities-do-not-sum-to-1 provides a solution. numpy.random.multinomial (docs.scipy.org/doc/numpy-1.15.0/reference/generated/…) automatically adjusts the last probability to solve the issue but it is noted that this should not be relied upon. Other answers, do not give a satisfactory answer. For example, the accepted solution to that question https://mcmap.net/q/481094/-np-random-choice-probabilities-do-not-sum-to-1 suggests to normalize the probabilities, which may fail to solve the problem due to roundoff errors. See the comment by pd shah to that answer. – Isaacs 8/3, 2022 at 18:56

It all really begs the question why numpy doesn't just do this stuff internally. I mean a key point of numpy is to make it easy to do complex numerical calculations without having to be an expert in IEEE-754 roundoff bs. – Geniculate 14/9, 2022 at 22:25

After reading the answer https://mcmap.net/q/481094/-np-random-choice-probabilities-do-not-sum-to-1 to the question pointed by @Pychopath, I have found the following solution, inspired by the documentation of numpy.random.multinomial https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.multinomial.html

Say p is the array of probabilities which may not be exactly 1 due to roundoff errors, even if we normalized it with p = p/np.sum(p). This is not rare, see the comment by @pd shah at the answer https://mcmap.net/q/481094/-np-random-choice-probabilities-do-not-sum-to-1.

Just do

p[-1] = 1 - np.sum(p[0:-1])
np.random.choice(x, p = p)

And the problem is solved! The roundoff errors due to subtraction will be much smaller than roundoff errors due to normalization. Moreover, one need not worry about the changes in p, they are of the order of roundoff errors.

Enravish answered 8/3, 2022 at 19:13 Comment(3)

Better to use p[-1] = max(0, 1 - np.sum(p[0:-1])) because sometimes roundoff errors will cause that final number to be negative (like -1e-16) which will also np.random.choice to fail, but with ValueError: probabilities are not non-negative – Geniculate 14/9, 2022 at 22:7

Oh well... here's the source code generating that error, but I'm not sure what's the best way to fix the problem, or why numpy hasn't yet fixed it. github.com/numpy/numpy/blob/main/numpy/random/mtrand.pyx – Lampe 15/1, 2023 at 6:38

OK, my problem seems to have been resolved with: p = np.array(p, dtype=numpy.float64), i.e. type conversion. I was using a Jax array. My bad. – Lampe 15/1, 2023 at 6:56

According to NumPy documentation we have to use p1-D array-like. So i think if u-array is array of probabilities then you can try it:

x, u = np.array([x_1, x_2, ..., x_n]), np.array([u_1, ..., u_n])
np.random.choice(x, p = u)

x, u = np.array([x_1, x_2, ..., x_n]), np.array([u_1, ..., u_n])
s = sum(u)
u1 = [i/s for i in u]
np.random.choice(x, p = u1)

Woolgrower answered 25/2, 2022 at 7:55 Comment(1)

This does not answer my question. The second code is almots the same one I posted. I am worried about the cumulative errors occuring due to finite precision arithmetic during divisions. It may lead to sum of probabilities not being equal to (exactly) 1. – Isaacs 3/3, 2022 at 20:20

Recommended topics

Hot tags