numpy 1.9.0: ValueError: probabilities do not sum to 1
I have a large program that at one point samples values from an array according to probabilities taken from a probability density function (PDF).

To do this I use numpy.random.choice, which worked just fine up to numpy 1.8.0. Here's an MWE (the file pdf_probs.txt can be downloaded here):

import simplejson
import numpy as np

# Read probabilities from file.
f = open('pdf_probs.txt', 'r')
probs = simplejson.load(f)
f.close()

print sum(probs)  # <-- Not *exactly* 1. but very close: 1.00000173042
# Define array.
arr = np.linspace(1., 100., len(probs))

# Get samples using the probabilities in probs.
samples = np.random.choice(arr, size=1000, replace=True, p=probs)

The thing is that with numpy 1.9.0 the above code fails with the error:

Traceback (most recent call last):
  File "numpy_180_vs_190_np_random_choice.py", line 13, in <module>
    samples = np.random.choice(arr, size=1000, replace=True, p=probs)
  File "mtrand.pyx", line 1083, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:10106)
ValueError: probabilities do not sum to 1

The PDF probabilities will not sum to exactly 1, given the small deviations that appear when adding up many very small floats.
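
For example (a toy illustration, not my actual data), even ten copies of 0.1 do not sum to exactly 1 in floating point:

>>> sum([0.1] * 10)
0.9999999999999999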

From what I can gather, the previous version of numpy (1.8.0) apparently used a larger tolerance for this check than the new 1.9.0 version, but I could be wrong.

Why does this work with numpy 1.8.0 but not with 1.9.0? How can I make my code work with the new 1.9.0 version?

Pardner answered 23/9, 2014 at 0:35 Comment(1)
Here's the change to the tolerance testing: 1.9 version vs 1.8 version. – Corking
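
For reference, a rough sketch of what that change amounts to (simplified; if I read the diffs correctly, 1.8 checked the sum with np.allclose, while 1.9 compares it against an absolute tolerance of roughly the square root of machine epsilon):

>>> import numpy as np
>>> np.allclose(1.00000173042, 1.0)   # 1.8-style check (rtol=1e-05, atol=1e-08): passes
True
>>> atol = np.sqrt(np.finfo(np.float64).eps)
>>> atol
1.4901161193847656e-08
>>> abs(1.00000173042 - 1.0) > atol   # 1.9-style check: fails, hence the ValueError
True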

I think 1.7e-6 is a large enough relative error to be worth complaining about. You can renormalize easily enough, though, if you're confident the error is negligible:

>>> probs = np.array(probs)
>>> probs /= probs.sum()
>>> probs.sum()
1.0
>>> samples = np.random.choice(arr, size=1000, replace=True, p=probs)
>>> samples[:5]
array([  1.37635054,   1.1287515 ,   1.7229892 ,  19.8967587 ,   2.07953181])
Squid answered 23/9, 2014 at 0:50 Comment(5)
Thanks @DSM, that's a very simple solution that I didn't think of. Do you have any idea what changed from 1.8.0 to 1.9.0 to make the code no longer work? – Pardner
This isn't working for me; my probabilities are large integers. When I go through the step probs /= probs.sum(), it just creates an array of 0's, so my sum() is zero. – Barney
@Barney That's a result of integer division. Convert the array to float first, e.g. probs = probs.astype(float) followed by probs /= probs.sum(), and you should be fine. – Has
Just a note for anyone still having trouble: similar to the above, set the data type (dtype) of the probability array to np.float64, not 32-bit float and certainly not int. With 32-bit floats you can be left with an error of about 1e-7 after normalizing (dividing by the sum), which is large enough for numpy to raise the exception. – Delanie
For those with nested arrays, like np.array([[0.4, 0.5], [0.3, 0.7]]), the sum has to be broadcast along an axis to normalize each row correctly: probs /= probs.sum(axis=1).astype(float)[:, np.newaxis]. I just wanted to add this, as I had to search and test some more to get it to work in my code. – Influence
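
Pulling those comments together, a minimal sketch of the normalization for integer counts and for a 2-D array (the variable names are only illustrative):

import numpy as np

# Integer counts: convert to float first; dividing an int array in place
# either truncates everything to 0 or raises a casting error, depending on
# the numpy version.
counts = np.array([10, 30, 60])
probs = counts.astype(np.float64)
probs /= probs.sum()          # -> [0.1, 0.3, 0.6]

# 2-D array: normalize each row by its own sum (equivalent to the
# [:, np.newaxis] form in the comment above).
probs2d = np.array([[0.4, 0.5], [0.3, 0.7]], dtype=np.float64)
probs2d /= probs2d.sum(axis=1, keepdims=True)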
