I would like to be able to generate random numbers with a probability density function that comes from a drawn curve. These two below have the same area under the curve but should produce lists of random numbers with different characteristics.
My intuition is that one way would be to do it is to sample the curve, and then use the areas of those rectangles to feed an np.random.choice
to pick a range to do an ordinary random in the range of that rectangle's range.
This doesn't feel like a very efficient way to do it. Is there a more 'correct' way to do it?
I had a crack at actually doing it:
import matplotlib.pyplot as plt
import numpy as np
areas = [4.397498, 4.417111, 4.538467, 4.735034, 4.990129, 5.292455, 5.633938,
6.008574, 6.41175, 5.888393, 2.861898, 2.347887, 2.459234, 2.494357,
2.502986, 2.511614, 2.520243, 2.528872, 2.537501, 2.546129, 7.223747,
7.223747, 2.448148, 1.978746, 1.750221, 1.659351, 1.669999]
divisons = [0.0, 0.037037, 0.074074, 0.111111, 0.148148, 0.185185, 0.222222,
0.259259, 0.296296, 0.333333, 0.37037, 0.407407, 0.444444, 0.481481,
0.518519, 0.555556, 0.592593, 0.62963, 0.666667, 0.703704, 0.740741,
0.777778, 0.814815, 0.851852, 0.888889, 0.925926, 0.962963, 1.0]
weights = [a/sum(areas) for a in areas]
indexes = np.random.choice(range(len(areas)), 50000, p=weights)
samples = []
for i in indexes:
samples.append(np.random.uniform(divisons[i], divisons[i+1]))
binwidth = 0.02
binSize = np.arange(min(samples), max(samples) + binwidth, binwidth)
plt.hist(samples, bins=binSize)
plt.xlim(xmax=1)
plt.show()
The method seems to work, but is a bit heavy!