I would like to sample from an arbitrary function in Python.
In Fast arbitrary distribution random sampling it was stated that one could use inverse transform sampling and in Pythonic way to select list elements with different probability it was mentioned that one should use inverse cumulative distribution function. As far as I undestand those methods only work the univariate case. My function is multivariate though and too complex that any of the suggestions in https://mcmap.net/q/455337/-sampling-from-a-multivariate-probability-density-function-in-python would apply.
Prinliminaries: My function is based on Rosenbrock's banana function, which value we can get the value of the function with
import scipy.optimize
scipy.optimize.rosen([1.1,1.2])
(here [1.1,1.2]
is the input vector) from scipy, see https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.rosen.html.
Here is what I came up with: I make a grid over my area of interest and calculate for each point the function value. Then I sort the resulting data frame by the value and make a cumulative sum. This way we get "slots" which have different sizes - points which have large function values have larger slots than points with small function values. Now we generate random values and look into which slot the random value falls into. The row of the data frame is our final sample.
Here is the code:
import scipy.optimize
from itertools import product
from dfply import *
nb_of_samples = 50
nb_of_grid_points = 30
rosen_data = pd.DataFrame(array([item for item in product(*[linspace(fm[0], fm[1], nb_of_grid_points) for fm in zip([-2,-2], [2,2])])]), columns=['x','y'])
rosen_data['z'] = [np.exp(-scipy.optimize.rosen(row)**2/500) for index, row in rosen_data.iterrows()]
rosen_data = rosen_data >> \
arrange(X.z) >> \
mutate(z_upperbound=cumsum(X.z)) >> \
mutate(z_upperbound=X.z_upperbound/np.max(X.z_upperbound))
value = np.random.sample(1)[0]
def get_rosen_sample(value):
return (rosen_data >> mask(X.z_upperbound >= value) >> select(X.x, X.y)).iloc[0,]
values = pd.DataFrame([get_rosen_sample(s) for s in np.random.sample(nb_of_samples)])
This works well, but I don't think it is very efficient. What would be a more efficient solution to my problem?
I read that Markov chain Monte Carlo might helping, but here I am in over my head for now on how to do this in Python.