Efficiently sample from arbitrary multivariate function

I would like to sample from an arbitrary function in Python.

In Fast arbitrary distribution random sampling it was stated that one could use inverse transform sampling and in Pythonic way to select list elements with different probability it was mentioned that one should use inverse cumulative distribution function. As far as I undestand those methods only work the univariate case. My function is multivariate though and too complex that any of the suggestions in https://mcmap.net/q/455337/-sampling-from-a-multivariate-probability-density-function-in-python would apply.

Prinliminaries: My function is based on Rosenbrock's banana function, which value we can get the value of the function with

import scipy.optimize
scipy.optimize.rosen([1.1,1.2])

(here [1.1,1.2] is the input vector) from scipy, see https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.optimize.rosen.html.

Here is what I came up with: I make a grid over my area of interest and calculate for each point the function value. Then I sort the resulting data frame by the value and make a cumulative sum. This way we get "slots" which have different sizes - points which have large function values have larger slots than points with small function values. Now we generate random values and look into which slot the random value falls into. The row of the data frame is our final sample.

Here is the code:

import scipy.optimize
from itertools import product
from dfply import *

nb_of_samples = 50
nb_of_grid_points = 30

rosen_data = pd.DataFrame(array([item for item in product(*[linspace(fm[0], fm[1], nb_of_grid_points) for fm in zip([-2,-2], [2,2])])]), columns=['x','y'])
rosen_data['z'] = [np.exp(-scipy.optimize.rosen(row)**2/500) for index, row in rosen_data.iterrows()]
rosen_data = rosen_data >> \
    arrange(X.z) >> \
    mutate(z_upperbound=cumsum(X.z)) >> \
    mutate(z_upperbound=X.z_upperbound/np.max(X.z_upperbound))
value = np.random.sample(1)[0]

def get_rosen_sample(value):
    return (rosen_data >> mask(X.z_upperbound >= value) >> select(X.x, X.y)).iloc[0,]

values = pd.DataFrame([get_rosen_sample(s) for s in np.random.sample(nb_of_samples)])

This works well, but I don't think it is very efficient. What would be a more efficient solution to my problem?

I read that Markov chain Monte Carlo might helping, but here I am in over my head for now on how to do this in Python.

I was in a similar situation, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method) to sample from a bivariate distribution. An example follows.

Say, we want to sample from the following denisty:

def density1(z):
    z = np.reshape(z, [z.shape[0], 2])
    z1, z2 = z[:, 0], z[:, 1]
    norm = np.sqrt(z1 ** 2 + z2 ** 2)
    exp1 = np.exp(-0.5 * ((z1 - 2) / 0.8) ** 2)
    exp2 = np.exp(-0.5 * ((z1 + 2) / 0.8) ** 2)
    u = 0.5 * ((norm - 4) / 0.4) ** 2 - np.log(exp1 + exp2)
    return np.exp(-u)

which looks like this

The following function implements MH with multivariate normal as the proposal

def metropolis_hastings(target_density, size=500000):
    burnin_size = 10000
    size += burnin_size
    x0 = np.array([[0, 0]])
    xt = x0
    samples = []
    for i in range(size):
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = (target_density(xt_candidate))/(target_density(xt))
        if np.random.uniform(0, 1) < accept_prob:
            xt = xt_candidate
        samples.append(xt)
    samples = np.array(samples[burnin_size:])
    samples = np.reshape(samples, [samples.shape[0], 2])
    return samples

Run MH and plot samples

samples = metropolis_hastings(density1)
plt.hexbin(samples[:,0], samples[:,1], cmap='rainbow')
plt.gca().set_aspect('equal', adjustable='box')
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()

Check out this repo of mine for details.

Recommended topics

Hot tags