Sampling from a multivariate probability density function in python
Asked Answered
A

1

2

I have a multivariate probability density function P(x,y,z), and I want to sample from it. Normally, I would use numpy.random.choice() for this sort of task, but this function only works for 1-dimensional probability densities. Is there an equivalent function for multivariate PDFs?

Acquire answered 8/2, 2018 at 1:13 Comment(0)
S
2

There a few different paths one can follow here.

(1) If P(x,y,z) factors as P(x,y,z) = P(x) P(y) P(z) (i.e., x, y, and z are independent) then you can sample each one separately.

(2) If P(x,y,z) has a more general factorization, you can reduce the number of variables that need to be sampled to whatever's conditional on the others. E.g. if P(x,y,z) = P(z|x, y) P(y | x) P(x), then you can sample x, y given x, and z given y and x in turn.

(3) For some particular distributions, there are known ways to sample. E.g. for multivariate Gaussian, if x is a sample from a mean 0 and identity covariance Gaussian (i.e. just sample each x_i as N(0, 1)), then y = L x + m has mean m and covariance S = L L' where L is the lower-triangular Cholesky decomposition of S, which must be positive definite.

(4) For many multivariate distributions, none of the above apply, and a more complicated scheme such as Markov chain Monte Carlo is applied.

Maybe if you say more about the problem, more specific advice can be given.

Springspringboard answered 8/2, 2018 at 1:49 Comment(8)
Hi Robert, thanks for your help. My x, y, and z are not independent, and they are not known distributions. I generated the pdf using a kernal density estimate applied to data points on a sphere surface. Therefore, x, y, and z are equal to x^2+y^2+z^2 = r^2Acquire
OK. A kernel density estimate is an example of a finite mixture distribution. One can sample from a mixture distribution by choosing a mixture component at random, according to its mixture weight, and then sampling from the chosen component. For a kernel density estimate, typically all weights are equal to 1/n where n = #data, and the mixture components are typically something simple like a Gaussian bump. Are the per-datum bumps 2 d (lying in the sphere's surface) or 3 d (extending above and below the surface)?Springspringboard
my per-datum bumps are actually 2d. I can actually reduce my pdf to a function of just theta and phi since I am dealing with a constant radius.Acquire
Given that, sampling from the kernel density is equivalent to selecting a datum at random (with equal probability) and then sampling from whatever the kernel is, centered on that datum. I don't know if the kernel takes the non-flat surface into account already. If the kernel width is small compared to the size of the sphere, you can probably ignore curvature and assume that you can sample as if the kernel were on a flat surface. What are the kernels like? By the way, it sounds like a good problem.Springspringboard
I'm a little confused. My kde is a function of theta and phi. I don't see why the kernel needs to take into account the non-flat surface, as the spherical surface is not really relevant in creating the kde; it's just a property of my data. Can you explain how you would randomly sample from such a distribution? (and thanks, I think it's a good problem too!)Acquire
Well, it may well be that taking curvature into account won't make any appreciable difference for your problem. That said, my point is that being on the sphere changes the properties of the density at least a little. For example, on a plane, there are points that are arbitrarily far away, but on a sphere, no point is more than pi times radius away. See for example the von Mises distribuion.Springspringboard
All I'm really trying to do is randomly sample this 2d pdf. As in, I want to generate a set of thetas and phis that are distributed according to the 2d pdf. I generated the kde using data from a simulation that involves points on a sphere, but I think that will not affect the way I actually do the sampling, since in the end I am just left with a 2d pdf. Do you know of a simple way to do a sampling of a 2d pdf in Python?Acquire
There are several methods for sampling, and which one is appropriate for a given problem depends on details. However, in this case there is a very simple sampling method available. The method is just this: pick one kernel center at random (this is probably equivalent to picking one data point at random), and "smudge" the point by sampling from the kernel centered at the data point. Most often, the kernels are Gaussian bumps, for which sampling is very simple. There's no need to look for a general method, because there is a simple specific method for this problem.Springspringboard

© 2022 - 2024 — McMap. All rights reserved.