Multivariate kernel density estimation in Python
Asked Answered
A

2

18

I am trying to use SciPy's gaussian_kde function to estimate the density of multivariate data. In my code below I sample a 3D multivariate normal and fit the kernel density but I'm not sure how to evaluate my fit.

import numpy as np
from scipy import stats

mu = np.array([1, 10, 20])
sigma = np.matrix([[4, 10, 0], [10, 25, 0], [0, 0, 100]])
data = np.random.multivariate_normal(mu, sigma, 1000)
values = data.T
kernel = stats.gaussian_kde(values)

I saw this but not sure how to extend it to 3D.

Also not sure how do I even begin to evaluate the fitted density? How do I visualize this?

Antiknock answered 20/2, 2014 at 20:14 Comment(0)
D
35

There are several ways you might visualize the results in 3D.

The easiest is to evaluate the gaussian KDE at the points that you used to generate it, and then color the points by the density estimate.

For example:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

mu=np.array([1,10,20])
sigma=np.matrix([[4,10,0],[10,25,0],[0,0,100]])
data=np.random.multivariate_normal(mu,sigma,1000)
values = data.T

kde = stats.gaussian_kde(values)
density = kde(values)

fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
x, y, z = values
ax.scatter(x, y, z, c=density)
plt.show()

enter image description here

If you had a more complex (i.e. not all lying in a plane) distribution, then you might want to evaluate the KDE on a regular 3D grid and visualize isosurfaces (3D contours) of the volume. It's easiest to use Mayavi for the visualiztion:

import numpy as np
from scipy import stats
from mayavi import mlab

mu=np.array([1,10,20])
# Let's change this so that the points won't all lie in a plane...
sigma=np.matrix([[20,10,10],
                 [10,25,1],
                 [10,1,50]])

data=np.random.multivariate_normal(mu,sigma,1000)
values = data.T

kde = stats.gaussian_kde(values)

# Create a regular 3D grid with 50 points in each dimension
xmin, ymin, zmin = data.min(axis=0)
xmax, ymax, zmax = data.max(axis=0)
xi, yi, zi = np.mgrid[xmin:xmax:50j, ymin:ymax:50j, zmin:zmax:50j]

# Evaluate the KDE on a regular grid...
coords = np.vstack([item.ravel() for item in [xi, yi, zi]])
density = kde(coords).reshape(xi.shape)

# Visualize the density estimate as isosurfaces
mlab.contour3d(xi, yi, zi, density, opacity=0.5)
mlab.axes()
mlab.show()

enter image description here

Dulin answered 20/2, 2014 at 20:32 Comment(4)
Thanks joe. This was very helpful. Do you know if the function is robust to handle missing data points ?Antiknock
How to get the mean value of "kde"?Collateral
The density = kde(values) line from the first example produces the error LinAlgError: 1-th leading minor of the array is not positive definiteInfract
Elegant explanation! Thanks Joe!Minimal
H
2

@Joe answer is great. But the covariance matrix sigma=np.matrix([[4,10,0],[10,25,0],[0,0,100]]) he gave is not positive definite, and therefore numpy in python can not do cholesky decomposition for it.

In fact, the covariance matrix must be positive semi-definite because the variance is nonnegative forever. Users could trysigma=np.matrix([[4,0,0],[0,25,0],[0,0,100]]) instead.

Howzell answered 20/2, 2023 at 7:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.