scikit-learn GMM produces positive log probability
I am using the Gaussian Mixture Model from the Python scikit-learn package to fit my dataset. However, I found that when I run

    G = mixture.GMM(...)
    G.fit(...)
    G.score(some_feature)

the resulting log probability is a positive real number. Why is that? Isn't a log probability guaranteed to be non-positive?

I get it now. What the Gaussian Mixture Model returns is the log probability density, not a probability mass, so a positive value is perfectly reasonable.

If a covariance matrix is close to singular, the GMM will not perform well, and this generally means the data is not well suited to such a generative task.

Martsen answered 29/8, 2012 at 10:1 Comment(1)
It sounds like a bug. Can you please give a minimal reproduction script? BTW: you can report bugs directly at github.com/scikit-learn/scikit-learn/issues - Par

Positive log probabilities are okay.

Remember that what the GMM computes is a probability density function (PDF), so its value can be greater than one at any individual point.

The restriction is that the PDF must integrate to one over the data domain.
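To make this concrete, here is a minimal sketch using only the standard library. The standard deviation of 0.1 is an arbitrary value picked for illustration; it is not taken from the question. The sketch shows a narrow 1-D Gaussian whose density at the mean is well above one, so its log is positive, while the density still integrates to roughly one.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

sigma = 0.1  # assumed value: a narrow component, chosen only for illustration

# Density at the mean, and its log.
peak = normal_pdf(0.0, 0.0, sigma)
log_peak = math.log(peak)

# Midpoint Riemann sum over [-1, 1], i.e. ten standard deviations on each side.
n = 20000
dx = 2.0 / n
area = sum(normal_pdf(-1.0 + (i + 0.5) * dx, 0.0, sigma) for i in range(n)) * dx

print("density at mean:", peak)
print("log density at mean:", log_peak)
print("approximate integral:", area)
```

The narrower the component, the higher the peak, which is why a score that looks like a "log probability" can come out positive.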

If the log probability grows very large, then the inference algorithm may have reached a degenerate solution (common with maximum likelihood estimation if you have a small dataset).
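The following sketch illustrates the degenerate case on a toy problem. The five data points, the equal weights, and the sequence of standard deviations are all made up for the example. One component stays broad, while the other is centered exactly on a single data point and made narrower and narrower; the total log-likelihood keeps growing instead of settling.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def mixture_loglik(data, weights, mus, sigmas):
    """Total log-likelihood of data under a 1-D Gaussian mixture."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + normal_logpdf(x, m, s)
                 for w, m, s in zip(weights, mus, sigmas)]
        top = max(terms)
        # log-sum-exp over components for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

data = [-1.2, -0.4, 0.0, 0.3, 1.1]  # hypothetical toy dataset

logliks = []
for sigma in (1.0, 1e-2, 1e-4, 1e-6):
    # Component 0 is broad and covers all points;
    # component 1 sits exactly on data[2] and shrinks.
    ll = mixture_loglik(data, [0.5, 0.5], [0.0, data[2]], [1.0, sigma])
    logliks.append(ll)
    print(f"sigma={sigma:g}  log-likelihood={ll:.3f}")
```

Because the likelihood is unbounded in this direction, a maximum likelihood fit can drift toward such a solution, especially when there are few data points per component.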

To check that the GMM algorithm has not reached a degenerate solution, look at the variances of each component. If any of the variances is close to zero, that component has probably collapsed and the fit should not be trusted. Alternatively, consider using a Bayesian model rather than maximum likelihood estimation (if you aren't doing so already).
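A check along these lines could be sketched as follows. The helper name `near_singular_components`, the tolerance of `1e-6`, and the three example matrices are assumptions made for illustration; a sensible tolerance depends on the scale of your features. The function expects a stack of full covariance matrices of shape (n_components, n_features, n_features), which is the shape of the `covariances_` attribute of a fitted scikit-learn `GaussianMixture` when `covariance_type="full"`.

```python
import numpy as np

def near_singular_components(covariances, tol=1e-6):
    """Return (index, smallest eigenvalue) for every covariance matrix
    whose smallest eigenvalue is below tol."""
    flagged = []
    for k, cov in enumerate(np.asarray(covariances, dtype=float)):
        # eigvalsh is meant for symmetric matrices such as covariances
        smallest = float(np.linalg.eigvalsh(cov).min())
        if smallest < tol:
            flagged.append((k, smallest))
    return flagged

# Hypothetical covariance matrices standing in for a fitted model's output.
covs = np.array([
    [[1.0, 0.2], [0.2, 0.5]],    # well conditioned
    [[1.0, 1.0], [1.0, 1.0]],    # rank 1: data lie on a line
    [[1e-9, 0.0], [0.0, 1e-9]],  # collapsed onto (almost) a single point
])

flagged = near_singular_components(covs)
for k, smallest in flagged:
    print(f"component {k}: smallest eigenvalue {smallest:.3g}")
```

Checking eigenvalues rather than only the diagonal variances also catches the case where the data lie in a lower-dimensional subspace that is not aligned with the coordinate axes.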

Jaggers answered 30/8, 2012 at 14:20 Comment(2)
Hi, thank you for the reply. Can you explain more about degenerate covariance matrices? How can that happen? Does it mean my data lie mainly in some subspace of R^n, so that the variance along some axis is close to zero? - Martsen
Yes, your data could span a lower-dimensional subspace, or one of the mixture components could be centered on a single data point. Check whether any eigenvalues of the covariance matrices are close to zero. - Jaggers
