scikit-learn GMM produces positive log probability


Asked 29/8, 2012 at 10:1 Answered 30/8, 2012 at 14:20

Solved python machine-learning scikit-learn mixture-model

I am using the Gaussian Mixture Model from the Python scikit-learn package to fit my dataset. However, I found that when I run

-- G=mixture.GMM(...)

-- G.fit(...)

-- G.score(sum feature)

the resulting log probability is a positive real number. Why is that? Isn't a log probability guaranteed to be at most zero?

I get it now. What the Gaussian Mixture Model returns is the log of a probability density, not of a probability mass, so a positive value is entirely reasonable.

If the covariance matrix is nearly singular, the GMM will not perform well, and this generally means the data is not well suited to such a generative model.

Martsen answered 29/8, 2012 at 10:1 Comment(1)

It sounds like a bug, can you please give a minimalistic reproduction script? BTW: you can report bugs directly on github.com/scikit-learn/scikit-learn/issues – Par 29/8, 2012 at 10:10

Positive log probabilities are okay.

Remember that what the GMM computes is the value of a probability density function (PDF), which can be greater than one at any individual point.

The restriction is that the PDF must integrate to one over the data domain.
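As a rough illustration of the two statements above, here is a minimal sketch using only the standard library. The helper `normal_pdf` and the standard deviation of 0.1 are assumptions chosen for the example; they are not taken from the question's data or from scikit-learn.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x (illustrative helper)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


std = 0.1  # a narrow Gaussian; assumed value for illustration only

# The density at the mean is 1 / (std * sqrt(2*pi)), which exceeds one here,
# so its logarithm is positive.
peak = normal_pdf(0.0, 0.0, std)
log_peak = math.log(peak)

# Midpoint Riemann sum over [-1, 1], i.e. ten standard deviations on each side.
# The density is above one near the mean, yet the total area is still about one.
n = 20000
dx = 2.0 / n
area = sum(normal_pdf(-1.0 + (i + 0.5) * dx, 0.0, std) for i in range(n)) * dx

print(peak, log_peak, area)
```

The same reasoning carries over to a mixture: each component's density can exceed one wherever its variance is small, so a positive log density for a sample is not by itself a sign of a bug.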

If the log probability grows very large, then the inference algorithm may have reached a degenerate solution (common with maximum likelihood estimation if you have a small dataset).

To check that the GMM algorithm has not reached a degenerate solution, look at the variances for each component. If any of the variances is close to zero, that indicates a degenerate fit. Alternatively, consider using a Bayesian model rather than maximum likelihood estimation (if you aren't doing so already).

Jaggers answered 30/8, 2012 at 14:20 Comment(2)

Hi, thank you for the reply. Can you explain more about degenerate covariance matrices? How can that happen? Does it mean my data lie mainly in some subspace of R^n, so that the variance along some axis is close to zero? – Martsen 31/8, 2012 at 1:0

Yes - your data could span a lower dimensional subspace or one of the mixture components could be centered on a single data point. Check to see if any eigenvalues of the covariance matrices are close to zero. – Jaggers 31/8, 2012 at 2:11
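The eigenvalue check suggested in this comment could be sketched roughly as follows with NumPy. The function names and the threshold `tol` are hypothetical and chosen for illustration. The sketch assumes the covariances are available as an array of shape `(n_components, n_features, n_features)`, which is what the `covariances_` attribute of `sklearn.mixture.GaussianMixture` holds when `covariance_type="full"` in current scikit-learn; the older `mixture.GMM` class from the question may expose them differently.

```python
import numpy as np


def min_covariance_eigenvalues(covariances):
    """Smallest eigenvalue of each component's covariance matrix."""
    covariances = np.asarray(covariances, dtype=float)
    # eigvalsh is meant for symmetric matrices and returns eigenvalues
    # in ascending order, so column 0 holds the smallest one per component.
    return np.linalg.eigvalsh(covariances)[:, 0]


def degenerate_components(covariances, tol=1e-6):
    """Indices of components whose smallest eigenvalue is below tol.

    tol is a hypothetical absolute threshold; a sensible value depends
    on the scale of the data.
    """
    return np.flatnonzero(min_covariance_eigenvalues(covariances) < tol)


# Two made-up 2-D covariance matrices for illustration.
covs = np.array([
    [[1.0, 0.2], [0.2, 0.5]],  # well-conditioned
    [[1.0, 1.0], [1.0, 1.0]],  # rank 1: all variance lies along one direction
])

flagged = degenerate_components(covs)
print(flagged)
```

A flagged component means the fitted Gaussian is nearly flat along at least one direction, which matches both cases described above: data confined to a lower-dimensional subspace, or a component collapsed onto a single point.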
