How can I estimate Gaussian (mixture) density from a set of weighted samples? [closed]

Assume I have a set of weighted samples, where each sample has a corresponding weight between 0 and 1. I'd like to estimate the parameters of a Gaussian mixture distribution that is biased towards the samples with higher weight. In the usual unweighted case, Gaussian mixture estimation is done via the EM algorithm.

Is there an implementation (any language is OK) that permits passing weights? If not, how can I modify the algorithm to account for the weights? Alternatively, how can the weights be incorporated into the original maximum log-likelihood formulation of the problem?

Haag answered 22/3, 2010 at 13:48 Comment(5)
Is "EM" error minimization, or something else entirely? Also, there are many numeric and analysis packages ranging for basic and general to highly specialized. It might help if you said something about your problem domain and preferred environment. Fortran? C++? Java? Python? Are you OK learning a major new tool like R or root?Burgh
OK, then my preferred language would be Python, but any of the above languages except ROOT (never heard of it) would also be OK. EM stands for Expectation Maximization and is a general iterative scheme that can be used to estimate the parameters of a Gaussian mixture model from data.Haag
I'm not familiar with that method and can't make any specific recommendations.Burgh
Try asking on math.stackexchange.com. This looks more like a math question than a coding question to me.Arch
Did anyone get to implement this? I'm struggling with the same problem. Actually, I'm trying an implementation that I found in a research paper, but it's not working (the program usually ends up with a singular covariance matrix).Undersexed

I've just had the same problem. Even though the post is old, it might be interesting to someone else. honk's answer is correct in principle; it's just not immediately obvious how it affects the implementation of the algorithm. From the Wikipedia article on Expectation Maximization and a very nice Tutorial, the changes can be derived easily.

If $v_i$ is the weight of the $i$-th sample, the algorithm from the tutorial (see the end of Section 6.2) changes so that $\gamma_{ij}$ is multiplied by that weighting factor. For the calculation of the new mixture weights $w_j$, $n_j$ has to be divided by the sum of the sample weights $\sum_{i=1}^{n} v_i$ instead of just $n$. That's it...
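To make this concrete, here is a minimal NumPy/SciPy sketch of the weighted EM updates described above. The names v, gamma, and n_j follow the notation in this answer; the small diagonal regularization term is my own addition, to guard against the singular covariance matrices mentioned in the comments on the question.

import numpy as np
from scipy.stats import multivariate_normal

def weighted_gmm_em(X, v, K, n_iter=100, seed=0):
    # X: (n, d) samples, v: (n,) sample weights, K: number of components
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]           # init means from data
    cov = np.stack([np.cov(X, rowvar=False).reshape(d, d)
                    + 1e-6 * np.eye(d)] * K)               # init covariances
    pi = np.full(K, 1.0 / K)                               # mixture weights w_j
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ij, each multiplied by its sample weight v_i
        dens = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                                for j in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        gamma *= v[:, None]
        # M-step: n_j is divided by sum(v_i), not by n
        n_j = gamma.sum(axis=0)
        pi = n_j / v.sum()
        mu = (gamma.T @ X) / n_j[:, None]
        for j in range(K):
            diff = X - mu[j]
            cov[j] = (gamma[:, j, None] * diff).T @ diff / n_j[j] + 1e-6 * np.eye(d)
    return pi, mu, cov

With all $v_i$ equal to 1 this reduces to the standard EM updates, which is a quick sanity check.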

Tilney answered 24/3, 2011 at 16:7 Comment(0)

You can calculate a weighted log-likelihood function: just multiply each point's log-density by its weight. Note that you need to use the log-likelihood function for this.

So your problem reduces to minimizing $-\ln L = -\sum_i w_i \ln f(x_i \mid q)$ (see the Wikipedia article for the original, unweighted form).
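As a concrete illustration, here is a small sketch that minimizes this weighted negative log-likelihood numerically for a single Gaussian with scipy.optimize.minimize. The data x, the weights w, and the log-sigma parametrization are illustrative choices of mine, not part of the original answer.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.random.normal(2.0, 1.5, 500)    # example data
w = np.random.uniform(0.0, 1.0, 500)   # example per-point weights

def neg_log_lik(params):
    mu, log_sigma = params
    # weighted -ln L: each point's log-density is multiplied by its weight
    return -np.sum(w * norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

For a single Gaussian there is even a closed form (mu = sum(w*x)/sum(w) and sigma^2 = sum(w*(x-mu)^2)/sum(w)); the numerical route is what generalizes to mixtures.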

Canorous answered 16/7, 2010 at 22:34 Comment(0)

Just a suggestion, since no other answers have been posted.

You could use standard EM with a GMM (OpenCV, for example, has wrappers for many languages) and insert the points you want to weight more heavily multiple times, as sketched below. That way the EM algorithm would consider those points more important. You can remove the extra points later if it matters.
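Here is a minimal sketch of that duplication trick, using scikit-learn's GaussianMixture rather than OpenCV, and assuming integer weights; non-integer weights would have to be scaled and rounded first, which is exactly the limitation raised in the comment below.

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.normal(0.0, 1.0, (100, 1))   # example data, shape (n, d)
w = np.random.randint(1, 5, size=len(X))   # example integer weights

X_rep = np.repeat(X, w, axis=0)            # sample i appears w[i] times
gmm = GaussianMixture(n_components=2).fit(X_rep)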

Otherwise, I think this gets into fairly heavy mathematics unless you have a strong background in advanced statistics.

Cockshy answered 16/7, 2010 at 21:59 Comment(1)
This doesn't work when you have either a lot of points or intrinsically non-integer weights. I happened to have both: a histogram of millions of points with non-integer weights...Combat

I was looking for a similar solution related to Gaussian kernel density estimation (instead of a Gaussian mixture) of the distribution.

The standard scipy.stats.gaussian_kde does not allow that, but I found a Python implementation of a modified version here: http://mail.scipy.org/pipermail/scipy-user/2013-May/034580.html
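For reference, newer SciPy releases make such patches unnecessary: scipy.stats.gaussian_kde has accepted a weights argument since SciPy 1.2. A minimal sketch:

import numpy as np
from scipy.stats import gaussian_kde

x = np.random.normal(0.0, 1.0, 1000)    # example samples
w = np.random.uniform(0.0, 1.0, 1000)   # example per-sample weights

kde = gaussian_kde(x, weights=w)        # weighted KDE (SciPy >= 1.2)
grid = np.linspace(-4.0, 4.0, 200)
density = kde(grid)                     # weighted density estimate on a grid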

Alicaalicante answered 25/8, 2014 at 9:45 Comment(0)

I think this analysis can possibly be done via pomegranate (see the pomegranate docs page), which supports weighted Gaussian mixture modeling.

According to their doc:

weights : array-like, shape (n_samples,), optional
    The initial weights of each sample in the matrix. If nothing is passed in then each sample is assumed to be the same weight. Default is None.

Here is a Python snippet I wrote that can possibly help you do a weighted GMM:

from pomegranate import GeneralMixtureModel, NormalDistribution
import numpy as np

# Generate some data
N = 200
X_vals = np.random.normal(-17, 0.9, N).reshape(-1, 1)  # needs to be in Nx1 shape

# Hypothetical weight function, for illustration only; substitute whatever
# per-sample weight data you actually have, as a flat array of shape (N,).
def w_function(x):
    return np.abs(x + 17).ravel()

X_weights = w_function(X_vals)

pmg_model = GeneralMixtureModel.from_samples([NormalDistribution], 2, X_vals,
                                             weights=X_weights)

[Figure] Observed versus weighted distribution of the data we are analyzing

[Figure] GMM of the weighted data

Herrera answered 29/10, 2020 at 20:50 Comment(0)
