I am trying to fit a GaussianMixture from sklearn to a set of cat and dog pictures. I pass a NumPy array of shape (50, 30000), where 50 is the number of data points (25 cat and 25 dog pictures) and 30000 is the number of features I get after resizing each picture to (100, 100, 3) and flattening it into a NumPy array. The fit throws a MemoryError. I have 4 GB of RAM, and 70% of it is already in use before I run this code. Can anyone suggest how to find out how much memory the GaussianMixture fit method in sklearn uses? Or can anyone provide some code to fit it in batches?
Here is the code:
print(img_coll_cat_dog.shape)
print(img_coll_cat_dog.nbytes)
print(img_coll_cat_dog.itemsize)
Result:
(50, 30000)
12000000 bytes
8
from sklearn import mixture
gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
gmix.fit(img_coll_cat_dog)
This is the error I get:
MemoryError Traceback (most recent call last)
<ipython-input-32-c0370476a619> in <module>()
1 gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
----> 2 gmix.fit(img_coll_cat_dog)
~/dl/dl3/lib/python3.5/site-packages/sklearn/mixture/base.py in fit(self, X, y)
205
206 if do_init:
--> 207 self._initialize_parameters(X, random_state)
208 self.lower_bound_ = -np.infty
209
~/dl/dl3/lib/python3.5/site-packages/sklearn/mixture/base.py in _initialize_parameters(self, X, random_state)
155 % self.init_params)
156
--> 157 self._initialize(X, resp)
158
159 @abstractmethod
~/dl/dl3/lib/python3.5/site-packages/sklearn/mixture/gaussian_mixture.py in _initialize(self, X, resp)
629
630 weights, means, covariances = _estimate_gaussian_parameters(
--> 631 X, resp, self.reg_covar, self.covariance_type)
632 weights /= n_samples
633
~/dl/dl3/lib/python3.5/site-packages/sklearn/mixture/gaussian_mixture.py in _estimate_gaussian_parameters(X, resp, reg_covar, covariance_type)
283 "diag": _estimate_gaussian_covariances_diag,
284 "spherical": _estimate_gaussian_covariances_spherical
--> 285 }[covariance_type](resp, X, nk, means, reg_covar)
286 return nk, means, covariances
287
~/dl/dl3/lib/python3.5/site-packages/sklearn/mixture/gaussian_mixture.py in _estimate_gaussian_covariances_full(resp, X, nk, means, reg_covar)
162 """
163 n_components, n_features = means.shape
--> 164 covariances = np.empty((n_components, n_features, n_features))
165 for k in range(n_components):
166 diff = X - means[k]
MemoryError:
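For reference, here is a rough estimate of the allocation that fails at line 164 of the traceback, `np.empty((n_components, n_features, n_features))`. This is only a sketch: `covariance_bytes` is a hypothetical helper, not part of sklearn, and it counts only the covariance array itself, not any temporaries the fit may create. The shapes for the other covariance types are the ones I understand sklearn documents for `covariances_`, and the item size of 8 bytes is the one printed above.

```python
def covariance_bytes(n_components, n_features, covariance_type="full", itemsize=8):
    """Estimate the bytes needed for the covariance array alone.

    Hypothetical helper (not a sklearn function). Shapes assumed per
    covariance type:
      full      -> (n_components, n_features, n_features)
      tied      -> (n_features, n_features)
      diag      -> (n_components, n_features)
      spherical -> (n_components,)
    """
    n_elements = {
        "full": n_components * n_features * n_features,
        "tied": n_features * n_features,
        "diag": n_components * n_features,
        "spherical": n_components,
    }[covariance_type]
    return n_elements * itemsize


# Values from the post: 2 components, 30000 features, float64 (8 bytes each).
for cov_type in ("full", "tied", "diag", "spherical"):
    n_bytes = covariance_bytes(2, 30000, cov_type)
    print(cov_type, n_bytes, "bytes =", n_bytes / 1e9, "GB")
```

With these numbers the 'full' case comes to 2 * 30000 * 30000 * 8 = 14,400,000,000 bytes (about 14.4 GB), far more than my 4 GB of RAM, while the input array itself is only 12,000,000 bytes. The 'diag' case would need 480,000 bytes for the covariances.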
Any help is much appreciated.