Reconstructing new data using sklearn NMF components vs. inverse_transform does not match

I fit a model using scikit-learn NMF model on my training data. Now I perform an inverse transform of new data using

result_1 = model.inverse_transform(model.transform(new_data))

Then I compute the inverse transform of my data manually, taking the components from the NMF model and using the equation on Slide 15 here.

temp = np.dot(model.components_, model.components_.T)
transform = np.dot(np.dot(model.components_.T, np.linalg.pinv(temp)), model.components_)
result_2 = np.dot(new_data, transform)

I would like to understand why the two results do not match. What am I doing wrong when computing the inverse transform and reconstructing the data?

Sample code:

import numpy as np
from sklearn.decomposition import NMF

data = np.array([[0,0,1,1,1],[0,1,1,0,0],[0,1,0,0,0],[1,0,0,1,0]])
print(data)
# array([[0, 0, 1, 1, 1],
#        [0, 1, 1, 0, 0],
#        [0, 1, 0, 0, 0],
#        [1, 0, 0, 1, 0]])


model = NMF(alpha=0.0, init='random', l1_ratio=0.0, max_iter=200, n_components=2, random_state=0, shuffle=False, solver='cd', tol=0.0001, verbose=0)
model.fit(data)
# NMF(alpha=0.0, beta_loss='frobenius', init='random', l1_ratio=0.0,
#     max_iter=200, n_components=2, random_state=0, shuffle=False, solver='cd',
#     tol=0.0001, verbose=0)

new_data = np.array([[0,0,1,0,0], [1,0,0,0,0]])
print(new_data)
# array([[0, 0, 1, 0, 0],
#        [1, 0, 0, 0, 0]])

result_1 = model.inverse_transform(model.transform(new_data))
print(result_1)
# array([[ 0.09232497,  0.38903892,  0.36668712,  0.23067627,  0.1383513 ],
#        [ 0.0877082 ,  0.        ,  0.12131779,  0.21914115,  0.13143295]])

temp = np.dot(model.components_, model.components_.T)
transform = np.dot(np.dot(model.components_.T, np.linalg.pinv(temp)), model.components_)
result_2 = np.dot(new_data, transform)
print(result_2)
# array([[ 0.09232484,  0.389039  ,  0.36668699,  0.23067595,  0.13835111],
#        [ 0.09193481, -0.05671439,  0.09232484,  0.22970145,  0.13776664]])

Note: Although this is not the best data to illustrate my issue, the code is essentially the same. In my actual case, result_1 and result_2 differ from each other far more, and data and new_data are also large arrays.

Coleencolella asked 17/3, 2018 at 18:39 Comment(8)
You can check the implementation to find the differences.Ppm
The scikit-learn implementation computes the dot product between the transformed data and the components: return np.dot(W, self.components_)Ppm
Of course I did, and I still don't get it. During the transform method, components_ remains the same (from the fit method) and only the new_data is projected onto the latent space. This should be equivalent to what I am doing in the first two lines of the above code. Finally, there is the product with components_ in inverse_transform, which I am also doing. Hence my confusion about why the results are not similar.Coleencolella
@VivekKumar - Yes, it is calculating the dot product, which I am also doing.Coleencolella
Can you include a minimal working example? Maybe even with some random data?Deck
@Charlie - Edited the post for a minimal example.Coleencolella
Thank you! Also, can you please explain what you expect between result_1 and result_2? Do you expect them to be exactly equal? Equal within machine accuracy? Or equal within some specified error?Deck
@Charlie - I was expecting them to be equal within some specified error.Coleencolella

What happens

In scikit-learn, NMF does more than simple matrix multiplication: it optimizes!

Decoding (inverse_transform) is linear: the model computes X_decoded = dot(W, H), where W is the encoded matrix and H = model.components_ is the learned matrix of model parameters.
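
As a quick check (a sketch that assumes the fitted model and new_data from the question's example are in scope), inverse_transform is nothing more than this matrix product:

W = model.transform(new_data)
X_decoded = np.dot(W, model.components_)                    # X_decoded = dot(W, H)
print(np.allclose(X_decoded, model.inverse_transform(W)))   # True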

Encoding (transform), however, is nonlinear: it performs W = argmin(loss(X_original, H, W)) (with respect to W only), where loss is the mean squared error between X_original and dot(W, H), plus some additional penalties (L1 and L2 norms of W), and with the constraint that W must be non-negative. The minimization is performed by coordinate descent, and the result may be nonlinear in X_original. Thus, you cannot obtain W simply by multiplying matrices.
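
To make this concrete, here is a rough sketch of what transform() amounts to when there is no regularization (alpha=0.0, as in the question): a non-negative least-squares problem per row rather than a fixed projection. It uses scipy.optimize.nnls, which is not part of the original code and relies on a different algorithm than scikit-learn's coordinate descent, so the two results only agree approximately:

from scipy.optimize import nnls

H = model.components_
# for each row x of new_data, find w >= 0 minimizing ||x - dot(w, H)||
W_nnls = np.array([nnls(H.T, x)[0] for x in new_data])
print(W_nnls)   # close to model.transform(new_data)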

Why it is so weird

NMF has to perform such strange calculations because, otherwise, the model might produce negative results. Indeed, in your own example, you could try to perform the transform by plain matrix multiplication

 print(np.dot(new_data, np.dot(model.components_.T, np.linalg.pinv(temp))))

and get a result W that contains negative numbers:

[[ 0.17328927  0.39649966]
 [ 0.1725572  -0.05780202]]

However, the coordinate descent within NMF avoids this problem by slightly modifying the matrix:

 print(model.transform(new_data))

gives a non-negative result

[[0.17328951 0.39649958]
 [0.16462405 0.        ]]

You can see that it does not simply clip the W matrix from below, but also modifies the positive elements in order to improve the fit (and obey the regularization penalties).
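
A quick way to see this (again a sketch, assuming model and new_data from the question are in scope): compare the reconstruction error of the clipped pseudo-inverse solution with that of the W returned by transform(). The coordinate-descent solution should give an error that is at least as small:

H = model.components_
W_pinv = np.dot(new_data, np.dot(H.T, np.linalg.pinv(np.dot(H, H.T))))
W_clipped = np.clip(W_pinv, 0, None)     # naive fix: just zero out the negatives
W_nmf = model.transform(new_data)

print(np.linalg.norm(new_data - np.dot(W_clipped, H)))   # clipped projection
print(np.linalg.norm(new_data - np.dot(W_nmf, H)))       # NMF transform: no larger (up to tolerance)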

Peasant answered 20/3, 2018 at 11:46 Comment(9)
It seems that you have W and H reversed in your answer vs. the notation used in scikit-learn. Also, during the transform/encoding, the error minimized is between X_new (not X_original) and the product of W and H. Isn't that right?Coleencolella
Yes, they were reversed; fixed that. No, during the encoding the loss is minimized between X_original and X_new, where X_new exactly equals dot(W, H).Peasant
X_original and X_new need not match along the first dimension (i.e., the number of samples). So how exactly can the loss between them be minimized?Coleencolella
In NMF, isn't the non-negativity constraint only on W and H (i.e., the latent factors), or are the transformed and inverse-transformed/reconstructed matrices also required to be non-negative?Coleencolella
Sorry, I misunderstood your notation. By X_original I mean the matrix that is plugged into the transform() method, and it has nothing to do with the matrix that was used to train the model (to calculate W).Peasant
H IS THE transformed matrix! So the non-negativity constraint applies to it as well.Peasant
I suppose that W is the transformed matrix and H is the latent representation for the features.Coleencolella
Oh, yes. I started from the opposite meaning of H and W. Nevertheless, one of them is the latent representation and the other is the transformed matrix. Both of them must be non-negative. The latent representation is calculated during the training stage. The transformed matrix is recalculated non-linearly for every input matrix.Peasant
Let us continue this discussion in chat.Coleencolella
