I fit a scikit-learn NMF model on my training data. I then reconstruct new data by transforming it and applying the inverse transform:
result_1 = model.inverse_transform(model.transform(new_data))
Then I compute the same reconstruction manually, taking the components from the NMF model and using the equation on Slide 15 here.
temp = np.dot(model.components_, model.components_.T)
transform = np.dot(np.dot(model.components_.T, np.linalg.pinv(temp)),
model.components_)
result_2 = np.dot(new_data, transform)
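Written out, the manual computation above corresponds to the following formula. The symbols are my own shorthand rather than the notation of the slides: X stands for new_data, H for model.components_, and the superscript + for the Moore-Penrose pseudo-inverse computed by np.linalg.pinv.

```latex
\hat{X} = X \, H^{\top} \left( H H^{\top} \right)^{+} H
```

Here \hat{X} is result_2, and the product H^{\top} (H H^{\top})^{+} H is the matrix stored in transform.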
I would like to understand why the two results do not match. What am I doing wrong when computing the inverse transform and reconstructing the data manually?
Sample code:
import numpy as np
from sklearn.decomposition import NMF
data = np.array([[0,0,1,1,1],[0,1,1,0,0],[0,1,0,0,0],[1,0,0,1,0]])
print(data)
# array([[0, 0, 1, 1, 1],
#        [0, 1, 1, 0, 0],
#        [0, 1, 0, 0, 0],
#        [1, 0, 0, 1, 0]])
model = NMF(alpha=0.0, init='random', l1_ratio=0.0, max_iter=200, n_components=2, random_state=0, shuffle=False, solver='cd', tol=0.0001, verbose=0)
model.fit(data)
# NMF(alpha=0.0, beta_loss='frobenius', init='random', l1_ratio=0.0,
#     max_iter=200, n_components=2, random_state=0, shuffle=False, solver='cd',
#     tol=0.0001, verbose=0)
new_data = np.array([[0,0,1,0,0], [1,0,0,0,0]])
print(new_data)
# array([[0, 0, 1, 0, 0],
#        [1, 0, 0, 0, 0]])
result_1 = model.inverse_transform(model.transform(new_data))
print(result_1)
# array([[ 0.09232497, 0.38903892, 0.36668712, 0.23067627, 0.1383513 ],
#        [ 0.0877082 , 0.        , 0.12131779, 0.21914115, 0.13143295]])
temp = np.dot(model.components_, model.components_.T)
transform = np.dot(np.dot(model.components_.T, np.linalg.pinv(temp)), model.components_)
result_2 = np.dot(new_data, transform)
print(result_2)
# array([[ 0.09232484, 0.389039  , 0.36668699, 0.23067595, 0.13835111],
#        [ 0.09193481, -0.05671439, 0.09232484, 0.22970145, 0.13776664]])
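To make "do not match" concrete, the sketch below compares the two printed results row by row. The arrays are copied from the output above; the tolerance of 1e-6 is an arbitrary value chosen for illustration, not something taken from my actual setup.

```python
import numpy as np

# Values copied from the printed output of the sample code.
result_1 = np.array([[0.09232497, 0.38903892, 0.36668712, 0.23067627, 0.1383513],
                     [0.0877082, 0.0, 0.12131779, 0.21914115, 0.13143295]])
result_2 = np.array([[0.09232484, 0.389039, 0.36668699, 0.23067595, 0.13835111],
                     [0.09193481, -0.05671439, 0.09232484, 0.22970145, 0.13776664]])

# Largest absolute difference within each row.
row_diff = np.max(np.abs(result_1 - result_2), axis=1)
print(row_diff)

# Row-wise agreement within an illustrative absolute tolerance (assumption: 1e-6).
rows_match = [bool(np.allclose(a, b, atol=1e-6)) for a, b in zip(result_1, result_2)]
print(rows_match)  # [True, False]

# Positions of negative entries in the manual reconstruction.
print(np.argwhere(result_2 < 0))  # [[1 1]]
```

On these numbers the first row agrees to within about 1e-6, while the second row differs by up to roughly 0.057, and result_2 contains a negative entry where result_1 has 0. Whether that negative entry points to the cause of the mismatch is only a guess on my part.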
Note: Although this is not the best data to illustrate my issue, the code is essentially the same. Also, result_1 and result_2 differ from each other much more in the actual case, where data and new_data are also large arrays.
return np.dot(W, self.components_) – Ppm
result_1 and result_2? Do you expect them to be exactly equal? Or equal within machine accuracy? Or equal within some specified error? – Deck
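The first comment quotes a line that suggests inverse_transform is a plain matrix product of the transformed data with components_. A minimal sketch to check this on the sample data is below. It is not the exact configuration from the question: it passes fewer parameters so that it runs on scikit-learn versions where some of the original arguments are unavailable, and max_iter=1000 is an assumption of mine.

```python
import numpy as np
from sklearn.decomposition import NMF

data = np.array([[0, 0, 1, 1, 1],
                 [0, 1, 1, 0, 0],
                 [0, 1, 0, 0, 0],
                 [1, 0, 0, 1, 0]], dtype=float)
new_data = np.array([[0, 0, 1, 0, 0],
                     [1, 0, 0, 0, 0]], dtype=float)

# Reduced parameter set (assumption); the question's model passes more arguments.
model = NMF(n_components=2, init='random', random_state=0, max_iter=1000)
model.fit(data)

# Non-negative coefficients of new_data in the learned basis.
W = model.transform(new_data)

# Reconstruction via the method and via the explicit matrix product.
via_method = model.inverse_transform(W)
via_dot = np.dot(W, model.components_)

same = bool(np.allclose(via_method, via_dot))
nonneg = bool((W >= 0).all())
print(same, nonneg)
```

If same is True, any difference between result_1 and result_2 would have to come from how model.transform computes W, not from the inverse step.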