Optimizing the ALS algorithm in Dask

I am trying to implement the ALS algorithm in Dask, but I am having trouble figuring out how to compute the latent features in one step. I followed the formulas in this Stack Overflow thread and came up with this code:

    Items = da.linalg.lstsq(da.add(da.dot(Users, Users.T), lambda_ * da.eye(n_factors)), 
                            da.dot(Users, X))[0].T.compute()
    Items = np.where(Items < 0, 0, Items)

    Users = da.linalg.lstsq(da.add(da.dot(Items.T, Items), lambda_ * da.eye(n_factors)), 
                            da.dot(Items.T, X.T))[0].compute()
    Users = np.where(Users < 0, 0, Users)

But I don't think this works correctly, because the MSE is not decreasing.

Example input:

    n_factors = 2
    lambda_ = 0.1
    # We have 6 users and 4 items

The matrices X_train (6x4), R (4x6), Users (2x6) and Items (4x2) look like:

    1  0  0  0  5  2        1 0 0 0    0.8  1.3     1.1  0.2  4.1  1.6
    0  0  0  0  4  0        0 0 1 1    3.9  4.3     3.5  2.7  4.3  0.5
    0  3  0  0  4  0        0 0 0 0    2.9  1.5
    0  3  0  0  0  0        0 0 0 0    0.2  4.7
                            1 1 1 0    0.9  1.1
                            1 0 0 0    4.8  3.0

EDIT: I found the problem, but I don't know how to get around it. Before the iterations start, I set every entry of X_train that has no rating to 0:

    X_train = da.nan_to_num(X_train)

The reason is that the dot product only works on numeric values. But the matrix is very sparse, so 90% of it now consists of zeros, and instead of fitting the real ratings, the model fits these zeros.
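One common way around this is to solve each user's (and each item's) regularized least-squares problem using only the entries that actually have a rating. Below is a minimal sketch of that idea in plain NumPy, not Dask. The helper names `als_step` and `masked_mse` and the synthetic data are made up for illustration. Unlike the code above, both factor matrices are stored with one row per user or item, and the non-negativity clipping is left out.

```python
import numpy as np

def als_step(X, observed, fixed, lambda_):
    """Solve one ALS half-step row by row, using observed entries only.

    X        : (n_rows, n_cols) ratings; unobserved entries are ignored (may be NaN)
    observed : (n_rows, n_cols) boolean mask, True where a rating exists
    fixed    : (n_cols, n_factors) factors held constant during this half-step
    """
    n_factors = fixed.shape[1]
    solved = np.zeros((X.shape[0], n_factors))
    for i in range(X.shape[0]):
        cols = observed[i]
        F = fixed[cols]                               # factors of the rated columns
        A = F.T @ F + lambda_ * np.eye(n_factors)     # regularized normal equations
        b = F.T @ X[i, cols]
        solved[i] = np.linalg.solve(A, b)
    return solved

def masked_mse(X, observed, users, items):
    """Mean squared error over the observed ratings only."""
    err = (X - users @ items.T)[observed]
    return float(np.mean(err ** 2))

# Synthetic example (assumption): rank-2 ratings with roughly half the entries missing.
rng = np.random.default_rng(0)
n_users, n_items, n_factors, lambda_ = 30, 20, 2, 0.1
true_ratings = rng.random((n_users, n_factors)) @ rng.random((n_items, n_factors)).T
observed = rng.random((n_users, n_items)) < 0.5
X = np.where(observed, true_ratings, np.nan)          # NaN marks "no rating"

users = rng.random((n_users, n_factors))
items = rng.random((n_items, n_factors))
mse_before = masked_mse(X, observed, users, items)
for _ in range(20):
    users = als_step(X, observed, items, lambda_)      # rows of X are users
    items = als_step(X.T, observed.T, users, lambda_)  # rows of X.T are items
mse_after = masked_mse(X, observed, users, items)
```

The per-row loop gives up the single big matrix solve, but each row only ever sees its own ratings, so the missing entries never get pulled toward zero.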

Any help would be highly appreciated. <3

    data_set = da.array([[1, 2], [3, 4]])
    masked_data_set_1 = da.ma.masked_array(data_set, mask=[[False, True], [True, False]])  # returns [[1, --],[--, 4]]
    masked_data_set_2 = da.ma.masked_equal(data_set, 4)  # returns [[1, 2],[3, --]]
    masked_data_set_3 = da.ma.masked_where(data_set < 3, data_set)  # returns [[--, --],[3, 4]]
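To connect masking to the ratings problem, here is a small sketch that masks the missing ratings and computes the MSE over the observed entries only. It uses NumPy's `numpy.ma` instead of `da.ma` so it runs without Dask. The `X_train` and `prediction` values are hypothetical. Note that this only covers the error metric, not the factor updates themselves.

```python
import numpy as np

# Hypothetical ratings: NaN means "no rating".
X_train = np.array([[1.0, np.nan, 5.0],
                    [np.nan, 3.0, 4.0]])
# Hypothetical model output for the same users and items.
prediction = np.array([[1.5, 2.0, 4.0],
                       [0.5, 3.0, 4.5]])

X_masked = np.ma.masked_invalid(X_train)      # mask every NaN entry
squared_error = (X_masked - prediction) ** 2  # the mask carries through the arithmetic
mse_observed = float(squared_error.mean())    # mean over unmasked entries only
n_observed = int(X_masked.count())            # number of real ratings
```

Because the masked entries are skipped by `mean()`, there is no need to replace NaN with 0 just to compute the error.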
