Mahout: how to make recommendations for new users

We plan to use Mahout for a movie recommendation system, and we also plan to use SVD for model building.

When a new user comes we will require him/her to rate a certain number of movies (say 10).

The problem is that, in order to make a recommendation to this new user, we have to rebuild the entire model.

Is there a better way to do this?

Thanks

Tracheitis answered 12/10, 2012 at 11:4 Comment(0)

Yes... though not in Mahout. The implementations there are by nature built around periodically reloading and rebuilding a data model. Some implementations can still use new data on the fly, like the neighborhood-based ones, but I don't think the in-memory SVD-based one does this (I didn't write it).

In theory, you can start making recommendations from the very first click or rating by projecting the new user's ratings into the feature space via fold-in. To greatly simplify: if your rank-k approximate factorization of the input A is Ak = Uk * Sk * Vk', then for a new user u you want a new row Uk_u of Uk. What you do have is the new user's rating row A_u.

Since Ak = Uk * Sk * Vk', we have Uk = Ak * (Vk')^-1 * (Sk)^-1. The good news is that those two inverses on the right are trivial: (Vk')^-1 = Vk, because Vk has orthonormal columns, and (Sk)^-1 is just a matter of taking the reciprocals of Sk's diagonal elements.

So Uk_u = Ak_u * (Vk')^-1 * (Sk)^-1. You don't have Ak_u, but you do have A_u, which is approximately the same, so you use that instead.
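To make the algebra concrete, here is a minimal numpy sketch of this fold-in (illustrative only; the toy matrix and variable names are mine, not Mahout's):

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are movies (0 = unrated).
A = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)

# Rank-k truncated SVD: Ak = Uk * Sk * Vk'
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk = U[:, :k]          # user-feature matrix
Sk = np.diag(s[:k])    # k x k diagonal matrix of singular values
Vk = Vt[:k, :].T       # item-feature matrix; its columns are orthonormal

# Fold in a new user: Uk_u = A_u * Vk * Sk^-1
# ((Vk')^-1 is Vk as a right inverse, since Vk' * Vk = I, and Sk^-1
# is just the reciprocals of the diagonal.)
A_u = np.array([4, 0, 5, 0], dtype=float)  # new user's sparse rating row
Uk_u = A_u @ Vk @ np.linalg.inv(Sk)

# Score all movies for the new user by projecting back into item space.
predictions = Uk_u @ Sk @ Vk.T
print(predictions)
```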

If you like Mahout and like matrix factorization, I suggest you consider the ALS algorithm. It's a simpler process, so it's faster (though it makes the fold-in a little harder -- see the end of a recent explanation I gave, and the sketch below). It works nicely for recommendations.

This also exists in Mahout, though the fold-in isn't implemented. Myrrix, which is where I am continuing work from Mahout, implements all of this.
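For the ALS route mentioned above, the fold-in is typically a small regularized least-squares solve against the fixed item-factor matrix. A hedged sketch of that idea (the names als_fold_in, Y, and lam are mine, and real implementations weight observed entries and choose regularization differently):

```python
import numpy as np

def als_fold_in(A_u, Y, lam=0.1):
    """Estimate a new user's factor vector x, given fixed item factors Y.

    Solves the regularized least-squares problem
        min_x ||A_u - Y x||^2 + lam * ||x||^2
    via its normal equations. This is one per-user half-step of ALS,
    which is why the fold-in is 'a little harder' than the SVD case.
    """
    k = Y.shape[1]
    return np.linalg.solve(Y.T @ Y + lam * np.eye(k), Y.T @ A_u)
```

Each call costs roughly O(n_items * k^2), so it is cheap enough to run per request rather than per model rebuild.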

Trickster answered 12/10, 2012 at 12:40 Comment(13)
Hi Sean, thank you for the nice answer. Actually, your solution doesn't seem difficult to implement, so I wonder why it isn't implemented in Mahout, given that in commercial settings it is very important to be able to recommend items to users immediately after they give some ratings. – Cordite
Nothing's implemented unless someone implements it -- not because it shouldn't be. I guess nobody has wanted to implement it yet. – Trickster
Hi Sean again. What do you think of the technique of retraining the model only on the new user's ratings, in order to learn the factors associated with that user? – Cordite
We're having parallel threads on the mailing list -- I think this is the same thing as fold-in. – Trickster
Hi Sean, this is a rather late reply; I hope you see it. In your answer above you say that "The good news is that those two inverses on the right are trivial. (Vk')^-1 = Vk because it has orthonormal columns." However, Vk is not a square matrix; doesn't that create a problem? – Cordite
Moreover, when the SVD is done by gradient descent (Funk SVD), the singular values are merged into Vk, so Vk is no longer a (semi-)orthogonal matrix. – Cordite
A non-square matrix can have a pseudo-inverse, or one-sided inverse. To be more precise, I mean that we're finding (Vk')^-1 such that Vk' * (Vk')^-1 = I. That's easy, since Vk' * Vk = I. But Vk * Vk' != I, for sure. If you are using gradient descent you are not computing an SVD in this strict sense: Sk is folded into Uk and Vk in that process, and no, they are not orthonormal. The right inverse is still quite computable; it's just not trivially the transpose. Do you need that formula? – Trickster
By the way, I found code which extracts the singular values from Funk SVD so that folding-in works. The address is dev.grouplens.org/trac/lenskit/browser/src/main/java/org/… – Cordite
Note that (Vk' * Vk) * (Vk' * Vk)^-1 = I, by definition. Just regroup: Vk' * (Vk * (Vk' * Vk)^-1) = I. So (Vk * (Vk' * Vk)^-1) is the desired right inverse. (Vk' * Vk) is a small square matrix and can be inverted easily by any library. You do have to watch out for a singular matrix here; it means your data is inherently very low rank, lower than k! – Trickster
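In numpy terms, the regrouping in the comment above amounts to something like this sketch (with np.linalg.pinv noted as the safer route when Vk' * Vk is near-singular):

```python
import numpy as np

def right_inverse(Vk):
    # Returns R such that Vk' * R = I. (Vk' * Vk) is only k x k, so the
    # inversion is cheap, but it can be singular if the data's rank is
    # below k; np.linalg.pinv(Vk.T) is the safer choice in that case.
    return Vk @ np.linalg.inv(Vk.T @ Vk)

# Fold-in with a non-orthonormal Vk (e.g. from Funk SVD):
# Uk_u = A_u @ right_inverse(Vk)
```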
Hi Sean, I have applied the right-inverse method you described, that is: Uk_u = Ak_u * (Vk')^-1 = Ak_u * (Vk * (Vk' * Vk)^-1). Everything worked fine; the recommendation lists after fold-in are reasonable. However, when I inspect the predicted ratings, all of them are below 1 (such as 0.9 or 0.8), although the rating scale is 1 to 5. What might be the reason for this? Any ideas? Thanks – Cordite
Make sure Ak_u is in rating space -- that it contains values in 1-5. The transformation goes through a low-dimensional space, so it will not preserve your input, not even closely; none of the original inputs are necessarily preserved, by nature. The rating gets 'spread' across many items when un-projected. You might also double-check your math to make sure that's not the issue. – Trickster
Hi Sean, as a test case I tried to get the feature vector, via folding-in, for a specific user who is already in the training set, in order to compare it with the feature vector generated by gradient descent. That is, I used the formula Uk_u = A_u * (Vk * (Vk' * Vk)^-1). The feature vector generated by folding-in is quite different from the feature vector generated by gradient descent. – Cordite
... The reason seems to be that A_u is quite different from Ak_u. Above you said that they are approximately the same, but actually they are not, because one of them, A_u, is a very sparse vector (most of it is empty; I set empty values to 0) and the other is a dense vector. Am I missing something, or is this normal? – Cordite
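A quick way to reproduce the effect described in the last two comments, reusing the names from the first sketch above: fold an existing training row back in and compare it with that user's trained factor vector.

```python
# Reusing A, Uk, Sk, Vk from the SVD sketch: fold user 0 back in.
u0_folded = A[0] @ Vk @ np.linalg.inv(Sk)
print(u0_folded)  # equals Uk[0] (up to floating point), since the SVD
print(Uk[0])      # factored exactly this zero-filled A; a Funk-SVD model
                  # ignores the missing entries instead, so its trained
                  # vector and the fold-in of a zero-filled A_u can differ
```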
