Using matrix factorization for a recommender system
Asked Answered
B

1

13

I'm working on a recommender system for restaurants using an item-based collaborative filter in C# 6.0. I want to set up my algorithm to perform as well as possible, so I've done some research on different ways to predict ratings for restaurants the user hasn't reviewed yet.

I'll start with the research I have done

First I wanted to set up a user-based collaborative filter using a pearson correlation between users to be able to see which users fit well together.
The main problem with this was the amount of data required to be able to calculate this correlation. First you needed 4 reviews per 2 users on the same restaurant. But my data is going to be very sparse. It wasn't likely that 2 users would have reviewed the exact same 4 restaurants. I wanted to fix this by widening the match terms (I.e. not matching users on same restaurants, but on a same type of restaurant), but this gave me the problem where it was hard to determine which reviews I would use in the correlation, since a user could have left 3 reviews on a restaurant with the type 'Fast food'. Which of these would fit best with the other user's review on a fast food restaurant?

After more research I concluded that an item-based collaborative filter outperforms an user-based collaborative filter. But again, I encountered the data sparsity issue. In my tests I was successfully able to calculate a prediction for a rating on a restaurant the user hasn't reviewed yet, but when I used the algorithm on a sparse dataset, the results weren't good enough. (Most of the time, a similarity wasn't possible between two restaurants, since no 2 users have rated the same restaurant).
After even more research I found that using a matrix factorization method works well on sparse data.

Now my problem

I have been looking all over the internet for tutorials on using this method, but I don't have any experience in recommender systems and my knowledge on algebra is also limited. I understand the just of the method. You have a matrix where you have 1 side the users and the other side the restaurants. Each cell is the rating the user has given on the restaurant.
The matrix factorization method creates two matrices of this, one with the weight between users and the type of the restaurant, and the other between restaurants and these types.

The thing I just can't figure out is how to calculate the weight between the type of restaurant and the restaurants/users (If I understand matrix factorization correctly). I found dozens of formulas which calculates these numbers, but I can't figure out how to break them down and apply them in my application.

I'll give you an example on how data looks in my application:
In this table U1 stands for a user and R1 stands for a restaurant. Each restaurant has their own tags (Type of restaurant). I.e. R1 has the tag 'Italian', R2 has 'Fast food', etc.

   |  R1  |  R2  |  R3  |  R4  |
U1 |  3   |  1   |  2   |  -   |
U2 |  -   |  3   |  2   |  2   |
U3 |  5   |  4   |  -   |  4   |
U4 |  -   |  -   |  5   |  -   |

Is there anyone who can point me in the right direction or explain how I should use this method on my data? Any help would be greatly appreciated!

Binion answered 3/11, 2016 at 10:3 Comment(3)
Somehow the approach sounds suspicious to me... "not matching users on same restaurants, but on a same type of restaurant" sounds as if the user is rating his preferred kind of food. But thats probably not the case, the user is rating how good a restaurant is at delivering the expected food and if you substitute a decent burger restaurant with McD the results of this substitution might not outperform a pretty Random.Next call (I think you want to predict better than that?).Joyajoyan
Correct, this is why I decided to move to an item-based collaborative filter or possible a matrix factorization when I figure out how to implement itBinion
General idea is to substitute missing rating per restaurant with rating per restaurant type. First matrix averages ratings of restaurants, second matrix highlights differences inside same type restaurants. For example, if restaurants A and B are both Italian and the only existing user rates them 1 and 3, first matrix can contain 2 and second 0,5 and 1,5 (here we use multiplication of cells). If there are 2 users, second matrix somehow averages restaurant<->type value so that overall average stays the same.Solid
S
3

Matrix factorization assumes that the "latent factors" such as the preference for italian food of a user and the italieness of the item food is implicated by the ratings in the matrix.

So the whole Problem kind of transforms into a matrix reconstruction problem for which a lot of different solutions exist. A simple, maybe slow, solution is (besides ALS and some other possibilities of Matrix reconstruction) approximating the matrix using a gradient descend algorithm. I recommend this short article ieee article about recommender systems.

Extracting the latent factors is a different problem.

So an implementation of GDM could look like:

public void learnGDM(){
//traverse learnSet
for(int repeat = 0; repeat < this.steps; repeat++){
    for (int i = 0; i < this.learnSet.length; i++){
        for (int j = 0; j < this.learnSet[0].length; j++){
            if(this.learnSet[i][j] > 0.0d){
                double Rij = this.learnSet[i][j];

                for(int f = 0 ; f <= latentFactors; f++){
                    double error = Rij - dotProduct(Q.getRow(i), P.getRow(j));/*estimated_Rij;*/
                    //ieee computer 1.pdf
                    double qif = Q.get(i, f);
                    double pif = P.get(j, f);
                    double Qvalue = qif + gradientGamma * (error * pif - gradientLambda * qif);
                    double Pvalue = pif + gradientGamma * (error * qif - gradientLambda * pif);
                    Q.set(i,f, Qvalue);
                    P.set(j, f, Pvalue);
                }
            }
        }
    }
    //check global error
    if(checkGlobalError() < 0.001d){
        System.out.println("took" + repeat + "steps");
        break;
    }
}

Where the learnset is a two dimensional Array containing the Ratingmatrix as in the ieee article. The GDM algorithm changes the rating vectors P and Q a bit every iteration so that they approximate the ratings in the ratingmatrix. Then the "not given" ratings can be calculated by the dot product of P and Q. The highest estimations for the not given ratings will then be the recommendations.

So thats it for the start. There are a lot of optimizations and other algorithms or modified versions of GDM that can also be run in parallel.

Some good reads:

recommender systems in the Encyclopedia of Machine Learning

presentation on matrix factorization

recommender systems <--- big one ^^

Subedit answered 15/11, 2016 at 21:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.