How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?
How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have much meaning?

Clinometer answered 28/9, 2017 at 6:36 Comment(0)

To answer this question, you'll need to go back to the original paper that defined implicit feedback and the ALS algorithm: Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren and Chris Volinsky.

What is implicit feedback?

In the absence of explicit ratings, recommender systems can infer user preferences from the more abundant implicit feedback, which indirectly reflects opinions through observed user behavior.

Implicit feedback can include purchase history, browsing history, search patterns, or even mouse movements.

Do the same evaluation techniques, such as RMSE or MSE, apply here?

It is important to realize that we do not have reliable feedback about which items are disliked. The absence of a click or purchase can have multiple explanations, and we also can't track users' reactions to our recommendations.

Thus, precision-based metrics such as RMSE and MSE are not very appropriate, since they only make sense when we know which items users dislike.

However, purchasing or clicking on an item is an indication of interest in it. I wouldn't say "liking" it, because a click or a purchase can mean different things depending on the context of the recommender.

This makes recall-oriented measures applicable in this case. Under this scenario, several metrics have been introduced, the most important being the Mean Percentage Ranking (MPR), also known as Percentile Ranking.

Lower values of MPR are more desirable. The expected value of MPR for random predictions is 50%, and thus MPR > 50% indicates an algorithm no better than random.
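
For reference, the paper defines the metric as a weighted average of percentile ranks over the held-out feedback. This is my transcription of the formula:

```latex
% Expected percentile ranking (MPR) from Hu, Koren and Volinsky:
% r^t_{ui}  : held-out implicit feedback of user u on item i
% rank_{ui} : percentile position of item i in user u's ranked list
%             (0% = predicted most desirable, 100% = least desirable)
\overline{\mathrm{rank}} =
  \frac{\sum_{u,i} r^{t}_{ui} \, \mathrm{rank}_{ui}}{\sum_{u,i} r^{t}_{ui}}
```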

Of course, it's not the only way to evaluate recommender systems with implicit ratings, but it's the most common one used in practice.

For more information about this metric, I advise you to read the paper cited above.

OK, now we know what we are going to use, but what about Apache Spark?

Apache Spark still doesn't provide an out-of-the-box implementation of this metric, but hopefully not for long. There is a PR waiting to be validated, https://github.com/apache/spark/pull/16618, which adds a RankingEvaluator to spark-ml.

The implementation nevertheless isn't complicated. You can refer to the code here if you are interested in getting it sooner.
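
In the meantime, here is a minimal sketch of how the computation could look on DataFrames. This is my own sketch rather than the code from that PR, and the column names "user", "item" and "rating" (the implicit feedback strength) are assumptions you'll need to adapt to your schema:

```scala
import org.apache.spark.ml.recommendation.ALSModel
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Sketch of Mean Percentage Ranking: lower is better, ~0.5 means random.
// `test` holds the held-out implicit feedback (user, item, rating).
def meanPercentageRanking(model: ALSModel, test: DataFrame): Double = {
  // MPR ranks every held-out item against the whole catalogue for that user.
  // Here the catalogue is approximated by the items seen in `test`; plug in
  // the full item set if you have it. Beware: this cross join can get big.
  val candidates = test.select("user").distinct()
    .crossJoin(test.select("item").distinct())

  // Score the candidates and convert scores to percentile ranks per user:
  // 0.0 = predicted most desirable item, 1.0 = least desirable.
  val byUser = Window.partitionBy("user").orderBy(col("prediction").desc)
  val ranked = model.transform(candidates)
    .na.drop("any", Seq("prediction"))                 // drop cold-start NaNs
    .withColumn("rank_ui", percent_rank().over(byUser))

  // MPR = sum(r_ui * rank_ui) / sum(r_ui)
  val row = test.join(ranked, Seq("user", "item"))
    .agg(sum(col("rating").cast("double") * col("rank_ui")).as("num"),
         sum(col("rating").cast("double")).as("den"))
    .first()
  row.getDouble(0) / row.getDouble(1)
}
```

A result around 0.5 means you are doing no better than random, as explained above.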

I hope this answers your question.

Coextensive answered 29/9, 2017 at 13:55 Comment(5)
mllib already has some ranking metrics as far as I remember, right? And the PR you've linked has been closed as stale. – Excisable
It does. But it doesn't include MPR or MRR. And since mllib is in maintenance mode, the PR that was meant to add them was rejected. – Coextensive
I'm not sure I understand your question @Aldysyahdeini. – Coextensive
Hi eliasah, how can we safely assume that a value > 50% means the algorithm is no better than random? Correct me if I am wrong, but doesn't it depend on the distribution of my recommendation list? I understand it is an expected value, but I am still not sure about it. – Armenta
On one hand, @Aldysyahdeini, assuming a normal distribution, yes, of course that rule applies; for other types of distribution the threshold should be reviewed. On the other hand, random predictions are usually generated from a Gaussian process. Are you interested in skewed random predictions for some reason? Otherwise, I hope I cleared that point up for you. – Coextensive

One way of evaluating it is to split the data into a training set and a test set using a time cut. You train the model on the training set, generate predictions, and then check those predictions against the test set.

For the evaluation itself you can then use metrics such as Precision, Recall, or F1, as in the sketch below.
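
For instance, with Spark's built-in RankingMetrics from mllib, a rough sketch could look like this. The column names "user", "item" and the helper itself are assumptions, not a fixed API; `test` is assumed to hold the post-cut interactions:

```scala
import org.apache.spark.ml.recommendation.ALSModel
import org.apache.spark.mllib.evaluation.RankingMetrics
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Sketch: compare top-k recommendations from an ALS model trained on the
// pre-cut data against the items each user interacted with after the cut.
// Assumes ALS was fitted with userCol = "user" and itemCol = "item".
def rankingEval(spark: SparkSession, model: ALSModel, test: DataFrame, k: Int): Unit = {
  import spark.implicits._

  // Ordered top-k predicted items per user
  val recs = model.recommendForAllUsers(k)
    .select(col("user"), col("recommendations.item").as("predicted"))

  // Items each user actually touched in the test (post-cut) period
  val truth = test.groupBy("user").agg(collect_set("item").as("actual"))

  val predictionAndLabels = recs.join(truth, "user")
    .select("predicted", "actual")
    .as[(Array[Int], Array[Int])]
    .rdd

  val metrics = new RankingMetrics(predictionAndLabels)
  println(s"Precision@$k = ${metrics.precisionAt(k)}")
  println(s"MAP          = ${metrics.meanAveragePrecision}")
}
```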

Stubble answered 29/9, 2017 at 13:54 Comment(4)
This is a very generic approach and it doesn't actually address the problem of implicit ratings. – Coextensive
@Coextensive It's not very different from your answer, you just elaborated more. – Stubble
The main idea behind my answer is that precision-based metrics aren't applicable, yet the first evaluation metric you suggested is precision. I don't want to sound rude, but how is that not different? :-) – Coextensive
I said Precision/Recall, not MSE or RMSE, so it's totally valid. – Stubble
