How can you evaluate the implicit feedback collaborative filtering algorithm of Apache Spark, given that the implicit "ratings" can vary from zero to anything, so a simple MSE or RMSE does not have much meaning?
To answer this question, you'll need to go back to the original paper that defined implicit feedback and the ALS algorithm for it: Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren and Chris Volinsky.
What is implicit feedback?
In the absence of explicit ratings, recommender systems can infer user preferences from the more abundant implicit feedback, which indirectly reflects opinions through observed user behavior.
Implicit feedback can include purchase history, browsing history, search patterns, or even mouse movements.
Do the same evaluation techniques, such as RMSE and MSE, apply here?
It is important to realize that we do not have reliable feedback about which items are disliked. The absence of a click or a purchase can have many possible causes, and we also can't track users' reactions to our recommendations.
Thus, precision-based metrics such as RMSE and MSE are not very appropriate, since they require knowing which items users dislike in order to make sense.
However, purchasing or clicking on an item is an indication of interest in it. I wouldn't call it liking, because a click or a purchase can mean different things depending on the context of the recommender.
This makes recall-oriented measures applicable in this case. Under this scenario, several metrics have been introduced, the most important being the Mean Percentage Ranking (MPR), also known as Percentile Ranking.
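For reference, the paper defines the metric as the feedback-weighted average of the percentile rank of each held-out item within the user's ranked recommendation list (0% means the item sits at the very top of the list, 100% at the very bottom):

$$\overline{rank} = \frac{\sum_{u,i} r^t_{ui}\, rank_{ui}}{\sum_{u,i} r^t_{ui}}$$

where $r^t_{ui}$ is the observed implicit feedback for user $u$ and item $i$ in the test set, and $rank_{ui}$ is the percentile rank of item $i$ in the recommendation list produced for user $u$.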
Lower values of MPR are more desirable. The expected value of MPR for random predictions is 50%, and thus MPR > 50% indicates an algorithm no better than random.
Of course, it's not the only way to evaluate recommender systems with implicit ratings, but it's the most common one used in practice.
For more information about this metric, I advise you to read the paper cited above.
OK, now we know what we are going to use, but what about Apache Spark?
Apache Spark still doesn't provide an out-of-the-box implementation of this metric, but hopefully not for long: there is a PR waiting to be reviewed (https://github.com/apache/spark/pull/16618) that adds a RankingEvaluator to spark-ml.
The implementation nevertheless isn't complicated. You can refer to the code here if you are interested in getting it sooner.
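In the meantime, here is a minimal sketch of how MPR could be computed for a PySpark ALS model. The toy data, the column names (user, item, rating) and the hyperparameters are illustrative assumptions, not a reference implementation; ranking every item for every user with recommendForAllUsers is also expensive and only reasonable for small catalogs or sampled evaluations.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.getOrCreate()

# Tiny stand-in for real interaction logs; in practice you would split your
# own data with a time cut, as suggested in the other answer.
train = spark.createDataFrame(
    [(0, 0, 3.0), (0, 1, 1.0), (1, 1, 2.0), (1, 2, 5.0), (2, 0, 1.0), (2, 2, 4.0)],
    ["user", "item", "rating"])
test = spark.createDataFrame(
    [(0, 2, 2.0), (2, 1, 1.0)],
    ["user", "item", "rating"])

als = ALS(userCol="user", itemCol="item", ratingCol="rating",
          implicitPrefs=True, rank=10, regParam=0.1, alpha=40.0, seed=0)
model = als.fit(train)

# Rank every known item for every user; position 0 is the strongest recommendation.
n_items = train.select("item").distinct().count()
recs = (model.recommendForAllUsers(n_items)
        .select("user", F.posexplode("recommendations").alias("pos", "rec"))
        .select("user",
                F.col("rec.item").alias("item"),
                (F.col("pos") / F.lit(n_items - 1)).alias("percentile")))  # 0.0 = top

# MPR = sum(r_ui * rank_ui) / sum(r_ui) over the held-out (test) observations.
mpr = (test.join(recs, ["user", "item"])
       .agg((F.sum(F.col("rating") * F.col("percentile")) / F.sum("rating"))
            .alias("mpr"))
       .first()["mpr"])

print("MPR = %.4f (0.5 is random, lower is better)" % mpr)
```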
I hope this answers your question.
One way of evaluating it is to split the data into a training set and a test set with a time cut. This way, you train the model on the training set, then generate predictions and check them against the test set.
For evaluation you can then use metrics such as Precision, Recall and F1.
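If you go this route, Spark's older spark.mllib package already ships a RankingMetrics class (precision@k, MAP, NDCG) that you can feed with per-user top-k recommendations and the held-out items. Here is a rough sketch, reusing the model and test DataFrame from the snippet above; the cutoff k = 10 is an arbitrary choice.

```python
from pyspark.sql import functions as F
from pyspark.mllib.evaluation import RankingMetrics

k = 10

# Predicted ranking: top-k item ids per user, ordered by predicted score.
pred = (model.recommendForAllUsers(k)
        .select("user", F.col("recommendations.item").alias("pred_items")))

# Ground truth: the items each user actually interacted with in the test period.
truth = (test.groupBy("user")
         .agg(F.collect_list("item").alias("true_items")))

# RankingMetrics expects an RDD of (predicted list, ground-truth list) pairs.
pairs = (pred.join(truth, "user")
         .rdd.map(lambda row: (row.pred_items, row.true_items)))

metrics = RankingMetrics(pairs)
print("precision@%d = %.4f" % (k, metrics.precisionAt(k)))
print("MAP          = %.4f" % metrics.meanAveragePrecision)
print("NDCG@%d      = %.4f" % (k, metrics.ndcgAt(k)))
```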
mllib has some ranking metrics already as far as I remember, right? And the PR you've linked has been closed as stale. – Excisable