How to do item-based recommendation in Spark MLlib?

In Mahout, item-based recommendation is supported through this API method:

ItemBasedRecommender.mostSimilarItems(int productid, int maxResults, Rescorer rescorer)

In Spark MLlib, however, it appears that the ALS APIs can fetch recommended products only when a user ID is provided:

MatrixFactorizationModel.recommendProducts(int user, int num)

Is there a way to get recommended products based on a similar product, without having to provide a user ID, similar to how Mahout performs item-based recommendation?

Gisarme answered 17/12, 2014 at 18:20

Spark 1.2.x versions do not provide an "item-similarity based recommender" like the one in Mahout.

However, MLlib currently supports model-based collaborative filtering, where users and products are described by a small set of latent factors. (When constructing the user-item matrix, be clear about whether your use case involves implicit feedback, such as views and clicks, or explicit feedback, such as ratings.)

MLlib uses the alternating least squares (ALS) algorithm, which can be considered similar to SVD, to learn these latent factors.
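To make the latent-factor idea concrete: in a matrix factorization model, the predicted preference of a user for a product is the dot product of their two factor vectors. Below is a minimal sketch in plain Python. The names `user_factors` and `product_factors` and all the numbers are made up for illustration; they are not MLlib output.

```python
def predict(user_vec, product_vec):
    """Predicted preference = dot product of the two latent-factor vectors."""
    return sum(u * p for u, p in zip(user_vec, product_vec))

# Hypothetical rank-2 factors, standing in for what ALS would learn.
user_factors = {1: [1.0, 0.5], 2: [0.2, 2.0]}
product_factors = {10: [2.0, 1.0], 20: [0.5, 2.0]}

# User 1 and product 10: 1.0 * 2.0 + 0.5 * 1.0
score = predict(user_factors[1], product_factors[10])
```

Note that the prediction needs a user vector, which is why `recommendProducts` asks for a user ID. The product vectors alone can still be compared with each other.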

If you need a purely item-similarity based recommender, I would recommend this:

  1. Represent all items by a feature vector
  2. Construct an item-item similarity matrix by computing a similarity metric (such as cosine) for each pair of items
  3. Use this item similarity matrix to find similar items for users
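The three steps above can be sketched on a single machine as follows. This is only an illustration in plain Python, not Spark code: the item IDs, the rating-based feature vectors, and the helper names (`cosine`, `similarity_matrix`, `similar_items`) are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(item_vectors):
    """Step 2: compute the similarity of every pair of items, stored symmetrically."""
    ids = sorted(item_vectors)
    sims = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            s = cosine(item_vectors[a], item_vectors[b])
            sims[(a, b)] = s
            sims[(b, a)] = s
    return sims

def similar_items(sims, item_id, n):
    """Step 3: return the n items most similar to item_id, best first."""
    scored = [(other, s) for (src, other), s in sims.items() if src == item_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Step 1: hypothetical feature vectors (here, each item's ratings from three users).
item_vectors = {
    "A": [5.0, 3.0, 0.0],
    "B": [4.0, 3.0, 0.0],
    "C": [0.0, 1.0, 5.0],
}
sims = similarity_matrix(item_vectors)
top = similar_items(sims, "A", 1)  # "B" is rated by the same users as "A"
```

The nested loop visits every pair of items, so this approach is only practical for small catalogs.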

Since similarity matrices do not scale well (imagine how your similarity matrix would grow if you had 100 items vs. 10,000 items), this read on DIMSUM might be helpful if you're planning to implement it on a large number of items:

https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html
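To put a number on that growth: with n items there are n(n - 1)/2 distinct pairs to compare, so going from 100 to 10,000 items takes you from 4,950 pairs to 49,995,000. A quick check in Python (the function name `pair_count` is just for illustration):

```python
def pair_count(n_items):
    """Number of distinct unordered item pairs: n * (n - 1) / 2."""
    return n_items * (n_items - 1) // 2

small = pair_count(100)     # 4950
large = pair_count(10000)   # 49995000
```

A 100x increase in items means roughly a 10,000x increase in pairwise comparisons, which is the cost an approximate, sampling-based method like DIMSUM is meant to reduce.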

Fossette answered 6/4, 2015 at 7:59

Please see my implementation of an item-item recommendation model using Apache Spark here. You can implement this by using the productFeatures matrix that is generated when you run the MLlib ALS algorithm on user-product-ratings data. The ALS algorithm essentially factorizes the ratings matrix into two matrices: userFeatures and productFeatures. Each product gets one vector of length rank in productFeatures, so you can run cosine similarity between those vectors to find item-item similarity.
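As a rough sketch of that idea (not the linked implementation), the snippet below ranks products by cosine similarity of their latent-factor vectors, mirroring the shape of Mahout's `mostSimilarItems`. In Spark the vectors would come from the model's `productFeatures`; here they are a hard-coded dictionary with made-up product IDs and values so that the example runs without a cluster. The function name `most_similar_items` is hypothetical.

```python
import math

def most_similar_items(product_features, product_id, max_results):
    """Rank all other products by cosine similarity to product_id's factor vector."""
    target = product_features[product_id]
    target_norm = math.sqrt(sum(x * x for x in target))
    scored = []
    for other_id, vec in product_features.items():
        if other_id == product_id:
            continue  # never recommend the query item itself
        norm = math.sqrt(sum(x * x for x in vec))
        if target_norm == 0.0 or norm == 0.0:
            continue  # cosine is undefined for an all-zero vector
        dot = sum(x * y for x, y in zip(target, vec))
        scored.append((other_id, dot / (target_norm * norm)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_results]

# Hypothetical rank-2 product factors keyed by product ID.
product_features = {
    101: [0.9, 0.1],
    102: [0.8, 0.2],
    103: [0.1, 0.9],
    104: [-0.7, 0.1],
}
result = most_similar_items(product_features, 101, 2)
```

No user ID is involved at any point: the query is a product, and the result is a list of `(product_id, similarity)` pairs, best match first.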

Bordereau answered 5/12, 2016 at 18:1
