I have been working with Mahout for the past few days, trying to build a recommendation engine. The project I'm working on has the following data:
I am now experimenting with 1/3 of the full set we have (i.e. 6M out of 18M recommendations). With every configuration I tried, Mahout gave quite disappointing results: some recommendations took 1.5 seconds, while others took over a minute. I think a reasonable time for a single recommendation would be around 100 ms.
Why is Mahout so slow?
I'm running the application on Tomcat with the following JVM arguments (adding them didn't make much of a difference):
-Xms4096M -Xmx4096M -da -dsa -XX:NewRatio=9 -XX:+UseParallelGC -XX:+UseParallelOldGC
Below are code snippets for my experiments:
User similarity 1:
DataModel model = new FileDataModel(new File(dataFile));
UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(model), model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, Double.NEGATIVE_INFINITY, similarity, model, 0.5);
recommender = new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
User similarity 2:
DataModel model = new FileDataModel(new File(dataFile));
UserSimilarity similarity = new CachingUserSimilarity(new LogLikelihoodSimilarity(model), model);
UserNeighborhood neighborhood = new CachingUserNeighborhood(new NearestNUserNeighborhood(10, similarity, model), model);
recommender = new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
Item similarity 1:
DataModel dataModel = new FileDataModel(new File(dataFile));
ItemSimilarity itemSimilarity = new LogLikelihoodSimilarity(dataModel);
recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
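
To make the timings above easier to reproduce, here is a minimal sketch of how per-recommendation latency could be measured outside Tomcat. The class `LatencyProbe` and its helper methods are hypothetical names introduced only for illustration, and the workload is a stand-in: in the real application the task would presumably wrap a call such as `recommender.recommend(userId, 10)`.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Mahout: times repeated calls to a task.
class LatencyProbe {

    /** Runs the task `runs` times and returns the elapsed times in nanoseconds, sorted ascending. */
    static long[] measureNanos(Callable<?> task, int runs) throws Exception {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    /** Nearest-rank percentile of an already sorted array; p is in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload (assumption). With Mahout on the classpath this
        // would be something like: () -> recommender.recommend(userId, 10)
        Callable<Integer> task = () -> {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i % 7;
            }
            return sum;
        };

        // Warm-up runs, so JIT compilation and cold caches do not skew the samples.
        measureNanos(task, 20);

        long[] samples = measureNanos(task, 100);
        System.out.printf("median=%.2f ms, p95=%.2f ms, max=%.2f ms%n",
                percentile(samples, 50) / 1e6,
                percentile(samples, 95) / 1e6,
                samples[samples.length - 1] / 1e6);
    }
}
```

Reporting the median and a high percentile rather than a single number matters here, because the spread between 1.5 seconds and over a minute suggests that latency depends heavily on which user is being queried.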