Why does a spark-ml ALS model return NaN and negative predictions?

I'm trying to use ALS from spark-ml with implicit ratings.

I noticed that some predictions given by my trained model are negative or NaN. Why is that?

[Image: negative value]

Jerz answered 4/7, 2017 at 17:19 Comment(9)
I experienced a similar issue. A possible reason is given in the answer to this question: #37380251 - Hep
Can you show us how you created your model? - Validate
@Validate databricks-prod-cloudfront.cloud.databricks.com/public/… - Jerz
I'll take a look at it. Are you using Spark 2+? - Validate
@Validate Yes, 2.1 - Jerz
OK, set "nonnegative=True" for ALS! That should remove the negative values. - Validate
I see, but is that natural for the algorithm? By the way, I used the whole dataset, so I don't know why it's returning NaN. - Jerz
Well, you are just setting the non-negativity constraint used when computing the least squares. - Validate
Let us continue this discussion in chat. - Validate

Apache Spark provides an option to enforce non-negativity constraints on ALS.

Thus, to remove these negative values, you just need to set:

Python:

nonnegative=True

Scala:

setNonnegative(true)

when creating your ALS model, e.g.:

>>> from pyspark.ml.recommendation import ALS
>>> als = ALS(rank=10, maxIter=5, seed=0, nonnegative=True)

Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have nonnegative elements [Ref. Wikipedia].
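To see why the constraint removes negative predictions, recall that a matrix factorization prediction is the dot product of a user factor vector and an item factor vector. The following is a minimal pure-Python sketch, not Spark code; the `predict` helper and all factor values are made up for illustration.

```python
def predict(user_factors, item_factors):
    """Matrix-factorization style prediction: dot product of two latent vectors."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

# Hypothetical factors, invented for this illustration only.
unconstrained_user = [0.9, -0.4]   # plain least squares may produce negative entries
unconstrained_item = [0.2, 1.1]
nonneg_user = [0.9, 0.4]           # with a non-negativity constraint, entries are >= 0
nonneg_item = [0.2, 1.1]

print(predict(unconstrained_user, unconstrained_item))  # negative prediction
print(predict(nonneg_user, nonneg_item))                # cannot be negative
```

When every factor is non-negative, each term of the dot product is non-negative, so the prediction cannot drop below zero.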

If you want to read more about NMF, I'd recommend reading the following paper:

As for the NaN values, they are usually caused by splitting your dataset: a user or item that appears only in the test set, and not in the training set, was never seen by the model, so its prediction is NaN. This can also happen when you cross-validate your training. On that matter, there are a couple of JIRAs marked as resolved for 2.2:

The latter will allow you to set the cold start strategy to use when creating your model.
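The NaN mechanism and the effect of dropping such rows can be sketched in plain Python. This is only a simulation under assumed data, not Spark's implementation; the user and item ids, the factor values and the `predict` helper are all hypothetical.

```python
import math

# Hypothetical learned factors: only users/items seen in training have an entry.
user_factors = {"u1": [0.5, 0.1], "u2": [0.3, 0.8]}
item_factors = {"i1": [0.4, 0.2], "i2": [0.1, 0.9]}

def predict(user, item):
    """Dot product of the factors, or NaN if the user or item was never seen."""
    if user not in user_factors or item not in item_factors:
        return float("nan")
    return sum(u * v for u, v in zip(user_factors[user], item_factors[item]))

# The test split contains a user ("u3") that never appeared in training.
test_pairs = [("u1", "i2"), ("u3", "i1")]
predictions = [(u, i, predict(u, i)) for u, i in test_pairs]

# Emulate a "drop" strategy: discard NaN rows before computing any metric.
kept = [row for row in predictions if not math.isnan(row[2])]
print(kept)
```

In Spark 2.2 and later, the corresponding setting is `coldStartStrategy="drop"` on ALS; the default, `"nan"`, keeps the NaN rows, which in turn makes evaluation metrics such as RMSE come out as NaN.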

Validate answered 5/7, 2017 at 13:49 Comment(0)
