For a Recommender System, I need to compute the cosine similarity between all the columns of a Spark DataFrame.
In Pandas I used to do this:
import sklearn.metrics as metrics
import pandas as pd

df = pd.DataFrame(...)  # ...some dataframe over here :D ...
# transpose so each column becomes a row, then compute pairwise cosine similarity
metrics.pairwise.cosine_similarity(df.T, df.T)
That generates the similarity matrix between the columns (since I transposed the DataFrame).
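For a concrete picture of what I mean, here is a tiny self-contained version of the above (the column names and values are just made up for illustration):

import pandas as pd
import sklearn.metrics as metrics

# toy example: 4 users (rows) x 3 items (columns)
df = pd.DataFrame({'item_a': [1, 0, 3, 4],
                   'item_b': [2, 1, 0, 1],
                   'item_c': [5, 2, 3, 0]})

# transposing makes each column a row, so the result is a 3x3
# column-by-column cosine similarity matrix
sim = metrics.pairwise.cosine_similarity(df.T, df.T)
print(sim.shape)  # (3, 3)

So for N columns I get an N x N matrix, and I want the equivalent of that in Spark.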
Is there any way to do the same thing in Spark (Python)?
(I need to apply this to a matrix with tens of millions of rows and thousands of columns, which is why I need to do it in Spark.)
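The closest thing I have found so far is columnSimilarities() on spark.mllib's RowMatrix, but I'm not sure it is the right (or most scalable) approach. A rough sketch of what I mean, assuming df is the Spark DataFrame and all of its columns are numeric:

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

# turn each DataFrame row into a dense mllib vector
# (assumes every column of df is numeric)
rows = df.rdd.map(lambda row: Vectors.dense([float(x) for x in row]))
mat = RowMatrix(rows)

# exact cosine similarity between every pair of columns;
# the result is a CoordinateMatrix with only the upper-triangular entries
exact = mat.columnSimilarities()

# approximate version (DIMSUM sampling), trading some accuracy for speed
approx = mat.columnSimilarities(threshold=0.1)

print(exact.entries.take(5))

Since columnSimilarities() returns only the upper triangle as a sparse CoordinateMatrix (and the threshold variant uses DIMSUM sampling), I'm not sure how to get from there to the full similarity matrix I get in Pandas, or whether there's a better way to do this at that scale.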