I am trying to implement LSH spark to find nearest neighbours for each user on very large datasets containing 50000 rows and ~5000 features for each row. Here is the code related to this.
MinHashLSH mh = new MinHashLSH().setNumHashTables(3).setInputCol("features")
.setOutputCol("hashes");
MinHashLSHModel model = mh.fit(dataset);
Dataset<Row> approxSimilarityJoin = model .approxSimilarityJoin(dataset, dataset, config.getJaccardLimit(), "JaccardDistance");
approxSimilarityJoin.show();
The job gets stuck at approxSimilarityJoin() function and never goes beyond it. Please let me know how to solve it.