According to the mllib.feature.Word2Vec documentation for Spark 1.3.1 [1]:
def setNumIterations(numIterations: Int): Word2Vec.this.type
Sets number of iterations (default: 1), which should be smaller than or equal to number of partitions.
def setNumPartitions(numPartitions: Int): Word2Vec.this.type
Sets number of partitions (default: 1). Use a small number for accuracy.
But this pull request [2] says:
To make our implementation more scalable, we train each partition separately and merge the model of each partition after each iteration. To make the model more accurate, multiple iterations may be needed.
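To make the scheme quoted above concrete, here is a toy sketch in plain Scala (no Spark). It assumes the merge step is simple parameter averaging. That is a generic data-parallel technique and not necessarily the exact merge rule in Spark's Word2Vec source. The one-weight linear model and the names PartitionAveragingSketch, sgdEpoch and train are hypothetical stand-ins for the embedding vectors and the real training loop.

```scala
// Toy sketch (hypothetical, not Spark's code): per-partition SGD followed by
// parameter averaging after every iteration.
object PartitionAveragingSketch {
  // Noise-free toy data for y = 2x, standing in for a real corpus.
  val data: Seq[(Double, Double)] = (1 to 100).map { i =>
    val x = i / 100.0
    (x, 2.0 * x)
  }

  // One local pass of SGD over a partition, minimising (w * x - y)^2.
  def sgdEpoch(w0: Double, part: Seq[(Double, Double)], lr: Double): Double =
    part.foldLeft(w0) { case (w, (x, y)) =>
      val grad = 2.0 * (w * x - y) * x
      w - lr * grad
    }

  // Split the data round-robin into numPartitions slices. In each iteration every
  // partition trains from the shared weight on its own slice only, then the
  // partition results are averaged into the new shared weight.
  def train(data: Seq[(Double, Double)], numPartitions: Int,
            numIterations: Int, lr: Double): Double = {
    val partitions = (0 until numPartitions).map { k =>
      data.zipWithIndex.collect { case (d, i) if i % numPartitions == k => d }
    }
    (1 to numIterations).foldLeft(0.0) { (w, _) =>
      val local = partitions.map(p => sgdEpoch(w, p, lr))
      local.sum / local.size
    }
  }

  def main(args: Array[String]): Unit =
    for (p <- Seq(1, 4, 20); it <- Seq(1, 5)) {
      val w = train(data, p, it, lr = 0.05)
      println(f"partitions=$p%2d iterations=$it w=$w%.4f (target 2.0)")
    }
}
```

In this toy, each partition starts every iteration from the same shared weight but only sees 1/numPartitions of the data. After averaging, one iteration therefore makes roughly 1/numPartitions as much progress as a single-partition pass, and the remaining error shrinks roughly like exp(-c * numIterations / numPartitions) for some constant c. This is one way more partitions can cost accuracy unless the iteration count grows with them.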
Questions:
How do the parameters numIterations and numPartitions affect the internal workings of the algorithm?
Is there a trade-off between the number of partitions and the number of iterations, given the following rules?
more accuracy -> more iterations (according to [2])
more iterations -> more partitions (according to [1])
more partitions -> less accuracy (according to [1])