takeSample() function in Spark
F

3

8

I'm trying to use the takeSample() function in Spark. Its parameters are the data, the number of samples to take, and the seed, but I don't want to fix the seed: I want a different result every time, and I can't figure out how to do that. I tried using System.nanoTime as the seed value, but it gave an error, I think because the data type didn't match. Is there another function similar to takeSample() that can be used without a seed? Or is there some way to call takeSample() so that I get a different output every time?

Flabellate answered 4/2, 2013 at 13:47 Comment(0)
E
8

System.nanoTime returns a Long, but the seed expected by takeSample is an Int. Hence, takeSample(..., System.nanoTime.toInt) should work.
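
A minimal sketch of the conversion, in plain Scala so that it runs without a cluster. The object and method names (NanoTimeSeed, toSeed, freshSeed) and the RDD name in the comment are made up for illustration; only System.nanoTime and .toInt come from the answer above.

```scala
object NanoTimeSeed {
  // .toInt keeps only the low 32 bits of the Long; that loses the high bits,
  // which does not matter when the value is only used as a seed.
  def toSeed(t: Long): Int = t.toInt

  // A seed that changes from call to call because the clock keeps advancing.
  def freshSeed(): Int = toSeed(System.nanoTime())

  def main(args: Array[String]): Unit = {
    val seed = freshSeed()
    println(s"seed = $seed")
    // With a hypothetical RDD named `rdd`, the call would then look like:
    //   rdd.takeSample(false, 10, seed)
  }
}
```

Since an Int widens to a Long automatically in Scala, the same expression should also compile against a takeSample whose seed parameter is a Long.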

Educt answered 4/2, 2013 at 13:56 Comment(1)
In Scala, .toInt should be preferred over .intValue. - Goodrow
W
1

System.nanoTime returns a Long, whereas takeSample expects an Int.
You can pass scala.util.Random.nextInt as the seed value to takeSample.
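
A rough illustration of why a fresh seed changes the result, again in plain Scala. Note that sampleWithSeed is a hypothetical local stand-in for a seeded sampler, not Spark's implementation; only scala.util.Random.nextInt is taken from the answer.

```scala
import scala.util.Random

object RandomSeedSketch {
  // Draws an Int from the shared generator; suitable as an Int seed.
  def freshSeed(): Int = Random.nextInt()

  // Hypothetical stand-in for a seeded sampler: shuffles with a generator
  // built from `seed` and keeps the first `num` elements.
  def sampleWithSeed[A](data: Seq[A], num: Int, seed: Int): Seq[A] =
    new Random(seed).shuffle(data).take(num)

  def main(args: Array[String]): Unit = {
    val data = 1 to 100
    // The same seed reproduces the same sample.
    println(sampleWithSeed(data, 5, 7) == sampleWithSeed(data, 5, 7))
    // A fresh seed per call will usually give a different sample.
    println(sampleWithSeed(data, 5, freshSeed()))
    println(sampleWithSeed(data, 5, freshSeed()))
  }
}
```

With a real RDD (here a hypothetical `rdd`), the equivalent call would be rdd.takeSample(false, 5, scala.util.Random.nextInt()).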

Wardroom answered 4/2, 2013 at 13:54 Comment(0)
R
1

As of Spark version 1.0.0, the seed parameter is optional. See https://issues.apache.org/jira/browse/SPARK-1438.

Rior answered 8/12, 2014 at 14:27 Comment(0)
