I've been running into some issues recently while trying to use Spark on an AWS EMR cluster.
I am creating the cluster with a command like this:
./elastic-mapreduce --create --alive \
--name "ll_Spark_Cluster" \
--bootstrap-action s3://elasticmapreduce/samples/spark/1.0.0/install-spark-shark.rb \
--bootstrap-name "Spark/Shark" \
--instance-type m1.xlarge \
--instance-count 2 \
--ami-version 3.0.4
The problem is that whenever I try to read data from S3, I get an exception. For example, if I start the spark-shell and run something like:
val data = sc.textFile("s3n://your_s3_data")
I get the following exception:
WARN storage.BlockManager: Putting block broadcast_1 failed
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
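One way to narrow this down, assuming the error comes from a Guava version mismatch on the classpath (an assumption, not something confirmed above), is to ask the JVM which jar the class was actually loaded from and whether that copy has the method. Below is a minimal sketch to paste into the spark-shell. `jarOf` and `hasMethod` are hypothetical helper names, not part of Spark or Guava.

```scala
import scala.util.Try

// Hypothetical helper: the jar (or directory) a class was loaded from.
// Returns None if the class cannot be found or has no code source
// (for example, classes loaded by the bootstrap class loader).
def jarOf(className: String): Option[String] =
  Try(Class.forName(className)).toOption
    .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)

// Hypothetical helper: true if the loaded class has a public method with this name.
def hasMethod(className: String, method: String): Boolean =
  Try(Class.forName(className)).toOption
    .exists(_.getMethods.exists(_.getName == method))

// Which jar provides HashFunction, and does that copy have hashInt?
println(jarOf("com.google.common.hash.HashFunction"))
println(hasMethod("com.google.common.hash.HashFunction", "hashInt"))
```

If the second line prints false, the jar shown by the first line is probably an older Guava than the one the calling code was compiled against.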