AWS EMR and Spark 1.0.0

I've been running into some issues recently while trying to use Spark on an AWS EMR cluster.

I create the cluster with something like:

./elastic-mapreduce --create --alive \
--name "ll_Spark_Cluster" \
--bootstrap-action s3://elasticmapreduce/samples/spark/1.0.0/install-spark-shark.rb \
--bootstrap-name "Spark/Shark" \
--instance-type m1.xlarge \
--instance-count 2 \
--ami-version 3.0.4

Whenever I try to read data from S3, I get an exception. For example, if I start spark-shell and run something like:

val data = sc.textFile("s3n://your_s3_data")

I get the following exception:

WARN storage.BlockManager: Putting block broadcast_1 failed
java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
Caryloncaryn answered 21/8, 2014 at 7:43 Comment(2)
Did they just release an install script without checking that it works? - Colby
In their official doc they use the Spark 0.8.1 script ( aws.amazon.com/articles/Elastic-MapReduce/4926593393724923 ), but they do have this Spark 1.0.0 script; I'm not sure if it's still in the testing phase. I think they should be a bit more explicit about which version they support. - Caryloncaryn

This issue was caused by the Guava library: the version on the AMI is 11, while Spark needs version 14.
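If you want to confirm the mismatch on your own cluster, one rough way is to list the Guava jars on the master node and read the version off each file name. This is only a sketch: the function name `list_guava_versions` is made up for illustration, the default search root `/home/hadoop` is an assumption about where the AMI and the bootstrap script place their jars, and it relies on the jars following the usual `guava-<version>.jar` naming.

```shell
# List every Guava jar under a directory together with the version
# encoded in its file name (e.g. guava-11.0.2.jar -> 11.0.2).
# Hypothetical helper; it only inspects file names, not jar contents.
list_guava_versions() {
  find "$1" -type f -name 'guava-*.jar' 2>/dev/null | sort | while read -r jar; do
    name="$(basename "$jar" .jar)"   # strip directory and .jar suffix
    version="${name#guava-}"         # strip the "guava-" prefix
    printf '%s\t%s\n' "$version" "$jar"
  done
}

# Assumption: jars live somewhere under /home/hadoop on the EMR node.
# Pass another directory as the first argument to search elsewhere.
list_guava_versions "${1:-/home/hadoop}"
```

If more than one version shows up, the jar that appears first on the classpath is the one that gets loaded, which would explain the NoSuchMethodError above.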

I edited the AWS bootstrap script so that it installs Spark 1.0.2 and updates the Guava library during the bootstrap action. You can get the gist here:

https://gist.github.com/tnbredillet/867111b8e1e600fa588e

Even after updating Guava I still had an issue: when I tried to save data to S3, an exception was thrown:

lzo.GPLNativeCodeLoader - Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path

I solved that by adding the Hadoop native library directory to java.library.path. When I run a job, I add the JVM option

 -Djava.library.path=/home/hadoop/lib/native 

or if I run a job through spark-submit I add the

--driver-library-path /home/hadoop/lib/native 

argument.
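Before submitting, it can help to check that the native directory really contains the library the JVM is looking for. The sketch below assumes the library file follows the standard Linux naming (`libgplcompression.so`, possibly with a version suffix); `has_gplcompression` and `submit_with_native` are hypothetical helper names, and the directory is the one from this answer.

```shell
# Directory taken from this answer; override NATIVE_DIR if your AMI differs.
NATIVE_DIR="${NATIVE_DIR:-/home/hadoop/lib/native}"

# Succeeds if the directory holds a gplcompression shared library.
# On Linux the JVM maps the name "gplcompression" to libgplcompression.so.
has_gplcompression() {
  ls "$1"/libgplcompression.so* >/dev/null 2>&1
}

# Hypothetical wrapper: forwards all arguments to spark-submit and adds
# the native directory to the driver's library path.
submit_with_native() {
  spark-submit --driver-library-path "$NATIVE_DIR" "$@"
}

if has_gplcompression "$NATIVE_DIR"; then
  echo "libgplcompression found in $NATIVE_DIR"
else
  echo "libgplcompression not found in $NATIVE_DIR" >&2
fi
```

A job could then be launched with something like `submit_with_native --class com.example.MyJob my-job.jar`, where the class and jar names are placeholders.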

Caryloncaryn answered 21/8, 2014 at 7:43 Comment(0)
