Spark/Yarn: File does not exist on HDFS
Asked Answered
I

2

9

I have a Hadoop/Yarn cluster setup on AWS, I have one master and 3 slaves. I have verified I have 3 live nodes running on port 50070 and 8088. I tested a spark job in client deploy-mode, everything works fine.

When I try to spark-submit a job using ./spark-2.1.1-bin-hadoop2.7/bin/spark-submit --master yarn --deploy-mode cluster ip.py. I'm getting the following error.

Diagnostics: File does not exist: hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip

java.io.FileNotFoundException: File does not exist:
hdfs://ec2-54-153-50-11.us-west 1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip

17/05/28 18:58:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/28 18:58:33 INFO client.RMProxy: Connecting to ResourceManager at ec2-54-153-50-11.us-west-1.compute.amazonaws.com/172.31.5.235:8032
17/05/28 18:58:34 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
17/05/28 18:58:34 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
17/05/28 18:58:34 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
17/05/28 18:58:34 INFO yarn.Client: Setting up container launch context for our AM
17/05/28 18:58:34 INFO yarn.Client: Setting up the launch environment for our AM container
17/05/28 18:58:34 INFO yarn.Client: Preparing resources for our AM container
17/05/28 18:58:36 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/05/28 18:58:41 INFO yarn.Client: Uploading resource file:/tmp/spark-fbd6d435-9abe-4396-838e-60f19bc2dc14/__spark_libs__1200479165381142167.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/home/ubuntu/ip.py -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/ip.py
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/pyspark.zip
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/py4j-0.10.4-src.zip
17/05/28 18:58:45 INFO yarn.Client: Uploading resource file:/tmp/spark-fbd6d435-9abe-4396-838e-60f19bc2dc14/__spark_conf__7895841687984145748.zip -> hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_conf__.zip
17/05/28 18:58:46 INFO spark.SecurityManager: Changing view acls to: ubuntu
17/05/28 18:58:46 INFO spark.SecurityManager: Changing modify acls to: ubuntu
17/05/28 18:58:46 INFO spark.SecurityManager: Changing view acls groups to:
17/05/28 18:58:46 INFO spark.SecurityManager: Changing modify acls groups to:
17/05/28 18:58:46 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
17/05/28 18:58:46 INFO yarn.Client: Submitting application application_1495996836198_0003 to ResourceManager
17/05/28 18:58:46 INFO impl.YarnClientImpl: Submitted application application_1495996836198_0003
17/05/28 18:58:47 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:47 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1495997926073
     final status: UNDEFINED
     tracking URL: http://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:8088/proxy/application_1495996836198_0003/
     user: ubuntu
17/05/28 18:58:48 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:49 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:50 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:51 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:52 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:53 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:54 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:55 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:56 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:57 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:58 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:58:59 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:59:00 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:59:01 INFO yarn.Client: Application report for application_1495996836198_0003 (state: ACCEPTED)
17/05/28 18:59:02 INFO yarn.Client: Application report for application_1495996836198_0003 (state: FAILED)
17/05/28 18:59:02 INFO yarn.Client:
     client token: N/A
     diagnostics: Application application_1495996836198_0003 failed 2 times due to AM Container for appattempt_1495996836198_0003_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:8088/cluster/app/application_1495996836198_0003Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
java.io.FileNotFoundException: File does not exist: hdfs://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:9000/user/ubuntu/.sparkStaging/application_1495996836198_0003/__spark_libs__1200479165381142167.zip
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:421)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1495997926073
     final status: FAILED
     tracking URL: http://ec2-54-153-50-11.us-west-1.compute.amazonaws.com:8088/cluster/app/application_1495996836198_0003
     user: ubuntu
Exception in thread "main" org.apache.spark.SparkException: Application application_1495996836198_0003 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1180)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1226)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/28 18:59:02 INFO util.ShutdownHookManager: Shutdown hook called
17/05/28 18:59:02 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fbd6d435-9abe-4396-838e-60f19bc2dc14
ubuntu@ip-172-31-5-235:~$
Implement answered 28/5, 2017 at 19:36 Comment(3)
Have you run this ip.py in client deploy mode? How about standalone? Could you share some version or gist of the code? Have other Python/Scala/java spark apps run successfully against this cluster in the same manner? I presume your shell is the master, please correct if I'm wrong there. Also, check permissions on HDFS to be sure you have perms to write to that directoryWychelm
In client deploy mode, everything runs fine.Implement
Have you tried to test connectivity to HDFS port 9000 from each of your slaves?Bluetongue
I
11

I was setting the master to local (.setMaster('local')) in my source file. After I remove this, everything works fine.

Implement answered 29/5, 2017 at 0:43 Comment(1)
Thank you so much! I was banging my head for hours. It could've just said this in error.Pulpboard
E
0

I also had this problem. I tried the solution of removing the setMaster('local') in the source, however the error did not go away.

What finally solved my problem was actually quite simple: the sparkcontext needed to be initialized First (even before non-spark related variables).

Solution

From the post mentioned above, here is a python example. The same logic worked for me in scala

Hi there, If i follow your suggestions, it works.

Our code was like that :

Import numpy as np 
Import SparkContext 
foo = np.genfromtext(xxxxx) 
sc=SparkContext(...)
#compute

===> It fails ...

Import numpy as np 
Import SparkContext 
sc=SparkContext(...) 
foo = np.genfromtext(xxxxx)
#compute

===> It works perfectly ...

Note I also removed the setMaster('local'), because it makes sense that would interfere too.

Eryneryngo answered 14/12, 2017 at 0:30 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.