I'm trying to run Spark on a working Hadoop cluster. When I run my Python job with a small dataset, everything works fine. However, when I use a larger dataset, the task fails and the Hadoop ResourceManager shows the diagnostic:
Shutdown hook called before final status was reported.
The command I use to run the job is:
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.SPARK_HOME=/dev/null \
  --conf spark.executorEnv.SPARK_HOME=/dev/null \
  project-spark.py
It's just test code that generates some data and runs Spark's KMeans algorithm on the generated data.
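For reference, the script looks roughly like the sketch below. This is a simplified reconstruction, not the exact code; the data size, number of clusters, and the use of NumPy on the driver are assumptions:

```python
# Hypothetical sketch of project-spark.py: generate random points,
# then cluster them with Spark's KMeans.
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("kmeans-test").getOrCreate()

# Data is generated on the driver with NumPy (assumed), then parallelized.
points = np.random.rand(100000, 3)
df = spark.createDataFrame([(Vectors.dense(p),) for p in points], ["features"])

model = KMeans(k=5, seed=42).fit(df)
print(model.clusterCenters())

spark.stop()
```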
Any ideas what I should be doing? Any help is greatly appreciated.
Also, I am using Spark v2.0.0 on a Hadoop v2.6.0 cluster consisting of 4 workers, with Anaconda2 v4.1.1 as the Python distribution.
Update:
As @rakesh.rakshit suggested, I ran the job with --master yarn-client
and monitored the task. I found out that, as @ShuaiYuan suggested, I actually had a memory-intensive part that wasn't done through Spark functions, which was causing the problem.
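For anyone hitting the same thing, the fix was to move the data generation into Spark itself so it runs on the executors rather than on the driver. A minimal sketch of that idea, using the RDD-based MLlib API (the row count, dimensionality, and partition count here are placeholders, not my actual values):

```python
# Generate the test data inside Spark (distributed across the executors)
# instead of building it on the driver, which avoids the driver-side
# memory pressure that was causing the failure in my case.
from pyspark import SparkContext
from pyspark.mllib.random import RandomRDDs
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="kmeans-test")

# 10 million 3-dimensional points, generated in parallel across 40 partitions.
data = RandomRDDs.uniformVectorRDD(sc, numRows=10000000, numCols=3, numPartitions=40)

model = KMeans.train(data, k=5, maxIterations=20, seed=42)
print(model.clusterCenters)

sc.stop()
```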
Also, it seems that as of Spark 1.4.0 it is no longer required to set the SPARK_HOME
variable, since that issue was resolved.