An API call is made to submit the job. The response states that it is Running.
On the Cluster UI:
Worker (slave) - worker-20160712083825-172.31.17.189-59433 is Alive
1 of 2 cores used
1 GB of 6 GB memory used
Running Application
app-20160713130056-0020 - WAITING for 5 hours
Cores - unlimited
Job Description of the Application
Active Stage
reduceByKey at /root/wordcount.py:23
Pending Stage
takeOrdered at /root/wordcount.py:26
Running Driver -
stderr log page for driver-20160713130051-0025
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
According to the question "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", this warning appears when the slaves have not been started, so the cluster has no resources.
However, in my case Slave 1 is up and working.
According to "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave and the submit API is called via Postman (or from anywhere else).
The cluster also has cores, RAM, and memory available, yet the job still throws the error reported by the UI.
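Since the job is already being submitted over HTTP, the same thing can be double-checked programmatically. A minimal sketch, assuming the standalone master's web UI (default port 8080) also serves its status as JSON under /json, and using the master hostname that appears in the submission URL later in this question:

import json
import urllib2  # Python 2, which is what the spark-ec2 AMIs ship with

# Assumption: the standalone master web UI exposes a machine-readable
# status view at /json; the hostname is the master from this question.
MASTER_STATUS_URL = "http://ec2-54-209-108-127.compute-1.amazonaws.com:8080/json"

status = json.load(urllib2.urlopen(MASTER_STATUS_URL))

# Pretty-print the whole document; the interesting parts are the registered
# workers (their state, cores and memory) and the active applications
# (their state and how many cores they have actually been granted).
print(json.dumps(status, indent=2))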
Following "TaskSchedulerImpl: Initial job has not accepted any resources;", I assigned the following Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2
and replicated these settings across the slaves with:
sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh
All the cases covered in the answers to the questions above applied to my setup, yet no solution was found. Since I am working with Apache Spark through its APIs, perhaps some other kind of assistance is required.
Edited July 18, 2016
wordcount.py - my PySpark application code:
from pyspark import SparkContext, SparkConf
logFile = "/user/root/In/a.txt"
conf = (SparkConf().set("num-executors", "1"))
sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
print("in here")
lines = sc.textFile(logFile)
print("text read")
c = lines.count()
print("lines counted")
Error
Starting job: count at /root/wordcount.py:11
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Got job 0 (count at /root/wordcount.py:11) with 2 output partitions
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11), which has no missing parents
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 56.2 KB)
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 59.7 KB)
16/07/18 07:46:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.17.189:43684 (size: 3.4 KB, free: 511.5 MB)
16/07/18 07:46:39 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
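It may also be worth doing the resource arithmetic explicitly: in standalone cluster deploy mode the driver itself takes one core and spark.driver.memory (1 GB by default) from a worker, and each executor then needs spark.executor.memory (also 1 GB by default) plus at least one core out of whatever is left. A small sketch of that check, using the worker sizes that appear in this question as assumptions:

# Back-of-the-envelope check (assumptions: standalone cluster deploy mode,
# Spark defaults of spark.driver.memory = 1g and spark.executor.memory = 1g,
# driver takes 1 core; worker sizes taken from this question).

def executor_fits(worker_cores, worker_memory_mb,
                  driver_cores=1, driver_memory_mb=1024,
                  executor_cores=1, executor_memory_mb=1024):
    """Return True if at least one executor can still be launched on the
    worker after the cluster-mode driver has taken its share."""
    cores_left = worker_cores - driver_cores
    memory_left = worker_memory_mb - driver_memory_mb
    return cores_left >= executor_cores and memory_left >= executor_memory_mb

# Worker as shown on the cluster UI: 2 cores, 6 GB.
print(executor_fits(2, 6 * 1024))   # True  -> an executor should fit
# Worker as configured via SPARK_WORKER_MEMORY=1000m in spark-env.sh.
print(executor_fits(2, 1000))       # False -> nothing can be scheduled

With the 6 GB shown on the UI an executor fits; with the 1000m from spark-env.sh it cannot, which could be enough on its own to keep the application in WAITING with this exact warning.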
As in "Spark UI showing 0 cores even when setting cores in App", the Spark Web UI shows zero cores used and an indefinite wait with no tasks running. The application also uses no memory and no cores whatsoever at runtime, and goes to WAITING status immediately after starting.
Spark version 1.6.1, Ubuntu, Amazon EC2.
from pyspark import SparkContext, SparkConf
logFile = "/user/root/In/a.txt"
conf = (SparkConf().set("num-executors", "1"))
sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
textFile = sc.textFile(logFile)
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
wordCounts.saveAsTextFile("/user/root/In/output.txt")
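A side note on the conf used in both listings: "num-executors" does not look like a SparkConf property name (the --num-executors flag belongs to spark-submit on YARN), so that .set(...) call most likely has no effect on a standalone cluster. Below is a minimal sketch of the same setup using spark.executor.memory and spark.cores.max, standard properties the standalone scheduler does read; the values are only placeholders sized to a 2-core / 6 GB worker:

from pyspark import SparkContext, SparkConf

logFile = "/user/root/In/a.txt"

# Sketch only: cap the application's resource ask with standard Spark
# properties instead of "num-executors" (which is not a SparkConf key).
# The numbers are assumptions sized to fit a 2-core / 6 GB worker.
conf = (SparkConf()
        .set("spark.executor.memory", "512m")   # memory per executor
        .set("spark.cores.max", "1"))           # total cores the app may use

sc = SparkContext(master="spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077",
                  appName="MyApp", conf=conf)

lines = sc.textFile(logFile)
print("line count: %d" % lines.count())

sc.stop()

When spark.cores.max is left unset, the application asks for unlimited cores, which matches the "Cores - unlimited" entry on the UI above.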
Comments:
cat /root/spark-ec2/cluster-url – Rountree
SPARK_WORKER_INSTANCES property is deprecated and will have no effect on your configuration. It looks like you are assigning 1 GB of memory and 2 cores per executor, correct? How much total memory do you have on your worker? – Reid
16/07/18 11:39:50 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks was the last point of similarity. After that, the running code logged 16/07/18 11:39:52 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-172-31-17-189.ec2.internal:53938) with ID 0, whereas the non-running code gave the error TaskSchedulerImpl: Initial job has not accepted any resources; check ... and have sufficient resources. – Rountree