TaskSchedulerImpl: Initial job has not accepted any resources;
Here is what I am trying to do.

I have created a two-node DataStax Enterprise cluster, and on top of it a Java program that gets the row count of one table (a Cassandra database table).

The program was built in Eclipse on a Windows box.

When running it from Windows, it fails at runtime with the following error:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

The same code compiles and runs successfully on the cluster nodes themselves, without any issue. What could be the reason I am getting the above error?

Code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SchemaRDD;
import org.apache.spark.sql.cassandra.CassandraSQLContext;
import com.datastax.bdp.spark.DseSparkConfHelper;

public class SparkProject {

    public static void main(String[] args) {

        SparkConf conf = DseSparkConfHelper.enrichSparkConf(new SparkConf())
                .setMaster("spark://10.63.24.14X:7077")
                .setAppName("DatastaxTests")
                .set("spark.cassandra.connection.host", "10.63.24.14x")
                .set("spark.executor.memory", "2048m")
                .set("spark.driver.memory", "1024m")
                .set("spark.local.ip", "10.63.24.14X");

        JavaSparkContext sc = new JavaSparkContext(conf);

        CassandraSQLContext cassandraContext = new CassandraSQLContext(sc.sc());
        SchemaRDD employees = cassandraContext.sql("SELECT * FROM portware_ants.orders");

        //employees.registerTempTable("employees");
        //SchemaRDD managers = cassandraContext.sql("SELECT symbol FROM employees");
        System.out.println(employees.count());

        sc.stop();
    }
}
Fashionable answered 6/4, 2015 at 10:28 Comment(1)
23

I faced a similar issue, and after some online research and trial and error, I narrowed it down to three causes (apart from the first, the other two are not even close to the error message):

  1. As the error indicates, you may be allocating more resources than are available. => This was not my issue.
  2. Hostname and IP address mishaps: I took care of this by specifying SPARK_MASTER_IP and SPARK_LOCAL_IP in spark-env.sh.
  3. Disable the firewall on the client: this is the solution that worked for me. Since I was working on prototype in-house code, I disabled the firewall on the client node. For some reason the worker nodes were not able to talk back to the client. For production purposes, you would open up only the specific ports required.
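
For reference, cause 2 amounts to a couple of lines in conf/spark-env.sh. A minimal sketch, with placeholder addresses rather than anything from the original post:

```shell
# conf/spark-env.sh -- placeholder addresses, substitute your own
export SPARK_MASTER_IP=192.168.1.10   # address the master binds to and advertises to workers
export SPARK_LOCAL_IP=192.168.1.20    # address this host uses for Spark traffic (driver or worker)
```

Setting SPARK_LOCAL_IP on a multi-homed client (e.g. one on a VPN) pins Spark to the interface the workers can actually reach.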
Bierman answered 28/5, 2015 at 3:50 Comment(4)
In my case SPARK_LOCAL_IP helped. I connected through VPN, and had different interfaces configured. When I set SPARK_LOCAL_IP to the VPN interface, the error disappeared.Ogbomosho
In spark-env.sh of the master I have SPARK_MASTER_IP set. Now what is SPARK_LOCAL_IP, and wouldn't that change between the master and worker instances? The firewall is disabled on both instances anyway. Still I am having issues with respect to submitting a PySpark application on AWS EC2 - getting the Initial job failed error - the application goes into a Wait state due to unavailability of resources. Let me know if there is any workaround. The problem is stated here - #38360301 @BiermanMetatarsus
@Ogbomosho SPARK_LOCAL_IP should be set to corresponding IP address for both worker and driver right?Metatarsus
My issue was that I was trying to run with a user with limited permissions, when I ran with root the worker nodes started correctly.Mcmann
7

My problem was that I was assigning more memory than my slaves had available. Try reducing the memory size in the spark-submit command. Something like the following:

~/spark-1.5.0/bin/spark-submit --master spark://my-pc:7077 --total-executor-cores 2 --executor-memory 512m

with my ~/spark-1.5.0/conf/spark-env.sh being:

SPARK_WORKER_INSTANCES=4
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2
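
Worth spelling out the arithmetic behind this: an executor request must fit inside a single worker instance, so with SPARK_WORKER_MEMORY=1000m a 512m executor fits, while the asker's 2048m request could never be granted by any worker. A sketch of the full submit (the jar name is a placeholder):

```shell
# Each worker instance offers 1000m / 2 cores; request less than that per executor.
~/spark-1.5.0/bin/spark-submit \
  --master spark://my-pc:7077 \
  --total-executor-cores 2 \
  --executor-memory 512m \
  my-app.jar   # placeholder application jar
```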
Arrhenius answered 20/9, 2015 at 5:37 Comment(1)
I have 1 worker instance, 2 cores and 6gb memory available and I have allocated 1 core and 1gb worth of memory to the application while submitting with no other application running at the moment. Despite that I am facing issue - with respect to the Submit PySpark application on AWS EC2 - getting Initial job failed error - Application goes in Wait state due to unavailability of resources. Let me know if there is any workaround. Problem is stated here - #38360301 @Sudipta BasakMetatarsus
3

Please look at Russ's post

Specifically this section:

This is by far the most common first error that a new Spark user will see when attempting to run a new application. Our new and excited Spark user will attempt to start the shell or run their own application and be met with the following message

...

The short term solution to this problem is to make sure you aren’t requesting more resources from your cluster than exist or to shut down any apps that are unnecessarily using resources. If you need to run multiple Spark apps simultaneously then you’ll need to adjust the amount of cores being used by each app.
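
A sketch of the "adjust the amount of cores" advice for the standalone scheduler (host and jar names are placeholders): capping cores and memory per application leaves room for a second app on the same cluster.

```shell
# Limit this app so it does not consume every core the cluster offers.
spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 2 \
  --executor-memory 1g \
  app.jar   # placeholder
```

The same cap can live in conf/spark-defaults.conf as `spark.cores.max 2`, which applies without repeating the flag on every submit.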

Star answered 9/4, 2015 at 19:31 Comment(3)
This has been ensured. 2 cores and 1 spark application running on 6gb memory. Still I am having issues with respect to the Submit PySpark application on AWS EC2 - getting Initial job failed error - Application goes in Wait state due to unavailability of resources. Let me know if there is any workaround. Problem is stated here - #38360301 @StarMetatarsus
Hi @ChaitanyaBapat - did you find a solution to this issue?Tennant
Don't remember was 3 years ago. Must have resolved it using either restart / fixing some network related issue. Sorry about that.Metatarsus
0

In my case, the problem was that I had the following line in $SPARK_HOME/conf/spark-env.sh on each worker:

SPARK_EXECUTOR_MEMORY=3g

and the following line in $SPARK_HOME/conf/spark-defaults.conf on the "master" node:

spark.executor.memory 4g

The problem went away once I changed 4g to 3g. I hope this helps someone with the same issue. The other answers helped me spot it.
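
A minimal consistent pair under the same assumption (3g per executor on both sides): the per-application request must not exceed the per-worker ceiling.

```shell
# conf/spark-env.sh (on each worker): per-executor memory ceiling
SPARK_EXECUTOR_MEMORY=3g

# conf/spark-defaults.conf (on the submitting side): request must fit the ceiling above
#   spark.executor.memory 3g
```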

Glossary answered 9/3, 2017 at 15:25 Comment(0)
0

I have faced this issue a few times even though the resource allocation was correct.

The fix was to restart the Mesos services:

sudo service mesos-slave restart
sudo service mesos-master restart
Bluet answered 18/8, 2017 at 15:48 Comment(0)