Apache Spark: setting executor instances does not change the executors
I have an Apache Spark application running in cluster mode on a YARN cluster (Spark has 3 nodes on this cluster).

When the application is running, the Spark UI shows 2 executors (each running on a different node) and the driver running on the third node. I want the application to use more executors, so I tried adding the argument --num-executors to spark-submit and set it to 6.

spark-submit --driver-memory 3G --num-executors 6 --class main.Application --executor-memory 11G --master yarn-cluster myJar.jar <arg1> <arg2> <arg3> ...

However, the number of executors remains 2.

In the Spark UI I can see that the parameter spark.executor.instances is 6, just as I intended, and yet there are still only 2 executors.

I even tried setting this parameter from code:

sparkConf.set("spark.executor.instances", "6")
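
For completeness, the conf is applied before the SparkContext is created, roughly like this (the app name is simplified):

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("Application")
// spark.executor.instances is only read once, at application start,
// so it has to be set before the SparkContext is constructed
sparkConf.set("spark.executor.instances", "6")
val sc = new SparkContext(sparkConf)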

Again, I can see that the parameter was set to 6, but still there are only 2 executors.

Does anyone know why I couldn't increase the number of my executors?

yarn.nodemanager.resource.memory-mb is set to 12 GB in yarn-site.xml.

Unshackle answered 29/4, 2015 at 10:14 Comment(2)
What is the value of yarn.nodemanager.resource.memory-mb in yarn-site.xml?Chiton
yarn.nodemanager.resource.memory-mb is 12 GiBUnshackle

Increase yarn.nodemanager.resource.memory-mb in yarn-site.xml

With 12 GB per node you can only launch the driver (3 GB) and 2 executors (11 GB each).

Node1 - driver 3g (+7% overhead)

Node2 - executor1 11g (+7% overhead)

Node3 - executor2 11g (+7% overhead)

Now you are requesting executor3 with 11 GB, but no node has 11 GB of memory available.

For the 7% overhead, refer to spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead in https://spark.apache.org/docs/1.2.0/running-on-yarn.html
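
Alternatively, as the comments below note, shrinking the executors so that two fit per node also works. A sketch of the question's command with an illustrative 5G figure (not a value from the question):

spark-submit --driver-memory 3G --num-executors 6 --class main.Application --executor-memory 5G --master yarn-cluster myJar.jar <arg1> <arg2> <arg3> ...

5 GB + max(7% ≈ 358 MB, the 384 MB floor) ≈ 5.4 GB per container, so two executors (~10.8 GB) fit under a 12 GB NodeManager limit.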

Chiton answered 29/4, 2015 at 11:4 Comment(5)
I couldn't increase this parameter because I don't have enough RAM, but I tried it on a different Spark application that doesn't need that much executor memory, and it worked. ThanksUnshackle
I have been agonizing over a similar issue for a few days. Thank you for your answer, worked like a charm!Panatella
Is the 7% fixed? Is there any way to adjust the 7% overhead? I can see the settings, but they show an absolute value (384) rather than a percentage.Culinarian
@JoyGeorgeKunjikkuru Yes, you can change it. From Spark 2.3 onwards these properties are known as spark.executor.memoryOverhead and spark.driver.memoryOverhead, and their default value is 10%; you can set an absolute value to change them. Refer: spark.apache.org/docs/latest/configuration.htmlChiton
@banjara I have not set executor-memory (it's at the default: 1 GB), but passing --num-executors is not honored with spark-submit. I've got 70 GB of memory in YARN.Nippur

Note that yarn.nodemanager.resource.memory-mb is the total memory that a single NodeManager can allocate across all containers on one node.

In your case, since yarn.nodemanager.resource.memory-mb = 12G, if you add up the memory allocated to all YARN containers on any single node, it cannot exceed 12G.

You have requested 11G (--executor-memory 11G) for each Spark executor container. Though 11G is less than 12G, this still won't work. Why?

  • Because you have to account for spark.yarn.executor.memoryOverhead, which defaults to max(executorMemory * 0.10, 384 MB) (unless you override it), i.e. 10% of the executor memory with a floor of 384 MB.

So, the following math must hold true:

spark.executor.memory + spark.yarn.executor.memoryOverhead <= yarn.nodemanager.resource.memory-mb

See: https://spark.apache.org/docs/latest/running-on-yarn.html for latest documentation on spark.yarn.executor.memoryOverhead
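
Plugging the question's numbers into this inequality (using the 7% overhead that applied to the question's Spark 1.x version, as the earlier answer notes; recent releases default to 10%):

spark.executor.memory               = 11G = 11264 MB
spark.yarn.executor.memoryOverhead  = max(0.07 * 11264, 384) ≈ 789 MB
container total                     ≈ 12053 MB
yarn.nodemanager.resource.memory-mb = 12g = 12288 MB

12053 MB <= 12288 MB, so exactly one such container fits per node; a second one can never be granted on the same node, which is why the executor count stays stuck at 2.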

Moreover, spark.executor.instances is merely a request. The Spark ApplicationMaster for your application asks the YARN ResourceManager for that number of containers (= spark.executor.instances). The ResourceManager grants each request on a NodeManager node based on:

  • Resource availability on the node. YARN scheduling has its own nuances; the YARN FairScheduler documentation is a good primer on how it works.
  • Whether the yarn.nodemanager.resource.memory-mb threshold has not been exceeded on the node:
    • (number of Spark containers running on the node * (spark.executor.memory + spark.yarn.executor.memoryOverhead)) <= yarn.nodemanager.resource.memory-mb

If the request is not granted immediately, it will be queued and granted once the above conditions are met.

Ankylosaur answered 21/11, 2017 at 16:42 Comment(0)

To utilize the Spark cluster to its full capacity you need to set values for --num-executors, --executor-cores and --executor-memory according to your cluster; a combined example follows the list:

  • --num-executors command-line flag or the spark.executor.instances configuration property controls the number of executors requested;
  • --executor-cores command-line flag or the spark.executor.cores configuration property controls the number of concurrent tasks an executor can run;
  • --executor-memory command-line flag or the spark.executor.memory configuration property controls the heap size.
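
For instance, a submission combining all three flags might look like this (the specific values are illustrative only, and each executor plus its overhead must fit within a node's YARN memory limit):

spark-submit --num-executors 6 --executor-cores 2 --executor-memory 4G --driver-memory 3G --class main.Application --master yarn-cluster myJar.jar <arg1> <arg2> <arg3>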
Anthill answered 8/9, 2017 at 10:2 Comment(0)

You only have 3 nodes in the cluster, and one will be used as the driver, so you have only 2 nodes left; how can you create 6 executors?

I think you confused --num-executors with --executor-cores.

To increase concurrency you need more cores; you want to utilize all the CPUs in your cluster.

Hawes answered 29/4, 2015 at 11:19 Comment(2)
So there can be only one executor per node?Unshackle
No, you have 12 GB of memory available per node and one executor is taking 11 GB, so you can't launch a second executor on the same node. You can increase yarn.nodemanager.resource.memory-mb or reduce --executor-memory to launch multiple executors on the same node.Chiton
