What is the relationship between maxExecutors, num-executors and initialExecutors when using Spark on YARN?
At first, I read this article, which says that spark.dynamicAllocation.maxExecutors takes the value of num-executors if spark.dynamicAllocation.maxExecutors is not explicitly set. However, a later part of the same article says "--num-executors or spark.executor.instances acts as a minimum number of executors with a default value of 2", which confuses me.

  1. My first question is: what is the use of --num-executors in Spark 2.x and later versions? Is it an obsolete option that was only useful before dynamic allocation existed? Once dynamic allocation was introduced, do --num-executors and --max-executors act more like default values for the spark.dynamicAllocation.* settings?

  2. What is the difference between --conf spark.dynamicAllocation.maxExecutors and --max-executors? Does the latter act as an alias for the former?

Meanwhile, the article does not mention the relationship between num-executors and spark.dynamicAllocation.initialExecutors, so I ran an experiment with the following arguments:

--conf spark.dynamicAllocation.minExecutors=2  --conf spark.dynamicAllocation.maxExecutors=20  --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=3 --conf spark.dynamicAllocation.maxExecutors=10  --num-executors 0  --driver-memory 1g  --executor-memory 1g  --executor-cores 2  

It turns out that 3 executors are allocated initially (matching initialExecutors) and the count is then reduced to 2 (matching minExecutors); --num-executors appears to have no effect here. However, the article says "--num-executors or spark.executor.instances acts as a minimum number of executors", so there is a contradiction.

  1. My third question is: what is the relationship between num-executors and spark.dynamicAllocation.initialExecutors? Does spark.dynamicAllocation.initialExecutors take priority over num-executors? From the documentation, I find that when num-executors is set larger than spark.dynamicAllocation.initialExecutors, it overrides spark.dynamicAllocation.initialExecutors.
Bravissimo answered 7/7, 2020 at 5:28 Comment(1)
Did you find an answer to this question? Looking for the same. – Massasoit

Your first article is from Qubole's developer guide; it is not necessarily reflective of Apache Spark's default behavior.

To answer your questions:

  1. num-executors is not necessarily obsolete. If dynamic allocation is configured by a separate process or command, num-executors acts as a safeguard that ensures proper allocation if dynamic allocation is ever turned off for any reason. For that reason, I usually set num-executors equal to spark.dynamicAllocation.maxExecutors.
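A minimal sketch of that safeguard as a spark-submit invocation (the application name and resource sizes are illustrative, not from the question):

```shell
# Hypothetical spark-submit call mirroring the advice above:
# --num-executors is set equal to spark.dynamicAllocation.maxExecutors,
# so if dynamic allocation is later disabled
# (spark.dynamicAllocation.enabled=false), the job still requests a
# sensible fixed number of executors instead of Spark's small default.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --num-executors 20 \
  my_app.py
```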
  2. I'm not aware of a max-executors option, but I would assume it is an alias, with the lesser of it and spark.dynamicAllocation.maxExecutors being the effective maximum if both are specified, going by Spark's usual logic. It's best to just use spark.dynamicAllocation.maxExecutors.
  3. When dynamic allocation is enabled, num-executors behaves like spark.dynamicAllocation.initialExecutors, not like spark.dynamicAllocation.minExecutors. The initial number of executors is set to the maximum of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, and spark.executor.instances (from the source code).
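That resolution rule can be sketched in Python — a simplified model of the max() logic in Spark's source, not the actual implementation (the defaults below follow the Spark configuration docs: minExecutors defaults to 0, and initialExecutors defaults to minExecutors):

```python
# Simplified model of how Spark picks the initial executor count under
# dynamic allocation: the max of initialExecutors, minExecutors, and
# spark.executor.instances (set by --num-executors).
def initial_executor_count(conf):
    min_execs = conf.get("spark.dynamicAllocation.minExecutors", 0)
    initial = conf.get("spark.dynamicAllocation.initialExecutors", min_execs)
    instances = conf.get("spark.executor.instances", 0)
    return max(initial, min_execs, instances)

# The experiment from the question: initialExecutors=3, minExecutors=2,
# --num-executors 0  ->  3 executors at startup, as observed.
experiment = {
    "spark.dynamicAllocation.minExecutors": 2,
    "spark.dynamicAllocation.initialExecutors": 3,
    "spark.executor.instances": 0,
}
print(initial_executor_count(experiment))  # 3

# The documented case from question 3: a --num-executors value larger
# than initialExecutors wins, because max() picks the larger value.
override = {
    "spark.dynamicAllocation.initialExecutors": 3,
    "spark.executor.instances": 8,
}
print(initial_executor_count(override))  # 8
```

This also explains the experiment's outcome: the initial count of 3 is just max(3, 2, 0), after which idle executors are released down to minExecutors=2.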
Dentifrice answered 6/7, 2023 at 18:0 Comment(0)
