Can num-executors override dynamic allocation in spark-submit

Can specifying --num-executors in the spark-submit command override already-enabled dynamic allocation (spark.dynamicAllocation.enabled=true)?

Slype answered 20/1, 2018 at 5:9 Comment(0)

You can see from the log:

INFO util.Utils: Using initial executors = 60, 
max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances

That means Spark will take max(spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, spark.executor.instances) as the initial number of executors.

spark.executor.instances is what --num-executors sets.
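
For example, a submit like the following would reproduce the log line above (the class and jar names are placeholders, not from the original question):

spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=10 \
  --conf spark.dynamicAllocation.minExecutors=5 \
  --num-executors=60 \
  --class com.example.MyApp myapp.jar

Here the initial executor count is max(10, 5, 60) = 60, which is what the log reports.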

Dexter answered 25/2, 2019 at 20:29 Comment(0)

In your spark-defaults.conf file you can set the following to control the behaviour of dynamic allocation on Spark 2:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=1
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=5

If your spark2-submit command does not specify anything, then your job starts with 1 executor and scales up to 5 if required.

If your spark2-submit command specifies the following:

--num-executors=3

then your job will start with 3 executors and can still grow to 5 executors if required.
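
For instance, with the spark-defaults.conf above, a full command might look like this (the application class and jar are placeholders):

spark2-submit \
  --master yarn \
  --num-executors=3 \
  --class com.example.MyApp myapp.jar

The job starts with max(initialExecutors=1, minExecutors=1, num-executors=3) = 3 executors and can scale up to maxExecutors=5.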

Check your log messages for

Using initial executors = [initialExecutors], max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances

Additionally, if you do not specify spark.dynamicAllocation.maxExecutors at all then, given a resource-hungry job, it will continue to acquire as many executors as it can (on YARN this may be restricted by a limit defined on the queue you submitted your job to). I have seen "rogue" Spark jobs on YARN hog huge amounts of resources on shared clusters, starving other jobs. Your YARN administrators should prevent resource starvation by configuring sensible defaults and splitting different types of workloads across different queues.
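
As a sketch of that advice, you could cap dynamic allocation and target a dedicated queue at submit time ("analytics" is a hypothetical queue name, and the class and jar are placeholders):

spark2-submit \
  --master yarn \
  --queue analytics \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --class com.example.MyApp myapp.jar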

I would advise performance testing any changes you intend to make in overriding the defaults, particularly trying to simulate busy periods of your system.

Farron answered 27/2, 2019 at 14:5 Comment(0)

To explicitly control the number of executors, you can override dynamic allocation by setting the --num-executors command-line flag or the spark.executor.instances configuration property.

"--num-executor" property in spark-submit is incompatible with spark.dynamicAllocation.enabled. If both spark.dynamicAllocation.enabled and spark.executor.instances are specified, dynamic allocation is turned off and the limited number of spark.executor.instances is used".

Also, it will give the warning: WARN SparkContext: Dynamic Allocation and num executors are both set, thus dynamic allocation is disabled.
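
If you want a guaranteed fixed number of executors regardless of what spark-defaults.conf enables, one option is to disable dynamic allocation explicitly on the command line (the class and jar names are placeholders):

spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors=10 \
  --class com.example.MyApp myapp.jar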

Slype answered 20/1, 2018 at 10:5 Comment(2)
The above answer is valid for Spark 1.6. I think in Spark 2.2 the scenario is different: there you cannot override dynamic allocation by using --num-executors. As per the latest Spark documentation (spark.apache.org/docs/latest/…): "If --num-executors (or spark.executor.instances) is set and larger than this value, it will be used as the initial number of executors." – Slype
Please let me know your comments on this question. – Slype
