Spark: How to set spark.yarn.executor.memoryOverhead property in spark-submit

In Spark 2.0, how do you set the spark.yarn.executor.memoryOverhead property when you run spark-submit?

I know that for things like spark.executor.cores you can set --executor-cores 2. Does this property follow the same pattern, e.g. --yarn-executor-memoryOverhead 4096?

Fairly answered 1/8, 2018 at 13:45 Comment(1)
probably spark-submit ... --conf spark.yarn.executor.memoryOverhead=4096 ... – Ethnarch

Please find an example below. The values can also be given in SparkConf (see the PySpark sketch after the spark-submit command).

Example:

# --executor-memory : amount of memory to use per executor process
# --driver-memory   : amount of memory to use for the driver process
# --driver-cores    : number of cores to use for the driver process
./bin/spark-submit \
--class [your class] \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--executor-memory 35G \
--executor-cores 5 \
--driver-memory 35G \
--driver-cores 5 \
--conf spark.yarn.executor.memoryOverhead=4096 \
--conf spark.yarn.driver.memoryOverhead=4096 \
--conf spark.default.parallelism=170 \
/path/to/examples.jar
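
The SparkConf route mentioned above could look roughly like this in PySpark (a minimal sketch reusing the same placeholder values; it keeps the YARN-specific overhead keys from this answer, which later Spark versions rename to spark.executor.memoryOverhead and spark.driver.memoryOverhead):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Same settings as the spark-submit example above, expressed through SparkConf.
conf = (
    SparkConf()
        .setMaster('yarn')
        .set('spark.executor.memory', '35g')
        .set('spark.executor.cores', '5')
        .set('spark.yarn.executor.memoryOverhead', '4096')
        .set('spark.driver.memory', '35g')
        .set('spark.yarn.driver.memoryOverhead', '4096')
        .set('spark.default.parallelism', '170')
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()
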
Recuperative answered 1/8, 2018 at 14:20 Comment(0)

spark.yarn.executor.memoryOverhead has now been deprecated:

WARN spark.SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.


You can programmatically set spark.executor.memoryOverhead by passing it as a config:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .master('yarn')
        .appName('StackOverflow')
        .config('spark.driver.memory', '35g')
        .config('spark.executor.cores', 5)
        .config('spark.executor.memory', '35g')
        .config('spark.dynamicAllocation.enabled', True)
        .config('spark.dynamicAllocation.maxExecutors', 25)
        .config('spark.executor.memoryOverhead', '4096')  # renamed key as of Spark 2.3
        .getOrCreate()
)
sc = spark.sparkContext
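
To confirm that the value was picked up, it can be read back from the running context (a small usage sketch; SparkConf.get raises an error if the key was never set):

# Print the effective executor memory overhead from the active SparkContext.
print(sc.getConf().get('spark.executor.memoryOverhead'))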
Implausible answered 10/5, 2019 at 12:50 Comment(1)
I am using Spark 2.4.5; if it is deprecated, then why am I getting org.apache.spark.SparkException: Job aborted due to stage failure: Task 51 in stage 1218.0 failed 1 times, most recent failure: Lost task 51.0 in stage 1218.0 (TID 62209, dev1-zz-1a-10x24x96x95.dev1.grid.spgmidev.com, executor 13): ExecutorLostFailure (executor 13 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714. – Bubb
