Using the G1GC garbage collector with Spark 2.3
I am trying to use the G1GC garbage collector for a Spark job, but I get:

Error: Invalid argument to --conf: -XX:+UseG1GC

I tried these options but haven't been able to get them working:

spark-submit --master spark://192.168.60.20:7077 --conf -XX:+UseG1GC /appdata/bblite-codebase/test.py

and

spark-submit --master spark://192.168.60.20:7077 -XX:+UseG1GC /appdata/bblite-codebase/test.py

What is the correct way to call a G1GC collector from spark?

Meyers answered 14/6, 2018 at 11:21

JVM options should be passed via spark.executor.extraJavaOptions / spark.driver.extraJavaOptions, i.e.:

 --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC"
Matamoros answered 14/6, 2018 at 12:6

This is how you can configure the garbage collection settings for both the driver and the executors:

spark-submit --master spark://192.168.60.20:7077 \
 --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
 --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
 /appdata/bblite-codebase/test.py
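
To verify that the collector actually changed, one quick check (a sketch, assuming the JDK's jcmd tool is available on the worker host; substitute the executor JVM's process id for the hypothetical <pid>):

 jcmd <pid> VM.flags | grep UseG1GC

With -XX:+PrintGCDetails enabled, G1 pause details will also show up in the driver and executor stdout/stderr logs.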
Macklin answered 14/6, 2018 at 14:21

Starting with Spark 2.4.3, this will not work for the driver's extraJavaOptions; it produces the error

Conflicting collector combinations in option list; please refer to the release notes for the combinations allowed

This is because the spark-defaults.conf shipped with some distributions includes

spark.executor.defaultJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70
spark.driver.defaultJavaOptions  -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled

Each of these already selects a collector, and passing a second, conflicting GC option makes the JVM fail with the error above. So you may need:

--conf "spark.executor.defaultJavaOptions=-XX:+UseG1GC"
--conf "spark.driver.defaultJavaOptions=-XX:+UseG1GC"

along with any other defaults you'd like to carry over, since these values replace the ones in spark-defaults.conf rather than appending to them.
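
For example, swapping G1 into the shipped defaults while keeping the other flags (a sketch; the CMS-only flags CMSInitiatingOccupancyFraction and CMSClassUnloadingEnabled are dropped because they apply only to the CMS collector):

--conf "spark.executor.defaultJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=70"
--conf "spark.driver.defaultJavaOptions=-XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseG1GC -XX:MaxHeapFreeRatio=70"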

Alternatively, you can edit spark-defaults.conf to remove the GC flags from the driver/executor defaults and force the collector to be specified via extraJavaOptions, depending on your use case.
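
In that case the edited defaults might look like this (a sketch; only the collector-specific flags are removed):

spark.executor.defaultJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.defaultJavaOptions  -XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70

with the collector then chosen per job via --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" as above.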

Theadora answered 13/9, 2021 at 23:36
