Spark configuration change at runtime

I'm asking if anyone knows a way to change Spark properties (e.g. spark.executor.memory, spark.shuffle.spill.compress, etc.) at runtime, so that a change can take effect between the tasks/stages of a job...

So I know that...

1) The documentation for Spark 2.0+ (and previous versions too) states that once the SparkContext has been created, it can't be changed at runtime.

2) SparkSession.conf.set may change a few things for SQL, but I was looking for more general, all-encompassing configurations (a small sketch of what conf.set can change is shown after this list).

3) I could start a new context in the program with new properties, but the case here is to actually tune the properties once a job is already executing.
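
For reference, a minimal sketch of what SparkSession.conf.set can do (assuming an active SparkSession named spark, and using spark.sql.shuffle.partitions purely as an example of a SQL-level property; it does not cover cluster-level settings like spark.executor.memory):

>>> df = spark.range(0, 1000000)
>>> spark.conf.set("spark.sql.shuffle.partitions", "50")    # SQL-level setting, picked up by later queries
>>> df.groupBy(df.id % 10).count().collect()                # this shuffle runs with 50 partitions
>>> spark.conf.set("spark.sql.shuffle.partitions", "400")
>>> df.groupBy(df.id % 10).count().collect()                # this shuffle runs with 400 partitions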

Ideas...

1) Would killing an executor force it to read a configuration file again, or does it just keep whatever was configured at the beginning of the job?

2) Is there any command to force a "refresh" of the properties in the Spark context?

I'm hoping there might be a way, or that there are other ideas out there (thanks in advance)...

Hued answered 30/9, 2016 at 17:5 Comment(1)
One example of where this might be useful: a small value of spark.locality.wait may be appropriate for one stage processing a small amount of data, whereas a later stage processing a large amount of data should probably use a larger value for this parameter. – Aerophone

After submitting a Spark application, we can change some parameter values at runtime, but not others.

By using the spark.conf.isModifiable() method, we can check whether a parameter can be modified at runtime. If it returns True, the parameter value can be changed with spark.conf.set(); otherwise, it can't be modified at runtime.

Examples:

>>> spark.conf.isModifiable("spark.executor.memory")
False 
>>> spark.conf.isModifiable("spark.sql.shuffle.partitions")
True

So, based on the test above, the spark.executor.memory parameter can't be modified at runtime, while spark.sql.shuffle.partitions can.
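
As a small follow-up sketch (assuming an active SparkSession named spark), the check can be used to guard a runtime change before calling spark.conf.set:

>>> key = "spark.sql.shuffle.partitions"
>>> if spark.conf.isModifiable(key):
...     spark.conf.set(key, "64")          # takes effect for subsequent queries
... else:
...     print(key + " is fixed for the lifetime of this SparkContext")
...
>>> spark.conf.get(key)
'64'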

Bayne answered 23/9, 2022 at 15:51 Comment(0)

No, it is not possible to change settings like spark.executor.memory at runtime.


In addition, there are probably not many great tricks in the direction of 'quickly switching to a new context', as the strength of Spark is that it can pick up data and keep going. What you are essentially asking for is a map-reduce framework. Of course you could rewrite your job into that structure and divide the work across multiple Spark jobs, but then you would lose some of the ease and performance that Spark brings (though possibly not all of it).
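
If the 'multiple jobs' route is acceptable, a rough sketch of that structure might look like the following (the app names and memory values are made up, and whether a property such as spark.executor.memory actually takes effect on a fresh context depends on the cluster manager and deploy mode):

from pyspark.sql import SparkSession

# Phase 1: modest executors for the lighter stage (illustrative value)
spark = SparkSession.builder \
    .appName("phase-1") \
    .config("spark.executor.memory", "2g") \
    .getOrCreate()
spark.range(0, 1000).count()
spark.stop()

# Phase 2: a fresh context with different settings for the heavier stage
spark = SparkSession.builder \
    .appName("phase-2") \
    .config("spark.executor.memory", "8g") \
    .getOrCreate()
spark.range(0, 100000000).count()
spark.stop()

In practice it is often simpler to submit the phases as separate spark-submit runs with different --conf values.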

If you really think the request makes sense on a conceptual level, you could consider making a feature request, either through your Spark vendor or directly by logging a Jira ticket on the Apache Spark project.

Vengeful answered 1/8, 2020 at 21:19 Comment(0)
