Pass system property to spark-submit and read file from classpath or custom path
I have recently found a way to use logback instead of log4j in Apache Spark (both for local use and spark-submit). However, one last piece is missing.

The issue is that Spark tries very hard not to see logback.xml on its classpath. I have already found a way to load it during local execution:

What I have so far

Basically, I check the system property logback.configurationFile, but fall back to the logback.xml from my /src/main/resources/ just in case:

import java.io.File

// the same as default: https://logback.qos.ch/manual/configuration.html
private val LogbackLocation = Option(System.getProperty("logback.configurationFile"))
// add some default logback.xml to your /src/main/resources
private lazy val defaultLogbackConf = getClass.getResource("/logback.xml").getPath

private def getLogbackConfigPath = {
  val path = LogbackLocation.map(new File(_).getPath).getOrElse(defaultLogbackConf)
  logger.info(s"Loading logging configuration from: $path")
  path
}

And then when I initialize my SparkContext...

val sc = SparkContext.getOrCreate(conf)
sc.addFile(getLogbackConfigPath)

I can confirm it works locally.
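For completeness, here is a minimal sketch (not part of my original code) of how a task could resolve the local copy of a file shipped with sc.addFile, using Spark's standard SparkFiles helper; the base name "logback.xml" is an assumption taken from the snippet above:

import org.apache.spark.SparkFiles

// Resolves the local path of a file previously shipped via sc.addFile.
// "logback.xml" is the base name of the file added above (assumed here).
val localLogbackPath = SparkFiles.get("logback.xml")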

Playing with spark-submit

spark-submit \
  ...
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2 

This gives an error:

Exception in thread "main" java.io.FileNotFoundException: Added file file:/path/to/my/application-fat.jar!/logback.xml does not exist

Which I think is nonsense, because first the application finds the file (according to my code)

getClass.getResource("/logback.xml").getPath

and then, during

sc.addFile(getLogbackConfigPath)

it turns out... whoa! No file there!? What the heck!? Why would it not find the file inside the jar? It obviously is there, I triple-checked it.

Another approach to spark-submit

So I thought, OK, I will pass the file myself, since I can specify it via a system property. I put the logback.xml file next to my application-fat.jar and ran:

spark-submit \
  ...
  --conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2 

And I get the same error as above. So my setting is completely ignored! Why? How do I specify

-Dlogback.configurationFile

properly, and pass it correctly to both the driver and the executors?

Thanks!

Overspend answered 3/8, 2017 at 17:17. Comments (2):
Possible duplicate of How to pass -D parameter or environment variable to Spark job? – Phosphine
@AlexK the question was much broader. – Overspend

1. Solving java.io.FileNotFoundException

This is probably unsolvable.

Simply put, SparkContext.addFile cannot read a file from inside the jar. I believe the jar is treated as if it were a zip archive or the like.
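A small sketch of my understanding of the failure mode (paths are illustrative): inside a fat jar the resource resolves to a jar: URL, whose path still contains the !/ separator, and java.io.File cannot open that as a regular file:

// Hypothetical illustration, assuming this runs from inside application-fat.jar
val url = getClass.getResource("/logback.xml")
// url looks like: jar:file:/path/to/my/application-fat.jar!/logback.xml
val asFile = new java.io.File(url.getPath)
// asFile.exists() returns false here, which matches the
// FileNotFoundException thrown by sc.addFile above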

Fine.

2. Passing -Dlogback.configurationFile

This was not working due to my misunderstanding of the configuration parameters.

Because I am using the --master yarn parameter but do not set --deploy-mode to cluster, it defaults to client.

Reading https://spark.apache.org/docs/1.6.1/configuration.html#application-properties

spark.driver.extraJavaOptions

Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.

So passing this setting with --driver-java-options worked:

spark-submit \
  ...
  --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2 
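
As a quick sanity check (my own sketch, reusing the property lookup from the question), the driver can confirm the option actually reached the JVM:

// Should print Some(/path/to/my/logback.xml) when --driver-java-options
// was passed correctly; None means the property never reached the driver JVM.
val configured = Option(System.getProperty("logback.configurationFile"))
println(s"logback.configurationFile = $configured")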

Note about --driver-java-options

In contrast to --conf, multiple options have to be passed as one parameter, for example:

--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml -Dother.setting=value" \

And the following will not work:

--driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
--driver-java-options "-Dother.setting=value" \
Overspend answered 7/8, 2017 at 10:30. Comments (5):
Voting up both your question and self-answer – though it's not a clear question/answer, it covers quite a few traps Spark developers hit. Also appreciate that you came back to answer even though you don't have a working solution. – Nystrom
@DavidLevy thanks for your comment. Indeed, coming back to this "post" now, it could easily be divided into two or three questions, but at least it gives a full picture of the case. However, maybe it is not clear from the answer: I did find a solution to my problem - providing the external configuration file via the system property and loading it. The file must not be inside the JAR, though. – Overspend
If you use the --files /path/to/my/logback.xml parameter, it should take care of the exception from #1 in your answer, since it places the file into the work directory of your application in the cluster. – Manakin
Why do we have to do this: --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml"? It works, but why does my Spark program not read and apply the logback.xml from the resources folder, forcing me to give it explicitly via driver-java-options? Passing it as driver-java-options works but is not elegant, as you need to change the file on all the worker nodes whenever you want to modify the log settings. It should be in the code. Has anyone found a proper way of doing this? – Carioca
I am not sure I understand what you want to achieve, but I think it is not what I needed. I suggest asking your own question and being more precise. – Overspend
