Set hadoop configuration values on spark-submit command line
We want to set the AWS parameters that, in code, would be set via the SparkContext:

sc.hadoopConfiguration.set("fs.s3a.access.key", vault.user)
sc.hadoopConfiguration.set("fs.s3a.secret.key", vault.key)

However, we have a custom Spark launcher framework that requires all custom Spark configuration to be passed via --conf parameters on the spark-submit command line.

Is there a way to "notify" the SparkContext to apply certain --conf values to its hadoopConfiguration rather than to the general SparkConf? Something along the lines of:

spark-submit --conf hadoop.fs.s3a.access.key=$vault.user --conf hadoop.fs.s3a.secret.key=$vault.key

or

spark-submit --conf hadoopConfiguration.fs.s3a.access.key=$vault.user --conf hadoopConfiguration.fs.s3a.secret.key=$vault.key
Narrative asked 14/3, 2017 at 21:07. Comments (2):
spark.hadoop.fs.s3a.access.key=value (Ovum)
@Ovum Yes! I was trying to remember that. Please add it as an answer. (Narrative)
You need to prefix Hadoop configs with spark.hadoop. on the command line (or in the SparkConf object). For example:

spark.hadoop.fs.s3a.access.key=value
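A full invocation might then look like the minimal sketch below, where the class name, jar, and the $VAULT_USER/$VAULT_KEY shell variables are placeholders for whatever your launcher framework actually supplies:

# placeholder class and jar; credentials injected as spark.hadoop.* properties
spark-submit \
  --conf spark.hadoop.fs.s3a.access.key=$VAULT_USER \
  --conf spark.hadoop.fs.s3a.secret.key=$VAULT_KEY \
  --class com.example.MyJob \
  my-job.jar

At startup Spark strips the spark.hadoop. prefix from such properties and copies the remainder into the Hadoop Configuration it builds, so sc.hadoopConfiguration.get("fs.s3a.access.key") inside the job returns the submitted value.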
Ovum answered 14/3, 2017 at 21:37. Comments (3):
Yay! I was looking for it! It's working! That's what SO is for (-: (Go)
And this ends 2 days of searching. Thank you! (Wig)
They do not recommend using s3a:// anymore: docs.aws.amazon.com/emr/latest/ManagementGuide/…. Do you know how to set it up to use s3://? (Ruffin)
