Spark can use the Hadoop S3A file system, org.apache.hadoop.fs.s3a.S3AFileSystem. By adding the following to conf/spark-defaults.conf, I can get spark-shell to log to an S3 bucket:
spark.jars.packages net.java.dev.jets3t:jets3t:0.9.0,com.google.guava:guava:16.0.1,com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.eventLog.enabled true
spark.eventLog.dir s3a://spark-logs-test/
spark.history.fs.logDirectory s3a://spark-logs-test/
spark.history.provider org.apache.hadoop.fs.s3a.S3AFileSystem
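For reference, inside spark-shell the class resolves without problems, since spark.jars.packages is honored there. A minimal check I can run in the shell (the bucket name is the one from the config above; AWS credentials are assumed to be configured separately):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Resolve the s3a:// scheme; this works in spark-shell because the
// spark.jars.packages artifacts are on the classpath there.
val conf = new Configuration()
conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val fs = FileSystem.get(new URI("s3a://spark-logs-test/"), conf)
println(fs.getClass.getName) // prints org.apache.hadoop.fs.s3a.S3AFileSystem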
The Spark History Server also loads its configuration from conf/spark-defaults.conf, but it does not seem to pick up the spark.jars.packages setting, and it throws a ClassNotFoundException:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:256)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
The Spark source code for loading configuration differs between SparkSubmitArguments.scala and HistoryServerArguments.scala; in particular, HistoryServerArguments does not appear to resolve spark.jars.packages at all.
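As the stack trace shows, Utils.classForName ends up in a plain Class.forName lookup, and outside of spark-submit nothing has fetched the spark.jars.packages artifacts. A sketch of what the History Server effectively does at startup (the provider class name is the one set in the config above):

// With hadoop-aws absent from the JVM classpath, this is the call that
// fails with ClassNotFoundException when the History Server starts.
val providerName = "org.apache.hadoop.fs.s3a.S3AFileSystem" // spark.history.provider
val providerClass = Class.forName(providerName)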
Is there a way to add the org.apache.hadoop.fs.s3a.S3AFileSystem dependency to the History Server?