I'm trying to use Livy to remotely submit several Spark jobs. Let's say I want to perform the following spark-submit task remotely (with all the options as such):
spark-submit \
--class com.company.drivers.JumboBatchPipelineDriver \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=1g \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.serializer='org.apache.spark.serializer.KryoSerializer' \
--conf "spark.executor.extraJavaOptions= -XX:+UseG1GC" \
--master yarn \
--deploy-mode cluster \
/home/hadoop/y2k-shubham/jars/jumbo-batch.jar \
\
--start=2012-12-21 \
--end=2012-12-21 \
--pipeline=db-importer \
--run-spiders
NOTE: The options after the JAR (--start, --end, etc.) are specific to my Spark application; I'm using scopt to parse them.
I'm aware that I can supply all the various options of the above spark-submit command using Livy's POST /batches request. But since I have to make over 250 spark-submits remotely, I'd like to exploit Livy's session-management capabilities; i.e., I want Livy to create a SparkSession once and then use it for all my spark-submit requests.
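For reference, this is roughly how I'd expect the above spark-submit to translate into a POST /batches call (a sketch I haven't verified end-to-end; livy-host:8998 is a placeholder, and --master / --deploy-mode are, as far as I understand, picked up from Livy's own configuration rather than from the request body):

curl -s -X POST http://livy-host:8998/batches \
  -H 'Content-Type: application/json' \
  -d '{
        "file": "/home/hadoop/y2k-shubham/jars/jumbo-batch.jar",
        "className": "com.company.drivers.JumboBatchPipelineDriver",
        "args": ["--start=2012-12-21", "--end=2012-12-21", "--pipeline=db-importer", "--run-spiders"],
        "driverCores": 1,
        "driverMemory": "1g",
        "conf": {
          "spark.dynamicAllocation.enabled": "true",
          "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
          "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
        }
      }'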
The POST /sessions request allows me to specify quite a few options for instantiating a SparkSession remotely. However, I see no session argument in the POST /batches request.
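For comparison, this is the kind of POST /sessions call I had in mind for creating the SparkSession once (again only a sketch; I'm assuming the application JAR would go under "jars" so that my classes end up on the session's classpath):

curl -s -X POST http://livy-host:8998/sessions \
  -H 'Content-Type: application/json' \
  -d '{
        "kind": "spark",
        "jars": ["/home/hadoop/y2k-shubham/jars/jumbo-batch.jar"],
        "driverCores": 1,
        "driverMemory": "1g",
        "conf": {
          "spark.dynamicAllocation.enabled": "true",
          "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
          "spark.executor.extraJavaOptions": "-XX:+UseG1GC"
        }
      }'

The response to this call contains an id for the session, but I don't see any way to pass that id to POST /batches.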
How can I make use of the SparkSession that I created using the POST /sessions request for submitting my Spark job using the POST /batches request?
I've referred to the following examples, but they only demonstrate supplying (Python) code for the Spark job within Livy's POST request.
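That is, they all push code snippets into an already-running session, roughly like this (a sketch; the session id 0 and the snippet are made up), rather than pointing Livy at a pre-built JAR the way spark-submit (or POST /batches) does:

curl -s -X POST http://livy-host:8998/sessions/0/statements \
  -H 'Content-Type: application/json' \
  -d '{ "code": "spark.range(100).count()" }'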