Is it possible to run a Spark Scala script without going inside spark-shell?
The only two ways I know of to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script by using :load inside the spark-shell. My question is: is it possible to run a Scala file directly from the command line, without first starting spark-shell and then issuing :load?

Whereat answered 21/2, 2020 at 15:23 Comment(0)

You can simply use stdin redirection with spark-shell:

spark-shell < YourSparkCode.scala

This command starts a spark-shell, interprets YourSparkCode.scala line by line, and quits at the end.
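For illustration, a minimal YourSparkCode.scala could look like the sketch below (the file name and contents are just an example; inside spark-shell the spark session and sc context are already defined, so no SparkSession setup is needed):

// YourSparkCode.scala -- hypothetical example script
// `spark` (SparkSession) and `sc` (SparkContext) are predefined by spark-shell
val df = spark.range(1, 6)                 // Dataset with ids 1..5
df.selectExpr("sum(id) as total").show()   // prints a one-row table with total = 15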

Another option is the -I <file> option of the spark-shell command:

spark-shell -I YourSparkCode.scala

The only difference is that the latter command leaves you inside the shell, so you must issue the :quit command to close the session.
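If you prefer the -I variant but still want the session to end on its own, one workaround (an assumption on my part, not a documented spark-shell feature) is to finish the script with System.exit(0), since the interpreter executes it like any other line and it terminates the whole JVM:

// very last line of YourSparkCode.scala -- kills the JVM, so nothing after it runs
System.exit(0)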

[Update] Passing parameters

Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.

Fortunately, there are plenty of ways to achieve the same thing (e.g., externalizing the parameters in another file and reading it at the very beginning of your script, as sketched below).
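For example, a script could read a hypothetical params.properties file with plain Java APIs at the top (the file name and key are illustrative, not a Spark convention):

// top of YourSparkCode.scala -- reads externalized parameters
import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
props.load(new FileInputStream("params.properties"))  // hypothetical file containing: arg1=val1
val arg1 = props.getProperty("arg1")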

But personally I find the Spark configuration the cleanest and most convenient way.

You pass your parameters via the --conf option:

spark-shell --conf spark.myscript.arg1=val1 --conf spark.myscript.arg2=val2 < YourSparkCode.scala

(please note that the spark. prefix in the property name is mandatory; otherwise Spark will discard your property as invalid)

And read these arguments in your Spark code as below:

val arg1: String = spark.conf.get("spark.myscript.arg1")  // "val1" from --conf
val arg2: String = spark.conf.get("spark.myscript.arg2")  // "val2" from --conf
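If a property may be missing, the two-argument overload of conf.get takes a fallback value instead of throwing an exception (the default shown here is just an example):

// returns "fallback" when --conf spark.myscript.arg1=... was not supplied
val arg1WithDefault: String = spark.conf.get("spark.myscript.arg1", "fallback")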
Rote answered 21/2, 2020 at 17:10 Comment(2)
But is there any way to pass command-line arguments like that? – Whereat
@Whereat good question. I don't know of any really elegant way to parameterize your code. I see at least two options here: first, to have placeholders in your source file and replace them before you send the content to spark-shell; second, to pass your parameters as Spark config properties. I will update my answer. – Rote
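To sketch the first option from the comment above: keep a placeholder token in the source and substitute it in the shell before piping the file in (the token name __ARG1__ is purely illustrative):

# replaces every __ARG1__ placeholder with val1, then feeds the result to spark-shell
sed 's/__ARG1__/val1/g' YourSparkCode.scala | spark-shell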

It is possible via spark-submit.

https://spark.apache.org/docs/latest/submitting-applications.html

You can even put it in a bash script, or create an sbt task (https://www.scala-sbt.org/1.x/docs/Tasks.html) to run your code.
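For instance, a minimal wrapper script (the file name and argument handling are assumptions, building on the --conf approach from the answer above) might look like:

#!/usr/bin/env bash
# run-spark-script.sh -- hypothetical wrapper around spark-shell
# usage: ./run-spark-script.sh val1
spark-shell --conf spark.myscript.arg1="$1" < YourSparkCode.scala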

Sharkey answered 21/2, 2020 at 15:32 Comment(1)
Where does it say you can run a Scala file through spark-submit? As far as I know you can only submit a compiled jar file with spark-submit. – Whereat
