The only two ways I know of to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script by using :load inside the spark-shell. My question is: is it possible to run a Scala file directly on the command line, without first going inside spark-shell and then issuing :load?
You can simply use stdin redirection with spark-shell:
spark-shell < YourSparkCode.scala
This command starts a spark-shell, interprets your YourSparkCode.scala line by line, and quits at the end.
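For reference, here is a minimal sketch of what such a YourSparkCode.scala could contain; it relies on the spark session that spark-shell pre-creates, and the people.json file and temp view name are made up purely for illustration:
// spark is pre-created by spark-shell, so no SparkSession.builder is needed here
// "people.json" is a hypothetical input file used only for this example
val people = spark.read.json("people.json")
people.printSchema()
people.createOrReplaceTempView("people")
spark.sql("SELECT count(*) AS cnt FROM people").show()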
Another option is to use the -I <file> option of the spark-shell command:
spark-shell -I YourSparkCode.scala
The only difference is that the latter command leaves you inside the shell, so you must issue the :quit command to close the session.
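If you want the -I variant to terminate on its own as well, one possible trick (an assumption about how the underlying Scala REPL consumes stdin, not something guaranteed by Spark) is to pipe the :quit command into it:
echo ":quit" | spark-shell -I YourSparkCode.scala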
[UPD] Passing parameters
Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.
Fortunately, there are many ways to work around this (e.g., externalizing the parameters in another file and reading it at the very beginning of your script), but I personally find the Spark configuration the cleanest and most convenient way.
You pass your parameters via the --conf option:
spark-shell --conf spark.myscript.arg1=val1 --conf spark.yourspace.arg2=val2 < YourSparkCode.scala
(please note that the spark. prefix in your property name is mandatory; otherwise Spark will discard your property as invalid)
And read these arguments in your Spark code as below:
val arg1: String = spark.conf.get("spark.myscript.arg1")
val arg2: String = spark.conf.get("spark.yourspace.arg2")
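If some of the parameters are optional, spark.conf.get also accepts a default value as its second argument, so a missing key does not fail with a NoSuchElementException; the default values below are made up for illustration:
// fall back to a default when the property was not passed on the command line
val arg1WithDefault: String = spark.conf.get("spark.myscript.arg1", "defaultVal1")
val arg2WithDefault: String = spark.conf.get("spark.yourspace.arg2", "defaultVal2")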
It is possible via spark-submit.
https://spark.apache.org/docs/latest/submitting-applications.html
You can even put it in a bash script or create an sbt task (https://www.scala-sbt.org/1.x/docs/Tasks.html) to run your code.
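For completeness, here is a minimal sketch of that route; the object name, jar path, and master setting are assumptions used only for illustration. Compile something like the following with sbt package and hand the resulting jar to spark-submit:
import org.apache.spark.sql.SparkSession

object YourSparkApp {
  def main(args: Array[String]): Unit = {
    // unlike spark-shell, a standalone application must create its own SparkSession
    val spark = SparkSession.builder().appName("YourSparkApp").getOrCreate()
    // application arguments arrive as a regular args array
    println(s"Got ${args.length} argument(s): ${args.mkString(", ")}")
    spark.stop()
  }
}

spark-submit --class YourSparkApp --master local[*] target/scala-2.12/yourapp_2.12-0.1.jar arg1 arg2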