TL;DR
- Create a new IntelliJ run configuration and add a "JAR Application".
- Fill in the "Configuration" section as needed; in particular, "Path to JAR" should be the file path where the fat jar will be generated.
- At the bottom, under "Before Launch", create a "Run External Tool" step that calls the sbt assembly command-line tool (see step 13 below, the "sbt assembly" step, for details).
Spark + fat jar setup
I found the "SBT Task" run configuration that others suggested above useless for getting sbt-assembly to work within IntelliJ. Instead, I had to create a custom configuration that calls sbt assembly on the command line.
My use case was specifically Scala 2.12.18 + Apache Spark v3.2.1 running in standalone cluster mode (not YARN or Mesos) on my local network + uber/fat jars, within IntelliJ 2023.3.3 Ultimate for macOS.
The setup is a bit convoluted, but the result is that you can hit the play/run button and, with one click, it compiles your code to a fat jar and deploys it to the Spark cluster using spark-submit.
As an alternative, as others have suggested elsewhere, you can just keep running sbt assembly and spark-submit manually on the command line after each code change, but this gets tedious after a while IMO.
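For reference, the manual loop might look roughly like the sketch below. The Spark install folder and the master URL are placeholders I made up (they are not from my setup), so swap in your own; the jar path and class name mirror the example project used in the steps that follow. Called without arguments it only prints the spark-submit command; called with run it builds and submits.

```shell
#!/usr/bin/env bash
# Manual loop: rebuild the fat jar with sbt-assembly, then hand it to spark-submit.
set -eu

# Assumptions (not from the answer): adjust both for your own setup.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"                 # hypothetical Spark install folder
MASTER_URL="${MASTER_URL:-spark://spark-master:7077}"  # hypothetical standalone master URL

# Values that mirror the example project in this answer.
PROJECT_DIR="${PROJECT_DIR:-/Users/me/sparkstuff/mysparkapp}"
APP_JAR="$PROJECT_DIR/target/scala-2.12/mysparkapp-assembly-1.0.jar"
MAIN_CLASS="com.example.mysparkapp.SparkPi"

# Runs "<prefix> spark-submit ...": with no prefix it runs spark-submit itself,
# with "echo" as the prefix it only prints the command.
spark_submit() {
  "$@" "$SPARK_HOME/bin/spark-submit" \
    --master "$MASTER_URL" \
    --deploy-mode client \
    --class "$MAIN_CLASS" \
    "$APP_JAR"
}

if [ "${1:-}" = "run" ]; then
  (cd "$PROJECT_DIR" && sbt assembly)  # step 1: rebuild the fat jar
  spark_submit                         # step 2: submit it to the cluster
else
  spark_submit echo                    # dry run: print the command only
fi
```

Client deploy mode is used here for the same reason as in the IntelliJ setup: the driver output shows up in the terminal you launched from.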
- Open IntelliJ > Preferences > (left pane) Plugins > find the Scala and Spark plugins and install them, then restart IntelliJ
- File > New > Project... > in the left pane under "Generators", pick Spark, then set Name=mysparkapp, Type=Batch, Language=Scala, Build system=SBT, JDK=1.8, Scala=2.12.18, sbt=1.9.6
- Let the generator run. Once it has finished, you'll need to create all the .sbt etc. config files that make sbt assembly work within your project. There is a good write up here. After you're done with this, open the stubbed-out "SparkPi" Scala file.
- In this file, you'll notice a little Spark icon (star) in the gutter (the left side, where you put breakpoints). Click it and choose "create spark submit configuration"
- At the top, click the "Remote target" dropdown and choose "Add custom spark cluster". Run through this wizard, setting up an SSH connection to transfer files. We're not really using this piece because we're deploying fat jars to the cluster, but it still seems to require you to set it up.
- In the application text box, paste the path to where the uber jar will be generated, for example
/Users/me/sparkstuff/mysparkapp/target/scala-2.12/mysparkapp-assembly-1.0.jar
- In class textbox, put the class you want to run (that has the def main()) like com.example.mysparkapp.SparkPi
- Click "Additional customization" > check the sections "Spark Configuration", "Dependencies", "Driver" and "Executor". This just reveals sections where you fill in config settings that are passed to spark-submit
- Under "Spark Configuration" > Cluster manager=standalone, Deploy mode=client (so you can see output when your app runs), Spark home=path to your Spark install folder
- Under the "Spark Debug" section, I unchecked "start spark driver with debug agent"
- Fill in Dependencies, Driver and Executor sections as needed (or leave them blank)
- At the bottom, in "Result Submit Command", you can click the expand arrows to see the resulting command line that invokes spark-submit.
- Now the sbt assembly bit: at the very bottom, in the "Before Launch" list, delete everything.
- Click the + and choose "run external tool", then click + again
- name it "sbt assembly" and under Tool Settings, Program=, Arguments=assembly, Working directory=$ProjectFileDir$ and check all the boxes at the bottom ("make console active..." etc) > click OK > OK
- Then click the Apply button at the bottom of the "Run/Debug Configurations" window, and the OK button to close it. Note that it may re-add the "upload files through SFTP" bit, but you can ignore it. Again, referring to step 5 above, this SSH part must be filled out for some reason even when it's not needed.
- Now you should be able to click the play button (green triangle) at the top right, and it should compile your code into a fat jar (via sbt assembly) and launch it on the Spark cluster. All the activity should show in the Run terminal windows.
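For the step about creating the .sbt config files, here is a minimal sketch of what those files might contain. It is not the linked write-up, and several things in it are my assumptions: the sbt-assembly plugin version, the choice of spark-sql as the only dependency, and the blunt merge strategy. The script writes into a scratch directory rather than your project so you can inspect and copy what you need.

```shell
#!/usr/bin/env bash
# Writes a minimal sbt-assembly setup into a scratch directory for inspection.
set -eu

# Scratch location, so an existing project's files are never overwritten.
DEMO_DIR="${DEMO_DIR:-$(mktemp -d)}"
mkdir -p "$DEMO_DIR/project"

# project/plugins.sbt: enables the "assembly" task (plugin version is an assumption).
cat > "$DEMO_DIR/project/plugins.sbt" <<'EOF'
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")
EOF

# build.sbt: name, version and Scala version mirror the example project.
cat > "$DEMO_DIR/build.sbt" <<'EOF'
name := "mysparkapp"
version := "1.0"
scalaVersion := "2.12.18"

// Spark is already installed on the cluster, so keep it out of the fat jar.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"

// Blunt conflict handling (assumption); refine it if your dependencies
// rely on files under META-INF.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
EOF

echo "Wrote sketch files to $DEMO_DIR"
```

With that name and version, sbt-assembly's default output should be target/scala-2.12/mysparkapp-assembly-1.0.jar, which matches the jar path used in the run configuration above.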