Triggering Spark jobs with REST

I have lately been trying out Apache Spark. My question is specifically about triggering Spark jobs. Here I had posted a question on understanding Spark jobs. After getting my hands dirty with jobs, I moved on to my requirement.

I have a REST endpoint that exposes an API to trigger jobs; I used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing the jobs as a service in Spring, where I would submit the job programmatically: when the endpoint is triggered with the given parameters, I would trigger the job. I now have a few design options:

  • Similar to the job written below, I need to maintain several jobs called by an abstract class, maybe JobScheduler (a sketch of this option follows the list).

     /* Can this code be abstracted out of the application and written
        as a separate job? My understanding is that the application
        code itself has to embed the jars (via setJars), which the
        SparkContext then takes care of internally. */

     SparkConf sparkConf = new SparkConf()
             .setAppName("MyApp")
             .setJars(new String[] { "/path/to/jar/submit/cluster" })
             .setMaster("/url/of/master/node");
     sparkConf.setSparkHome("/path/to/spark/");
     sparkConf.set("spark.scheduler.mode", "FAIR");

     JavaSparkContext sc = new JavaSparkContext(sparkConf);
     sc.setLocalProperty("spark.scheduler.pool", "test");

     // Application with the algorithm and transformations
    
  • Extending the above point: have multiple versions of the job handled by the service.

  • Or else use a Spark Job Server to do this.
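
For concreteness, here is a minimal sketch of what the first option might look like as a Spring REST controller. The endpoint path, request parameter, master URL, and the textFile/count job body are illustrative assumptions, not the actual application:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class JobController {

        // One submission per request; a long-lived service would more likely
        // keep a single shared JavaSparkContext than rebuild it per call.
        @RequestMapping(value = "/jobs/myapp", method = RequestMethod.POST)
        public String triggerJob(@RequestParam("input") String inputPath) {
            SparkConf sparkConf = new SparkConf()
                    .setAppName("MyApp")
                    .setJars(new String[] { "/path/to/jar/submit/cluster" })
                    .setMaster("spark://master-host:7077"); // placeholder standalone master URL
            sparkConf.set("spark.scheduler.mode", "FAIR");

            JavaSparkContext sc = new JavaSparkContext(sparkConf);
            try {
                sc.setLocalProperty("spark.scheduler.pool", "test");
                // Stand-in for the real algorithm/transformations:
                long lines = sc.textFile(inputPath).count();
                return "lines=" + lines;
            } finally {
                sc.stop();
            }
        }
    }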

Firstly, I would like to know what is the best solution in this case, execution wise and also scaling wise.

Note : I am using a standalone cluster from spark. kindly help.

Auraaural answered 11/3, 2015 at 16:59. Comments (2):
I added the Spring for Apache Hadoop tag to this question. Spring Batch Admin provides a REST API for managing and launching jobs, and I believe Spring for Apache Hadoop provides the ability to launch Spark jobs from Spring Batch... (Moreta)
@MichaelMinella: thank you for the suggestion, I will definitely look into it. (Auraaural)
Answer (score 27)

It turns out Spark has a hidden REST API to submit a job, check its status, and kill it.

Check out the full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
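
For a flavor of the API, here is a rough sketch of the three calls against a standalone master (the REST submission gateway listens on port 6066 by default; the host names, jar paths, version, and driver ID below are placeholders):

    # Submit the application:
    curl -X POST http://master-host:6066/v1/submissions/create \
      --header "Content-Type: application/json" \
      --data '{
        "action": "CreateSubmissionRequest",
        "appResource": "file:/path/to/my-app.jar",
        "mainClass": "com.example.MyApp",
        "appArgs": ["arg1", "arg2"],
        "clientSparkVersion": "1.5.0",
        "environmentVariables": { "SPARK_ENV_LOADED": "1" },
        "sparkProperties": {
          "spark.app.name": "MyApp",
          "spark.master": "spark://master-host:7077",
          "spark.jars": "file:/path/to/my-app.jar",
          "spark.driver.supervise": "false"
        }
      }'

    # Check status / kill, using the submissionId returned above:
    curl http://master-host:6066/v1/submissions/status/driver-20150708145607-0000
    curl -X POST http://master-host:6066/v1/submissions/kill/driver-20150708145607-0000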

Addict answered 8/10, 2015 at 20:14. Comments (6):
Sounds really interesting; I found this: issues.apache.org/jira/secure/attachment/12696651/… So does this mean Spark itself has now exposed this feature? (Auraaural)
AFAIK they added it from v1.4, but they are not publicly promoting it yet. (Addict)
@ArturMkrtchyan really interesting option, thank you! What happens if I submit two applications simultaneously through the Spark REST API? (Phellem)
The webpage you have linked does not really tell anything, because the pictures on the page are dead. (Mclendon)
This one might help while the main link provided has broken pictures: gist.github.com/arturmkrtchyan/5d8559b2911ac951d34a (Acotyledon)
Does it launch a Spark session/context every time there is a call to the REST API, or does it use the same session? (Footboard)
Answer (score 7)

Just use the Spark JobServer: https://github.com/spark-jobserver/spark-jobserver

There are a lot of things to consider in making a service, and the Spark JobServer has most of them covered already. If you find things that aren't good enough, it should be easy to make a request and add code to their system rather than reinventing it from scratch.
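
To give a flavor of its REST API, a quick sketch (the JobServer listens on port 8090 by default; the jar path and app name are placeholders, and WordCountExample is the sample job shipped with the project):

    # Upload a jar under an app name:
    curl --data-binary @target/my-job.jar localhost:8090/jars/myapp

    # Run a job class from that jar, passing its config in the POST body:
    curl -d "input.string = a b c a b" \
      "localhost:8090/jobs?appName=myapp&classPath=spark.jobserver.WordCountExample"

    # Poll a job's status and result by its ID:
    curl localhost:8090/jobs/<job-id>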

Schechter answered 11/3, 2015 at 21:05. Comments (3):
Also consider, before using Spark Job Server: it doesn't support Spark newer than 2.0. Moreover, looking at their commit history, it's not super active. (Phellem)
@VolodymyrBakhmatiuk it's more active than Apache Livy, though. (Mclendon)
Spark Job Server has supported Spark 2.2 for a while now. (Protomorphic)
Answer (score 5)

Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
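
As a rough sketch of the interactive API (Livy listens on port 8998 by default; the host name and the session/statement IDs are placeholders, and batch submission via /batches is shown in a later answer):

    # Start an interactive Spark session:
    curl -X POST -H "Content-Type: application/json" \
      -d '{"kind": "spark"}' http://livy-host:8998/sessions

    # Run a snippet of code in that session (session id 0 assumed):
    curl -X POST -H "Content-Type: application/json" \
      -d '{"code": "sc.parallelize(1 to 100).count()"}' \
      http://livy-host:8998/sessions/0/statements

    # Poll for the result:
    curl http://livy-host:8998/sessions/0/statements/0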

Shortcut answered 24/3, 2017 at 23:19. Comments (3):
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review (Prototherian)
You are right, I have updated my answer giving a little bit more detail. Thanks. (Shortcut)
Livy's release cycle is weird. They release almost like once a year! (Somnambulism)
Answer (score 1)

Here is a good client that you might find helpful: https://github.com/ywilkof/spark-jobs-rest-client

Edit: this answer was given in 2015. There are options like Livy available now.

Keegan answered 16/11, 2015 at 14:22. Comments (2):
Do you know whether it's possible to launch two applications simultaneously through that client? (Phellem)
Yes, it's possible. The client is just a wrapper around HTTP calls to your Spark master, so if your setup can handle that, then it will be possible. (Seymour)
Answer (score 0)

I also had this requirement, and I was able to do it using the Livy server, as one of the contributors (Josemy) mentioned. These are the steps I took; hope it helps somebody:

1. Download the Livy zip from https://livy.apache.org/download/ and follow the instructions at https://livy.apache.org/get-started/
2. Upload the zip to a client machine and unzip it.
3. Check for the following two variables; if they don't exist, create them with the right paths:

        export SPARK_HOME=/opt/spark
        export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

4. Open port 8998 on the client.
5. Update $LIVY_HOME/conf/livy.conf with the master details and anything else needed. Note: templates are in $LIVY_HOME/conf. E.g.:

        livy.file.local-dir-whitelist = /home/folder-where-the-jar-will-be-kept/

6. Run the server with $LIVY_HOME/bin/livy-server start (and stop it with $LIVY_HOME/bin/livy-server stop).
7. UI: <client-ip>:8998/ui/
8. Submit a job with POST http://<your client ip goes here>:8998/batches and a body like:

        {
          "className": "<your class name, including the package name>",
          "file": "your jar location",
          "args": ["arg1", "arg2", "arg3"]
        }
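
To check on or kill the batch afterwards, the usual Livy calls look roughly like this (the batch ID 0 is a placeholder):

    # List all batches, check the state of batch 0, or kill it:
    curl http://<client-ip>:8998/batches
    curl http://<client-ip>:8998/batches/0/state
    curl -X DELETE http://<client-ip>:8998/batches/0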
Pola answered 31/1, 2020 at 6:47.
