Triggering spark jobs with REST

I have lately been trying out Apache Spark. My question is specifically about triggering Spark jobs. I had earlier posted a question on understanding Spark jobs. After getting my hands dirty with jobs, I moved on to my actual requirement.

I have a REST endpoint where I expose an API to trigger jobs; I have used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing jobs as a service in Spring where I would submit a job programmatically, meaning that when the endpoint is called, I would trigger the job with the given parameters. I now have a few design options.

Similar to the job written below, I need to maintain several jobs called by an abstract class, maybe a JobScheduler.

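    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    /* Can this code be abstracted from the application and written as
       a separate job? My understanding is that the application code
       itself has to have the jars embedded (setJars/addJar), which
       SparkContext takes care of internally. */

    SparkConf sparkConf = new SparkConf()
            .setAppName("MyApp")
            // jar(s) shipped to the executors on the cluster
            .setJars(new String[] { "/path/to/jar/submit/cluster" })
            // for a standalone cluster this is a spark://host:port URL
            .setMaster("/url/of/master/node")
            .setSparkHome("/path/to/spark/")
            // FAIR mode lets concurrent jobs in one context share executors
            .set("spark.scheduler.mode", "FAIR");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    // jobs submitted from this thread go to the "test" scheduler pool
    sc.setLocalProperty("spark.scheduler.pool", "test");

    // Application with algorithm, transformations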

Extending the above point, have multiple versions of jobs handled by the service.
Or else, use a Spark Job Server to do this.
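To make the first option concrete, here is a minimal sketch of "several jobs behind an abstract class" with a dispatcher that a Spring REST controller could call by job name. All names (SparkJob, WordCountJob, JobScheduler) are hypothetical, and Spark itself is left out so the sketch runs standalone; in a real job, run() would use a shared JavaSparkContext configured like the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: one subclass per job exposed over REST.
abstract class SparkJob {
    abstract String name();

    // A real job would run transformations on a shared JavaSparkContext;
    // here it returns a plain string so the sketch runs without Spark.
    abstract String run(Map<String, String> params);
}

// Hypothetical example job: counts whitespace-separated words in "input".
class WordCountJob extends SparkJob {
    @Override
    String name() {
        return "wordcount";
    }

    @Override
    String run(Map<String, String> params) {
        String input = params.getOrDefault("input", "");
        int words = input.isBlank() ? 0 : input.trim().split("\\s+").length;
        return "words=" + words;
    }
}

// Hypothetical dispatcher the REST endpoint would call with a job name.
class JobScheduler {
    private final Map<String, SparkJob> jobs = new HashMap<>();

    void register(SparkJob job) {
        jobs.put(job.name(), job);
    }

    String trigger(String jobName, Map<String, String> params) {
        SparkJob job = jobs.get(jobName);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job: " + jobName);
        }
        return job.run(params);
    }
}

public class JobSchedulerDemo {
    public static void main(String[] args) {
        JobScheduler scheduler = new JobScheduler();
        scheduler.register(new WordCountJob());
        System.out.println(scheduler.trigger("wordcount", Map.of("input", "a b c a")));
    }
}
```

Keeping the dispatch in plain Java keeps the jobs independent of the REST layer; whether they then share one long-lived context (with FAIR pools) or each run as a separate application is the execution and scaling trade-off being asked about.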

First, I would like to know what the best solution is in this case, both execution-wise and scaling-wise.

Note: I am using a Spark standalone cluster. Kindly help.

Auraaural answered 11/3, 2015 at 16:59 Comment(2)

I added the Spring for Apache Hadoop tag to this question. Spring Batch Admin provides a REST API for managing and launching jobs and I believe Spring for Apache Hadoop provides the ability to launch Spark jobs from Spring Batch... – Moreta 11/3, 2015 at 18:25

@MichaelMinella : thank you for the suggestion, I will definitely look into it. – Auraaural 12/3, 2015 at 5:8

Just use the Spark JobServer https://github.com/spark-jobserver/spark-jobserver

There are a lot of things to consider when making such a service, and the Spark JobServer has most of them covered already. If you find things that aren't good enough, it should be easy to file a request and contribute code to their project rather than reinventing it from scratch.
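For a sense of what calling it from a Spring service involves, the sketch below only builds the two HTTP calls shown in the project's README of that era: uploading a jar to /jars/<appName> and starting a job via POST /jobs?appName=...&classPath=... with the job's config as the body. It assumes the default port 8090; the host, app name, and class name are placeholders, and newer JobServer releases may use different routes (for example /binaries), so check the README for your version.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class JobServerClient {
    // Assumed base URL, e.g. "http://jobserver-host:8090" (8090 is the default port).
    private final String baseUrl;

    JobServerClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Upload an application jar under a name: POST /jars/<appName>.
    // appName is assumed to be URL-safe (no spaces or slashes).
    HttpRequest uploadJar(String appName, byte[] jarBytes) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jars/" + appName))
                .POST(HttpRequest.BodyPublishers.ofByteArray(jarBytes))
                .build();
    }

    // Start a job: POST /jobs?appName=...&classPath=...; the body is the
    // job's config input, e.g. "input.string = a b c".
    HttpRequest startJob(String appName, String classPath, String config) {
        String query = "appName=" + enc(appName) + "&classPath=" + enc(classPath);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/jobs?" + query))
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Send a request and return JobServer's JSON response body.
    static String send(HttpRequest request) throws Exception {
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        JobServerClient client = new JobServerClient("http://localhost:8090");
        HttpRequest start = client.startJob("test", "spark.jobserver.WordCountExample",
                "input.string = a b c a b see");
        // Prints the request line only; pass it to send(...) against a running JobServer.
        System.out.println(start.method() + " " + start.uri());
    }
}
```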

Schechter answered 11/3, 2015 at 21:5 Comment(3)

Also consider, before using Spark Job Server, that it doesn't support Spark versions newer than 2.0. Moreover, judging by their commit history, it's not very active. – Phellem 16/5, 2017 at 9:42

@VolodymyrBakhmatiuk it's more active than Apache Livy, though. – Mclendon 19/3, 2018 at 15:18

Spark Job Server has supported Spark 2.2 for a while now. – Protomorphic 19/3, 2018 at 18:6

It turns out Spark has a hidden REST API to submit a job, check its status, and kill it.

Check out full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
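As a rough sketch of what the linked post describes: the standalone master runs a REST submission server (usually on port 6066) with endpoints /v1/submissions/create, /v1/submissions/status/<submissionId>, and /v1/submissions/kill/<submissionId>. The request body below mirrors the post's CreateSubmissionRequest example; host names, jar path, main class, and Spark version are placeholders. Because this API is undocumented, treat the fields as version-dependent; recent Spark versions also require enabling it with spark.master.rest.enabled.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class SparkRestSubmit {
    // Assumed REST URL of the standalone master, e.g. "http://spark-master:6066".
    private final String restUrl;

    SparkRestSubmit(String restUrl) {
        this.restUrl = restUrl;
    }

    // CreateSubmissionRequest body. Values are not JSON-escaped,
    // so keep them free of quotes and backslashes.
    static String createBody(String masterUrl, String jarUri, String mainClass,
                             String appName, String sparkVersion, List<String> appArgs) {
        String args = appArgs.stream()
                .map(a -> "\"" + a + "\"")
                .collect(Collectors.joining(", "));
        return "{"
                + "\"action\": \"CreateSubmissionRequest\","
                + "\"appArgs\": [" + args + "],"
                + "\"appResource\": \"" + jarUri + "\","
                + "\"clientSparkVersion\": \"" + sparkVersion + "\","
                + "\"environmentVariables\": {\"SPARK_ENV_LOADED\": \"1\"},"
                + "\"mainClass\": \"" + mainClass + "\","
                + "\"sparkProperties\": {"
                + "\"spark.jars\": \"" + jarUri + "\","
                + "\"spark.app.name\": \"" + appName + "\","
                + "\"spark.submit.deployMode\": \"cluster\","
                + "\"spark.master\": \"" + masterUrl + "\""
                + "}}";
    }

    // Submit: the JSON response contains a submissionId for later calls.
    HttpRequest create(String body) {
        return HttpRequest.newBuilder(URI.create(restUrl + "/v1/submissions/create"))
                .header("Content-Type", "application/json;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    HttpRequest status(String submissionId) {
        return HttpRequest.newBuilder(URI.create(restUrl + "/v1/submissions/status/" + submissionId))
                .GET()
                .build();
    }

    HttpRequest kill(String submissionId) {
        return HttpRequest.newBuilder(URI.create(restUrl + "/v1/submissions/kill/" + submissionId))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    static String send(HttpRequest request) throws Exception {
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        SparkRestSubmit api = new SparkRestSubmit("http://spark-master:6066");
        String body = createBody("spark://spark-master:6066", "file:/path/to/spark-job.jar",
                "com.example.MyJob", "MyJob", "1.5.0", List.of("arg1"));
        // Builds the request only; call send(...) against a real master.
        System.out.println(api.create(body).uri());
    }
}
```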

Addict answered 8/10, 2015 at 20:14 Comment(6)

Sounds really interesting. I found this: issues.apache.org/jira/secure/attachment/12696651/… So it means Spark itself has now exposed this feature? – Auraaural 12/10, 2015 at 9:9

AFAIK they added it in v1.4, but they are not publicly promoting it yet. – Addict 30/10, 2015 at 18:27

@ArturMkrtchyan really interesting option, thank you! What happens if I submit two applications simultaneously through the Spark REST API? – Phellem 16/5, 2017 at 9:40

The webpage you linked does not really say much because the pictures on the page are broken. – Mclendon 19/3, 2018 at 13:59

This one might help while the main link provided has broken pictures: gist.github.com/arturmkrtchyan/5d8559b2911ac951d34a – Acotyledon 16/7, 2018 at 11:26

Does it launch a Spark session/context every time there is a call to the REST API, or does it reuse the same session? – Footboard 12/2, 2019 at 13:7

Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.
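For the "snippets of code" part, Livy exposes interactive sessions: POST /sessions creates one, POST /sessions/{id}/statements runs code in it, and GET /sessions/{id}/statements/{statementId} returns the statement's state and output. The sketch below only builds those requests; it assumes Livy's default port 8998, and the host and Scala snippet are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LivySessionClient {
    // Assumed Livy URL, e.g. "http://livy-host:8998".
    private final String livyUrl;

    LivySessionClient(String livyUrl) {
        this.livyUrl = livyUrl;
    }

    private HttpRequest postJson(String path, String json) {
        return HttpRequest.newBuilder(URI.create(livyUrl + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Start an interactive Scala session; the JSON response carries its "id".
    HttpRequest createSession() {
        return postJson("/sessions", "{\"kind\": \"spark\"}");
    }

    // Run a snippet in an existing session (code is not JSON-escaped here).
    HttpRequest runStatement(int sessionId, String code) {
        return postJson("/sessions/" + sessionId + "/statements",
                "{\"code\": \"" + code + "\"}");
    }

    // Poll a statement until its state is "available", then read its output.
    HttpRequest getStatement(int sessionId, int statementId) {
        return HttpRequest.newBuilder(URI.create(
                        livyUrl + "/sessions/" + sessionId + "/statements/" + statementId))
                .GET()
                .build();
    }

    static String send(HttpRequest request) throws Exception {
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        LivySessionClient livy = new LivySessionClient("http://livy-host:8998");
        // Builds the request only; use send(...) against a running Livy server.
        System.out.println(livy.runStatement(0, "sc.parallelize(1 to 10).sum()").uri());
    }
}
```

Because a session keeps its Spark context alive between statements, repeated calls avoid paying the context start-up cost each time.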

Shortcut answered 24/3, 2017 at 23:19 Comment(3)

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review – Prototherian 19/3, 2018 at 13:41

You are right, I have updated my answer giving a little bit more details. Thanks. – Shortcut 25/3, 2018 at 9:41

Livy's release cycle is weird. They release roughly once a year! – Somnambulism 12/1, 2019 at 16:45

Here is a good client that you might find helpful: https://github.com/ywilkof/spark-jobs-rest-client

Edit: this answer was given in 2015. There are options like Livy available now.

Keegan answered 16/11, 2015 at 14:22 Comment(2)

Do you know whether it's possible to launch two applications simultaneously through that client? – Phellem 16/5, 2017 at 9:44

Yes, it's possible. The client is just a wrapper around HTTP calls to your spark master. So if your setup can handle that then it will be possible. – Seymour 24/8, 2017 at 6:17

I had this requirement too, and I was able to do it using Livy Server, as one of the contributors, Josemy, mentioned. Following are the steps I took; I hope they help somebody:

Download the Livy zip from https://livy.apache.org/download/
Follow the instructions at https://livy.apache.org/get-started/

Upload the zip to a client.
Unzip the file.
Check for the following two environment variables; if they don't exist, create them with the right paths:
    export SPARK_HOME=/opt/spark
    export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

Open port 8998 on the client.

Update $LIVY_HOME/conf/livy.conf with the master details and anything else needed.
Note: templates are available in $LIVY_HOME/conf.
E.g. livy.file.local-dir-whitelist = /home/folder-where-the-jar-will-be-kept/

Run the server:
    $LIVY_HOME/bin/livy-server start

Stop the server:
    $LIVY_HOME/bin/livy-server stop

UI: <client-ip>:8998/ui/

Submitting a job: POST http://<your client ip goes here>:8998/batches
    {
      "className": "<your class name, including the package>",
      "file": "your jar location",
      "args": ["arg1", "arg2", "arg3"]
    }
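From a Spring service, the same batch submission can be made with Java's built-in HTTP client. The sketch below builds the POST /batches body shown in the steps above, plus GET /batches/{id}/state to poll and DELETE /batches/{id} to kill the batch. The class name and URL are placeholders, and the values are assumed not to need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

public class LivyBatchClient {
    // Assumed Livy URL, e.g. "http://<client-ip>:8998".
    private final String livyUrl;

    LivyBatchClient(String livyUrl) {
        this.livyUrl = livyUrl;
    }

    // Same fields as the POST body in the steps; values are not JSON-escaped.
    static String batchBody(String file, String className, List<String> args) {
        String quoted = args.stream()
                .map(a -> "\"" + a + "\"")
                .collect(Collectors.joining(", "));
        return "{\"className\": \"" + className + "\", \"file\": \"" + file
                + "\", \"args\": [" + quoted + "]}";
    }

    // Submit a batch; the JSON response carries the batch "id".
    HttpRequest submit(String body) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Poll the batch state (e.g. starting, running, success, dead).
    HttpRequest state(int batchId) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches/" + batchId + "/state"))
                .GET()
                .build();
    }

    // Kill the batch and remove it from Livy.
    HttpRequest delete(int batchId) {
        return HttpRequest.newBuilder(URI.create(livyUrl + "/batches/" + batchId))
                .DELETE()
                .build();
    }

    static String send(HttpRequest request) throws Exception {
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        String body = batchBody("/home/folder-where-the-jar-will-be-kept/app.jar",
                "com.example.Main", List.of("arg1", "arg2", "arg3"));
        // Prints the body only; pass submit(body) to send(...) against a running Livy server.
        System.out.println(body);
    }
}
```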

Pola answered 31/1, 2020 at 6:47 Comment(0)
