I have been of late trying out apache spark. My question is more specific to trigger spark jobs. Here I had posted question on understanding spark jobs. After getting dirty on jobs I moved on to my requirement.
I have a REST end point where I expose API to trigger Jobs, I have used Spring4.0 for Rest Implementation. Now going ahead I thought of implementing Jobs as Service in Spring where I would submit Job programmatically, meaning when the endpoint is triggered, with given parameters I would trigger the job. I have now few design options.
Similar to the below written job, I need to maintain several Jobs called by a Abstract Class may be
JobScheduler
./*Can this Code be abstracted from the application and written as as a seperate job. Because my understanding is that the Application code itself has to have the addJars embedded which internally sparkContext takes care.*/ SparkConf sparkConf = new SparkConf().setAppName("MyApp").setJars( new String[] { "/path/to/jar/submit/cluster" }) .setMaster("/url/of/master/node"); sparkConf.setSparkHome("/path/to/spark/"); sparkConf.set("spark.scheduler.mode", "FAIR"); JavaSparkContext sc = new JavaSparkContext(sparkConf); sc.setLocalProperty("spark.scheduler.pool", "test"); // Application with Algorithm , transformations
extending above point have multiple versions of jobs handled by service.
Or else use an Spark Job Server to do this.
Firstly, I would like to know what is the best solution in this case, execution wise and also scaling wise.
Note : I am using a standalone cluster from spark. kindly help.