I have a Luigi Python task that uses some PySpark libraries. I would like to submit this task to Mesos with spark-submit. What do I need to do to run it? Here is my code skeleton:
import datetime
import luigi
from luigi.contrib.spark import SparkSubmitTask  # assuming SparkSubmitTask comes from luigi.contrib.spark
from pyspark import SparkContext
from pyspark.sql import functions as F

class myClass(SparkSubmitTask):
    # date defaults to today; I was previously setting it in __init__ as datetime.date.today().isoformat()
    date = luigi.DateParameter(default=datetime.date.today())

    def output(self):
        ...

    def requires(self):
        # declares the upstream task(s); Luigi's built-in input() then returns their outputs
        ...

    def run(self):
        # Some functions here use PySpark libs
        ...

if __name__ == "__main__":
    luigi.run()
Without Luigi, I submit this task with the following command line:
/opt/spark/bin/spark-submit --master mesos://host:port --deploy-mode cluster --total-executor-cores 1 --driver-cores 1 --executor-memory 1G --driver-memory 1G my_module.py
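From the Luigi docs, my understanding is that luigi.contrib.spark.SparkSubmitTask exposes these flags as class attributes, so the equivalent of the command above would look roughly like the sketch below (the attribute names are my assumption from the docs, so please correct me if any are wrong):

from luigi.contrib.spark import SparkSubmitTask

class myClass(SparkSubmitTask):
    # 'app' is the script handed to spark-submit; the other attributes
    # are assumed to mirror the command-line flags above
    app = 'my_module.py'
    master = 'mesos://host:port'
    deploy_mode = 'cluster'
    total_executor_cores = '1'
    driver_cores = '1'
    executor_memory = '1G'
    driver_memory = '1G'

Is that the right way to carry the flags over?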
The problem now is how to spark-submit this Luigi task, given that it is normally launched with a Luigi command line such as:
luigi --module my_module myClass --local-scheduler --date 2016-01
One more question: if my_module.py has a required task that must finish first, do I need to do anything extra for it, or is it enough to set it up the same way as in the current command line?
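To make that concrete, by "required task" I mean an ordinary Luigi dependency declared via requires(); myRequiredTask below is just a hypothetical placeholder:

import luigi
from luigi.contrib.spark import SparkSubmitTask

class myRequiredTask(luigi.Task):
    # hypothetical upstream task that must complete first
    def run(self):
        ...

class myClass(SparkSubmitTask):
    def requires(self):
        # Luigi should run myRequiredTask to completion before this task starts
        return myRequiredTask()

Would that dependency still be honored once everything goes through spark-submit?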
I would really appreciate any hints or suggestions. Thanks very much.