I am using Luigi for my workflow. My workflow is divided into three general parts - import, analysis, export. Within each part, there are multiple Luigi tasks.
I could put everything in a single file, but I would rather keep the parts separate, with one module each: data_import.py, analysis.py, and export.py.
For example, suppose data_import.py looks like this:
import luigi

class import_task_A(luigi.Task):
    def requires(self):
        return []
    def output(self):
        return luigi.LocalTarget('myfile.txt')
    def run(self):
        pass  # my import stuff

if __name__ == '__main__':
    luigi.run()
But what if a task in export.py depends on a task in data_import.py? Would I do this:
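from data_import import import_task_A
import luigi

class export_task_A(luigi.Task):
    def requires(self):
        return import_task_A()
    def output(self):
        # placeholder name; it must differ from import_task_A's output,
        # otherwise Luigi considers this task complete as soon as the import finishes
        return luigi.LocalTarget('myexport.txt')
    def run(self):
        pass  # my export stuff

if __name__ == '__main__':
    luigi.run()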
If I have larger projects broken up into multiple .py files, what is the best way to tell Luigi which required tasks are in which file? This method seems like it would become cumbersome.