Organizing files when using a Luigi pipeline?

I am using Luigi for my workflow. My workflow is divided into three general parts - import, analysis, export. Within each part, there are multiple Luigi tasks.

I could have everything in a single file, but I would rather keep the parts separate, as in having data_import.py, analysis.py, and export.py.

For example, if data_import.py looks like:

import luigi

class import_task_A(luigi.Task):
    def requires(self):
        return []
    def output(self):
        return luigi.LocalTarget('myfile.txt')
    def run(self):
        pass  # my import stuff goes here

if __name__ == '__main__':
    luigi.run()

But what if a task in export.py depends on a task in data_import.py? Would I do:

from data_import import import_task_A
import luigi

class export_task_A(luigi.Task):
    def requires(self):
        return import_task_A()
    def output(self):
        return luigi.LocalTarget('myexport.txt')  # must differ from import_task_A's target
    def run(self):
        pass  # my export stuff goes here

if __name__ == '__main__':
    luigi.run()

If I have a larger project broken up into multiple .py files, what is the best way to tell Luigi which file each required task lives in? This method seems like it would become cumbersome.

Evans asked 20/10, 2016 at 18:08

Comments (2):
The way you are doing it seems fine. - Brickkiln
Why would it become cumbersome? - Laraelaraine

Why would it become cumbersome? If export_task_A depends on several tasks, its requires method simply returns a list:

def requires(self):
    return [import_task_A(), import_task_B()]

By the way, in that case you may want to remove

if __name__ == '__main__':
    luigi.run()

from your data_import.py. Also, instead of the same block in export.py, use

if __name__ == '__main__':
    luigi.build([export_task_A()])
Anastatius answered 17/8, 2018 at 17:48

I'm not sure there's a way around this. You need either many files or many classes in one file. How you organize your project is a matter of preference.

One thing you could do to limit the number of locations that you import from is to have one Python file that imports all the Luigi classes you need:

# my_tasks.py
from data_import import import_task_A
from export import export_task_A

Then, in other files, you can import whatever you need from my_tasks. Also consider using getattr or importlib for more flexibility in importing and accessing classes.
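
A minimal sketch of the getattr/importlib idea might look like the following. The helper names load_task_class and get_task, and the TASK_LOCATIONS registry with its short names, are hypothetical and only illustrate one way to look classes up by name. The module and class names in the registry are the ones from the question.

```python
import importlib


def load_task_class(module_name, class_name):
    """Import module_name and return the attribute named class_name."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical registry mapping short names to (module, class) pairs.
TASK_LOCATIONS = {
    'import_A': ('data_import', 'import_task_A'),
    'export_A': ('export', 'export_task_A'),
}


def get_task(short_name):
    """Resolve a short name from the registry to the actual class."""
    module_name, class_name = TASK_LOCATIONS[short_name]
    return load_task_class(module_name, class_name)
```

With data_import.py on the import path, get_task('import_A')() would build the task instance. The module is only imported when the name is first looked up, which can help keep import chains short in a large project, at the cost of errors surfacing at lookup time rather than at import time.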

Oven answered 19/4, 2019 at 19:23
