Persist Completed Pipeline in Luigi Visualiser

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of the nodes disappear from the graph except for MasterEnd. This is a little inconvenient, as I'd like to see that everything is complete for the day/past days.

Further, if I go directly to the last job's URL in the visualiser, it finds no record that the task ran: Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/). I have verified that it ran successfully this morning.

One thing to note is that I have a cron job that runs this pipeline every 15 minutes to check for a file on S3. If the file exists, the pipeline runs; otherwise it stops. I'm not sure whether that is causing tasks to be removed from the visualiser. I've noticed that each run gets a new PID, but I couldn't find a way in the docs to keep one PID per day.

So, my questions: Is it possible to persist the completed graph for the current day in the visualiser? And is there a way to see what has happened in the past?

Appreciate all the help

Viera answered 17/9, 2015 at 17:7 Comment(1)
Does this also work if we have "remove-delay = 86400" under [scheduler] in client.cfg? I have added that parameter to client.cfg, but I still don't see jobs for one day! - Fogy

I'm not 100% sure this is correct, but it is what I would try first: when you call luigi.run, pass it --scheduler-remove-delay. My guess is that this is how long the scheduler waits before forgetting a task after all of its dependents have completed. According to Luigi's source, the default is 600 seconds (10 minutes). For example:

luigi.run(["--workers", "8", "--scheduler-remove-delay", "86400"], main_task_cls=task_name)
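
Because the delay is given in raw seconds, it can be easier to read if the argument list is built from a timedelta. The following is only a sketch: scheduler_args is a hypothetical helper, not part of Luigi, and it just produces the list of strings that would be passed to luigi.run as in the line above.

```python
from datetime import timedelta
from typing import List


def scheduler_args(workers: int, keep_for: timedelta) -> List[str]:
    """Hypothetical helper: build the argument list for luigi.run.

    keep_for is converted to whole seconds, the unit that
    --scheduler-remove-delay is given in above.
    """
    seconds = int(keep_for.total_seconds())
    return [
        "--workers", str(workers),
        "--scheduler-remove-delay", str(seconds),
    ]


# One day of retention, matching the 86400 used in the answer.
args = scheduler_args(workers=8, keep_for=timedelta(days=1))
print(args)
```

The resulting list could then be passed as the first argument to luigi.run in place of the hand-written one.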
Nona answered 29/9, 2015 at 15:40 Comment(2)
Sorry for the delay in accepting this. It was indeed what I needed and has been a tremendous help. - Viera
Does this also work if we have "remove-delay = 86400" under [scheduler] in client.cfg? I have added that parameter to client.cfg, but I still don't see jobs for one day! - Fogy

If you configure the remove_delay setting in your luigi.cfg then it will keep the tasks around for longer.

[scheduler]
record_task_history = True
state_path = /x/s/hadoop/luigi/var/luigi-state.pickle
remove_delay = 86400

Note that there is a typo in the documentation ("remove-delay" instead of "remove_delay"), which is being fixed under https://github.com/spotify/luigi/issues/2133
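
If the setting seems to have no effect, one thing worth checking is whether the config file still uses the hyphenated spelling. Here is a minimal sketch using only Python's standard configparser. It is not Luigi's own config loader, and find_hyphenated_keys and SAMPLE are hypothetical names made up for this illustration. It assumes, as the note above says, that the underscore form is the intended one.

```python
import configparser

# Sample config text using the hyphenated spelling from the docs typo.
SAMPLE = """
[scheduler]
record_task_history = True
remove-delay = 86400
"""


def find_hyphenated_keys(text, section="scheduler"):
    """Return the option names in `section` that contain a hyphen."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        return []
    return [key for key in parser.options(section) if "-" in key]


# Lists any keys that should probably be spelled with underscores.
print(find_hyphenated_keys(SAMPLE))
```

To check a real file, read its contents into a string and pass that in place of SAMPLE.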

Dahl answered 22/6, 2017 at 20:37 Comment(0)
