Luigi fails to finish all tasks listed in the require method
Asked Answered
S

1

7

Say I have a task with the following dependency structure

class ParentTask(luigi.Task):
    def requires(self):
        return [ChildTask(classLevel=x) for x in self.class_level_list]
    def run(self):
        yadayda

The child task runs fine on it own. The parent correctly checks all the children tasks for finish status. Yet when the first child task finishes, the scheduler mark the parent task as finished. with the following message:

   Scheduled 15 tasks of which:
* 3 ran successfully:
    - 1 CleanRecord(...)
    - 1 EstimateQuestionParameter(classLevel=6, qdt=2016-04-19, subject=english)
    - 1 GetLog(classLevel=6, qdt=2016-04-19, subject=english)
* 12 were left pending, among these:
    * 12 were left pending because of unknown reason:
        - 5 EstimateQuestionParameter(classLevel=1...5, qdt=2016-04-19, subject=english)
        - 5 GetLog(pool=None, classLevel=1...5, qdt=2016-04-19, subject=english)
        - 1 UpdateQuestionParameter(qdt=2016-04-19, lastQdt=2016-03-23, subject=english, isInit=False)
        - 1 UpdateQuestionParameterBuffer(qdt=2016-04-19, subject=english, src_table=edw.edw_behavior_question_record_exam_new)

This progress looks :) because there were no failed tasks or missing external dependencies
Southeaster answered 21/4, 2016 at 6:42 Comment(9)
never saw this error happen... I think it'll be quite hard to know what's going on without seeing the code you're runningFitter
Do you have a suspect? I cannot post all the src code but can construct some pseudo-code that is representative of the task.Southeaster
@Southeaster Please post the relevant code or pseudo-code.Ahmad
Same thing just happened to me, did you ever find out the cause?Humpy
In my case, it turned out to be a worker that got disconnected (i.e. stopped responding to pings).Humpy
@Southeaster when do you define self.class_level_list? For me I had erratic behavior when changing variables in the run method of a task and then using that to influence outputs or requirements. Is this a property?Kathleenkathlene
@thegeebe, the class_level_list is static.Southeaster
@Mauricio Scheffer, I only has one worker.Southeaster
I also got this problem, just before it quits it shows an INFO log containing this: Worker Worker(salt=02346543, workers=2, host=ip-x-x-x-x, username=hadoop, pid=31121) was stopped. Shutting down Keep-Alive threadMoll
K
2

I think this happens because your worker gets disconnected from the scheduler. The worker's heartbeats don't reach scheduler because of network partition or, more likely, because they're never sent due to this issue.

You have two options to work-around the problem:

  • Increase worker-disconnect-delay setting ([scheduler] section in config, default 60s)
  • Use more than one worker for your job, e.g. --workers 2 (if it's the latter reason)
Kletter answered 13/7, 2017 at 13:3 Comment(1)
I add keep_alive in the worker section of the luigi config file. The problem does not reappear. I think your solution is right.Southeaster

© 2022 - 2024 — McMap. All rights reserved.