Python RQ: how to trigger a job when multiple other jobs are finished? Is there a multi-job dependency workaround?

I have a nested job structure in my Python Redis queue (RQ). First the rncopy job is executed. Once it finishes, the three dependent registration jobs follow. When all three of these jobs have completed, I want to trigger a job that sends a websocket notification to my frontend.

My current try:

    rncopy = redisqueue.enqueue(raw_nifti_copymachine, patientid, imagepath, timeout=6000)
    t1c_reg = redisqueue.enqueue(modality_registrator, patientid, "t1c", timeout=6000, depends_on=rncopy)
    t2_reg = redisqueue.enqueue(modality_registrator, patientid, "t2", timeout=6000, depends_on=rncopy)
    fla_reg = redisqueue.enqueue(modality_registrator, patientid, "fla", timeout=6000, depends_on=rncopy)
    notify = redisqueue.enqueue(print, patient_finished, patientid, timeout=6000, depends_on=(t1c_reg, t2_reg, fla_reg))

Unfortunately, it seems the multi-job dependency feature was never merged into master. I saw that there are currently two pull requests for it on GitHub. Is there a workaround I can use?

Sorry for failing to provide a reproducible example.
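
For reference, redisqueue is created roughly like this (simplified; a local Redis instance is assumed here):

    from redis import Redis
    from rq import Queue

    # Assumed setup: a default queue bound to a local Redis instance.
    redisqueue = Queue(connection=Redis())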

Psilomelane answered 24/3, 2018 at 20:24 Comment(0)

New versions (RQ >= 1.8)

You can simply use the depends_on parameter, passing a list or a tuple of jobs.

rncopy = redisqueue.enqueue(raw_nifti_copymachine, patientid, imagepath, timeout=6000)
t1c_reg = redisqueue.enqueue(modality_registrator, patientid, "t1c", timeout=6000, depends_on=rncopy)
t2_reg = redisqueue.enqueue(modality_registrator, patientid, "t2", timeout=6000, depends_on=rncopy)
fla_reg = redisqueue.enqueue(modality_registrator, patientid, "fla", timeout=6000, depends_on=rncopy)

notify = redisqueue.enqueue(print, patient_finished, patientid, timeout=6000, depends_on=(t1c_reg, t2_reg, fla_reg))

# you can also use a list instead of a tuple:
# notify = redisqueue.enqueue(print, patient_finished, patientid, timeout=6000, depends_on=[t1c_reg, t2_reg, fla_reg])
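
If you also want the notification to run even when one of the registration jobs fails, newer releases (RQ >= 1.11, worth verifying against your installed version) accept an rq.job.Dependency object; a sketch:

from rq.job import Dependency

# Sketch: trigger notify even if a registration job fails.
dep = Dependency(jobs=[t1c_reg, t2_reg, fla_reg], allow_failure=True)
notify = redisqueue.enqueue(print, patient_finished, patientid, timeout=6000, depends_on=dep)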

Old versions (RQ < 1.8)

I use this workaround: if there are n dependencies, I create n-1 wrappers of the real function; each wrapper depends on a different job.

This solution is a bit convoluted, but it works.

rncopy = redisqueue.enqueue(raw_nifti_copymachine, patientid, imagepath, timeout=6000)
t1c_reg = redisqueue.enqueue(modality_registrator, patientid, "t1c", timeout=6000, depends_on=rncopy)
t2_reg = redisqueue.enqueue(modality_registrator, patientid, "t2", timeout=6000, depends_on=rncopy)
fla_reg = redisqueue.enqueue(modality_registrator, patientid, "fla", timeout=6000, depends_on=rncopy)

notify = redisqueue.enqueue(first_wrapper, patient_finished, patientid, t2_reg.id, fla_reg.id, timeout=6000, depends_on=t1c_reg)

def first_wrapper(patient_finished, patientid, t2_reg_id, fla_reg_id):
    # Runs once t1c_reg has finished; chains the next dependency.
    queue = Queue('YOUR-QUEUE-NAME')
    queue.enqueue(second_wrapper, patient_finished, patientid, fla_reg_id, timeout=6000, depends_on=t2_reg_id)

def second_wrapper(patient_finished, patientid, fla_reg_id):
    # Runs once t2_reg has finished; the real job depends on fla_reg.
    queue = Queue('YOUR-QUEUE-NAME')
    queue.enqueue(print, patient_finished, patientid, timeout=6000, depends_on=fla_reg_id)

Some caveats:

  • I don't pass the queue object to the wrappers, because it causes serialization problems; instead, the queue must be recovered by name inside the wrapper (see the sketch below).

  • For the same reason, I pass job.id (instead of the job object) to the wrappers.
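
A minimal sketch of that recovery step, assuming a local Redis instance (the connection details are my assumption, not part of the original answer):

from redis import Redis
from rq import Queue
from rq.job import Job

redis_conn = Redis()

# Recover the queue by name instead of passing the Queue object around.
queue = Queue('YOUR-QUEUE-NAME', connection=redis_conn)

# Resolve a job from its id when its status or result is needed;
# some_job_id stands in for an id passed into a wrapper.
job = Job.fetch(some_job_id, connection=redis_conn)
print(job.result)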

Reyna answered 10/5, 2020 at 13:27 Comment(1)
Interesting approach... I hope RQ gets multi-job dependency soon! Adapting your solution would result in a lot of messy code for me, as my dependencies are pretty nested. – Psilomelane

I created an "rq-manager" to solve similar problems with multiple and tree-like dependencies: https://github.com/crispyDyne/rq-manager

A project structure with multiple dependencies looks like this:

def simpleTask(x):
    return 2*x

project = {'jobs':[
            {
                'blocking':True, # this job must finish before moving on.
                'func':simpleTask, 'args': 0
            },
            {
                'blocking':True, # this job, and its child jobs, must finish before moving on.
                'jobs':[ # these child jobs will run in parallel
                    {'func':simpleTask, 'args': 1},
                    {'func':simpleTask, 'args': 2},
                    {'func':simpleTask, 'args': 3}],
            },
            { # this job will only run when the blocking jobs above finish.
                'func':simpleTask, 'args': 4
            }
        ]}

Then pass it to the manager to complete.

from rq_manager import manager, getProjectResults

managerJob = q.enqueue(manager, project)
projectResults = getProjectResults(managerJob)

returns

projectResults = [0, [2, 4, 6], 8]
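
Here q is assumed to be an ordinary RQ queue; its setup is not shown in the original answer, but would be something like:

from redis import Redis
from rq import Queue

# Assumed setup: the queue that the manager job is enqueued on.
q = Queue(connection=Redis())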

When dependent jobs require results from the parent, I create a function that executes the first job and then adds additional jobs to the project. So for your example:

def firstTask(patientid, imagepath):

    raw_nifti_result = raw_nifti_copymachine(patientid, imagepath)

    moreTasks = {'jobs':[
        {'func':modality_registrator, 'args':(patientid, "t1c", raw_nifti_result)},
        {'func':modality_registrator, 'args':(patientid, "t2", raw_nifti_result)},
        {'func':modality_registrator, 'args':(patientid, "fla", raw_nifti_result)},
    ]}

    # returning a dictionary with an "addJobs" key will add those tasks to the project.
    return {'result':raw_nifti_result, 'addJobs':moreTasks}

The project would look like this:

project = {'jobs':[
            {'blocking':True, # this job, and its child jobs, must finish before moving on.
             'jobs':[
                {
                    'func':firstTask, 'args':(patientid, imagepath),
                    'blocking':True, # this job must finish before moving on.
                },
                # "moreTasks" will be added here
                ]
            },
            { # this job will only run when the blocking jobs above finish.
                'func':print, 'args': (patient_finished, patientid)
            }
        ]}

If the final job needs the results from the previous jobs, set the "previousJobArgs" flag. "finalJob" will then receive an array of the previous results, with a nested array for each group of sub-job results.

def finalJob(previousResults):
    # previousResults = [ 
    #     raw_nifti_copymachine(patientid,imagepath),
    #     [
    #         modality_registrator(patientid, "t1c", raw_nifti_result),
    #         modality_registrator(patientid, "t2", raw_nifti_result),
    #         modality_registrator(patientid, "fla", raw_nifti_result),
    #     ]
    # ]
    return doSomethingWith(previousResults)

Then the project would look like this:

project = {'jobs':[
            {
             #'blocking':True, # blocking not needed here.
             'jobs':[
                {
                    'func':firstTask, 'args':(patientid, imagepath),
                    'blocking':True, # this job must finish before moving on.
                },
                # "moreTasks" will be added here
                ]
            },
            { # This job will wait, since it needs the previous jobs' results.
                'func':finalJob, 'previousJobArgs': True # it gets all the previous jobs' results
            }
        ]}

Hopefully https://github.com/rq/rq/issues/260 gets implemented, and my solution will become obsolete!

Unanimous answered 29/7, 2020 at 20:17 Comment(0)
