How to know when a set of RabbitMQ tasks is complete?

Asked 12/10, 2011 at 2:23 Answered 9/6, 2019 at 10:57

I am using RabbitMQ to have worker processes encode video files. I would like to know when all of the files are complete - that is, when all of the worker processes have finished.

The only way I can think to do this is by using a database. When a video finishes encoding:

UPDATE videos SET status = 'complete' WHERE filename = 'foo.wmv'
-- etc etc etc as each worker finishes --

And then to check whether or not all of the videos have been encoded:

SELECT count(*) FROM videos WHERE status != 'complete'

But if I'm going to do this, then I feel like I am losing the benefit of RabbitMQ as a mechanism for multiple distributed worker processes, since I still have to manually maintain a database queue.

Is there a standard mechanism for RabbitMQ dependencies? That is, a way to say "wait for these 5 tasks to finish, and once they are done, kick off a new task"?

I don't want to have a parent process add these tasks to a queue and then "wait" for each of them to return a "completed" status. That would mean maintaining a separate process for each group of videos, at which point I've lost the advantage of decoupled worker processes compared with a single ThreadPool.

Am I asking for something which is impossible? Or, are there standard widely-adopted solutions to manage the overall state of tasks in a queue that I have missed?

Edit: after searching, I found this similar question: Getting result of a long running task with RabbitMQ

Are there any particular thoughts that people have about this?

Rivy answered 12/10, 2011 at 2:23 Comment(0)

Use a "response" queue. I don't know any specifics about RabbitMQ, so this is general:

Have your parent process send out requests and keep track of how many it sent.
Make the parent process also wait on a specific response queue (that the children know about).
Whenever a child finishes something (or can't finish for some reason), have it send a message to the response queue.
Whenever numSent == numResponded, you're done.
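
The steps above can be sketched roughly as follows. This is only an illustration of the counting logic, not RabbitMQ code: Python's in-process `queue.Queue` stands in for the task and response queues, threads stand in for the worker processes, and the names `worker`, `run_batch`, and the message fields are made up for the example.

```python
import queue
import threading


def worker(tasks: queue.Queue, responses: queue.Queue) -> None:
    """Hypothetical worker: handle each task and always report back."""
    while True:
        filename = tasks.get()
        if filename is None:  # sentinel meaning "no more work"
            break
        try:
            # Placeholder for the real encoding step (assumption).
            reply = {"file": filename, "ok": True}
        except Exception as exc:  # report failures too, so the count still adds up
            reply = {"file": filename, "ok": False, "error": str(exc)}
        responses.put(reply)


def run_batch(filenames, num_workers=3):
    """Send one request per file, then wait until every request has a reply."""
    tasks, responses = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, responses))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    num_sent = 0
    for name in filenames:
        tasks.put(name)
        num_sent += 1

    results = []
    num_responded = 0
    while num_responded < num_sent:  # done when numSent == numResponded
        results.append(responses.get())
        num_responded += 1

    for _ in threads:  # tell the workers to stop
        tasks.put(None)
    for t in threads:
        t.join()
    return results
```

With a real broker, the parent would publish to a task queue and consume from the response queue instead; the bookkeeping stays the same. Note that this loop blocks forever if a worker never replies, which is what the timeout handling below is for.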

Something to keep in mind is a timeout: what happens if a child process dies? You have to do slightly more work, but basically:

With every sent message, include some sort of ID, and add that ID and the current time to a hash table.
For every response, remove that ID from the hash table.
Periodically walk the hash table and remove anything that has timed out.
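
A minimal sketch of that bookkeeping, assuming a plain dictionary as the hash table. The class name `PendingRequests` and its methods are hypothetical, and the optional `now` argument exists only to make the timing easy to test.

```python
import time
import uuid


class PendingRequests:
    """Tracks request IDs that have been sent but not yet answered."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._sent_at = {}  # request ID -> time the request was sent

    def add(self, now=None):
        """Register a new request and return the ID to include in the message."""
        request_id = str(uuid.uuid4())
        self._sent_at[request_id] = time.monotonic() if now is None else now
        return request_id

    def complete(self, request_id):
        """Remove the ID when its response arrives; False if it was unknown."""
        return self._sent_at.pop(request_id, None) is not None

    def expire(self, now=None):
        """Remove and return the IDs that have waited longer than the timeout."""
        now = time.monotonic() if now is None else now
        expired = [
            rid for rid, sent in self._sent_at.items()
            if now - sent > self.timeout_seconds
        ]
        for rid in expired:
            del self._sent_at[rid]
        return expired

    def __len__(self):
        return len(self._sent_at)
```

The parent would call `expire()` on a timer and decide what to do with the returned IDs, for example resend the request or mark the file as failed. The batch is finished once the table is empty.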

This is called the Request-Reply pattern.

Fender answered 12/10, 2011 at 3:49 Comment(1)

Thumbs up for referring to the pattern name. – Snooty 7/10, 2013 at 14:51

Based on Brendan's extremely helpful answer, which should be accepted, I knocked up this quick diagram, which may be helpful to some.

University answered 9/6, 2019 at 10:57 Comment(0)

I have implemented a workflow where the workflow state machine is implemented as a series of queues. A worker receives a message on one queue, processes the work, and then publishes the same message onto another queue. Then another type of worker process picks up that message, etc.

In your case, it sounds like you need to implement one of the patterns from Enterprise Integration Patterns (that is a free online book): a simple worker that collects messages until a set of work is done, and then publishes a single message to a queue representing the next step in the workflow.
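
A collecting worker of this kind (it resembles what that book calls an Aggregator) might look roughly like the sketch below. The message fields `batch_id`, `batch_size`, and `file`, and the `publish_next` callback, are assumptions made for the example; with RabbitMQ, the callback would publish to the next queue in the workflow.

```python
from collections import defaultdict


class BatchAggregator:
    """Collects per-file completion messages and emits one message per batch."""

    def __init__(self, publish_next):
        # publish_next is a hypothetical callback that sends a message
        # to the queue for the next workflow step.
        self._publish_next = publish_next
        self._done = defaultdict(set)  # batch ID -> files completed so far

    def on_message(self, message):
        """Handle one completion message; return True if the batch just finished."""
        batch_id = message["batch_id"]
        # A set makes redelivered (duplicate) messages harmless.
        self._done[batch_id].add(message["file"])
        if len(self._done[batch_id]) == message["batch_size"]:
            files = sorted(self._done.pop(batch_id))
            self._publish_next({"batch_id": batch_id, "files": files})
            return True
        return False
```

Because the aggregator is itself just another queue consumer, no parent process has to block while a batch is in flight. The in-memory state here would be lost on a restart, so a real deployment would likely need to persist it somewhere.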

Spillage answered 29/1, 2012 at 6:47 Comment(0)
