I need a scheduler for large dynamic collections of tasks. At the moment I'm looking at resque-scheduler, rufus-scheduler, and clockwork. I'll be grateful for advice on choosing which one (or what alternative) to use.
Some details:
- There is a large collection of tasks (up to 100K) to be periodically executed.
- The shortest execution period is 1h.
- New tasks may appear from time to time. Existing tasks may be changed or deleted.
- Minimizing scheduling latency is not mission-critical here (scalability and sustainability are most important).
- Task execution is not a heavy operation and can easily be parallelized.
To summarize, I need something like cron for a Ruby project that can handle a large, dynamically changing collection of tasks.
Update: I've spent a day experimenting with scheduling libraries, and I'd like to briefly summarize what I've learned.
I settled on Clockwork and resque-scheduler, since they are the more mature projects with more detailed documentation. resque-scheduler is built on rufus-scheduler, while Clockwork is inspired by it; both can be used for the solution I'm looking for.
Both are standalone services meant to run in a separate process, and both can handle a virtually unlimited number of tasks scheduled for one-off or recurring execution. Tasks are executed in threads.
Clockwork pros:
- It can load scheduled tasks from a database (through ActiveRecord or any arbitrary source).
- It can also dynamically update scheduled tasks by polling the DB for changes.
Clockwork cons:
- DB polling is a potential bottleneck here.
- Polling interval is 1 minute (plus the time to reschedule all tasks), which is a bit too slow.
- Addressing scheduled tasks (to unschedule or change them) is undocumented, so using this feature looks like a hack to me.
I've implemented an alternative Manager class for Clockwork (the core part of the gem that controls scheduling) so that scheduling can be controlled through ZeroMQ messages. The main service in my project can then send commands to the scheduler, such as "run this every day" or "unschedule task #10", and the scheduler executes each request immediately.
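A minimal sketch of that kind of command-driven control, not the actual Manager implementation: the ZeroMQ transport is left out and messages are passed in as plain JSON strings so the example needs only the standard library. The class name CommandScheduler and the message fields (cmd, id, every) are hypothetical.

```ruby
require 'json'

# Hypothetical in-memory schedule registry driven by JSON commands.
# In a real setup the messages would arrive over a socket (e.g. ZeroMQ);
# here they are plain strings to keep the sketch stdlib-only.
class CommandScheduler
  Task = Struct.new(:id, :period, :next_run)

  def initialize
    @tasks = {}
    @lock = Mutex.new
  end

  # Apply one command immediately; returns :ok or :unknown_command.
  def handle(message, now: Time.now)
    cmd = JSON.parse(message)
    @lock.synchronize do
      case cmd['cmd']
      when 'schedule' # add a task, or replace an existing one with the same id
        period = Integer(cmd['every']) # period in seconds
        @tasks[cmd['id']] = Task.new(cmd['id'], period, now + period)
        :ok
      when 'unschedule'
        @tasks.delete(cmd['id'])
        :ok
      else
        :unknown_command
      end
    end
  end

  # Return the ids of tasks due at `now` and advance each by one period.
  def due(now: Time.now)
    @lock.synchronize do
      ready = @tasks.values.select { |t| t.next_run <= now }
      ready.each { |t| t.next_run += t.period }
      ready.map(&:id)
    end
  end

  def size
    @lock.synchronize { @tasks.size }
  end
end

scheduler = CommandScheduler.new
scheduler.handle('{"cmd":"schedule","id":10,"every":86400}') # "run this every day"
scheduler.handle('{"cmd":"unschedule","id":10}')             # "unschedule task #10"
```

A separate loop would call `due` periodically and hand the returned ids to worker threads; since the shortest period is 1h, even a coarse tick would be enough.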
I have less experience with resque-scheduler, but at this point it looks like a better solution.
resque-scheduler pros:
- Redis-based persistence. The manual states that scheduled tasks can be restored after a service restart.
- Dynamic scheduling with a clean API. You just call
    Resque.remove_schedule(name)
  to drop a specific task.
- Web UI. Not too important, but nice to have.
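To make the dynamic API more concrete, here is a small sketch of adding and removing a periodic task. Resque.set_schedule and Resque.remove_schedule and the config keys are taken from the resque-scheduler README; the helper names (periodic_config, upsert_task, drop_task) and the job class are hypothetical, and the backend is passed in as a parameter so the helpers can be exercised without Redis.

```ruby
# Build a resque-scheduler config hash for a task repeating at a fixed interval.
# The keys (:class, :every, :args, :queue, :persist) follow the resque-scheduler
# README; the helper names and the job class used below are hypothetical.
def periodic_config(job_class:, every:, args: [], queue: 'default')
  { class: job_class, every: every, args: args, queue: queue, persist: true }
end

# `backend` is expected to be the Resque module with resque-scheduler loaded
# and dynamic schedules enabled (Resque::Scheduler.dynamic = true).
def upsert_task(backend, name, **opts)
  backend.set_schedule(name, periodic_config(**opts))
end

def drop_task(backend, name)
  backend.remove_schedule(name)
end

# Usage, assuming `require 'resque-scheduler'` and a reachable Redis:
#   upsert_task(Resque, 'task_10', job_class: 'RefreshFeed', every: '1h', args: [10])
#   drop_task(Resque, 'task_10')
```

Calling upsert_task again with the same name would overwrite the existing entry, which covers the "existing tasks may be changed" requirement.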
resque-scheduler cons:
- It requires Redis to be installed.
Maybe other drawbacks will turn up on closer inspection, but I haven't found any so far.
That is what I have now. BTW, I've published a number of links to the scheduling-related Ruby gems on GitHub.