We use Spring to run scheduled tasks, which works fine on a single node. We want to run these scheduled tasks in a cluster of N nodes such that each task is executed by at most one node at any point in time. This is for an enterprise use case, and we expect somewhere between 10 and 20 nodes.
I looked into various options:
- Use Quartz, which seems to be a popular choice for running scheduled tasks in a cluster. Drawback: it depends on a database, which I want to avoid.
- Use ZooKeeper and always run the scheduled tasks only on the leader/master node. Drawback: the task execution load is not distributed.
- Use ZooKeeper and have the scheduled tasks fire on all nodes, but acquire a distributed lock before the task runs and release it once execution is complete. Drawback: the system clocks on all nodes must be in sync, which may be a problem if an overloaded application causes clock drift.
- Use ZooKeeper and let the master node produce tasks according to the schedule and assign each one to a random worker. A new task is not assigned if the previously scheduled task has not been worked on yet. Drawback: this appears to add too much complexity.
I am leaning towards #3 (the distributed lock), which appears to be a safe solution assuming the ZooKeeper ensemble nodes run on a separate cluster with system clocks kept in sync using NTP. This also assumes that if the system clocks are in sync, all nodes have an equal chance of acquiring the lock to execute a task.
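To make #3 concrete, here is a minimal sketch of what I have in mind. `ClusterLock`, `InMemoryLock`, `runIfLockAcquired` and `simulate` are names I made up for illustration; they are not Spring or ZooKeeper APIs. The in-memory lock only stands in for a ZooKeeper-backed lock so the example runs on a single JVM, with threads playing the role of nodes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockGuardedTask {

    /** Hypothetical abstraction over a cluster-wide lock (an assumption, not a real library API). */
    interface ClusterLock {
        boolean tryAcquire();
        void release();
    }

    /** In-memory stand-in; a real implementation would be backed by ZooKeeper. */
    static final class InMemoryLock implements ClusterLock {
        private final Semaphore permit = new Semaphore(1);

        @Override
        public boolean tryAcquire() {
            return permit.tryAcquire(); // non-blocking: fails at once if the lock is held
        }

        @Override
        public void release() {
            permit.release();
        }
    }

    /** Runs the task only if this node wins the lock; returns whether the task ran here. */
    static boolean runIfLockAcquired(ClusterLock lock, Runnable task) {
        if (!lock.tryAcquire()) {
            return false; // another node is executing the task right now, so skip
        }
        try {
            task.run();
            return true;
        } finally {
            lock.release(); // always release, even if the task throws
        }
    }

    /** Simulates `nodes` nodes firing the same schedule together; returns how many ran the task. */
    static int simulate(int nodes) throws InterruptedException {
        ClusterLock lock = new InMemoryLock();
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch losers = new CountDownLatch(nodes - 1);

        Runnable task = () -> {
            executions.incrementAndGet();
            try {
                // Hold the lock until every other node has tried and been turned away.
                losers.await(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        for (int i = 0; i < nodes; i++) {
            pool.submit(() -> {
                if (!runIfLockAcquired(lock, task)) {
                    losers.countDown();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return executions.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("executions = " + simulate(10));
    }
}
```

In this simulation the task body is deliberately kept running until every other "node" has attempted the lock, so exactly one execution happens. The lock by itself only prevents overlapping runs: a node whose schedule fires after the lock has been released would acquire it and run the task again, which is why the timing of the triggers across nodes matters.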
EDIT: After some more thought, I realize this may not be a safe solution either, since the system clocks must be in sync between the nodes where the scheduled tasks run, not just between the ZooKeeper cluster nodes. I say not safe because the nodes running the tasks can be overloaded, with GC pauses and other causes, and their clocks may go out of sync. But then again, I would think this is a standard problem in distributed systems.
Could you please advise whether my understanding of each option is accurate? Or maybe there is a better approach than the listed options to solve this problem?