In a typical web application, there are some things that I would prefer to run as delayed jobs/tasks. They tend to have some or all of the following properties:
- Takes a long time (anywhere from multiple seconds to multiple minutes to multiple hours).
- Occupy some resource heavily (CPU, network, disk, external API limits, etc.)
- Result not immediately necessary. Can complete HTTP response without it. OK (and possibly even preferable) to delay until later.
- Can be (and possibly preferable to) run on (a) different machine(s) than web server(s). The machine(s) are potentially dedicated job/task runners.
- Should be run in response to other event(s), or started periodically.
What would be the preferred way(s) to set up, enqueue, schedule, and run delayed jobs/tasks in a Scala + Play Framework 2.x app?
For more details...
The pattern I have used in the past, and which I would like to replicate if applicable, is:
- In handler of web request, or in cron-like call, enqueue job(s)
- In job runner(s), repeatedly dequeue and run one job at a time
- Possibly handle recording job results
This seems to be a relatively simple yet still relatively flexible pattern.
Examples I have encountered in the past include:
- Updating derived data in DB
- Analytics/tracking API calls for a web request
- Delete expired sessions or other stale/outdated DB records
- Periodic batch ETLs
In other languages/frameworks, I would typically use a job/task framework. Examples include:
- Resque in a Ruby + Rails app
- Celery in a Python + Django app
I have found the following existing materials, but unfortunately, I don't think they fit my use case directly.
- Play 1.x asynchronous jobs API (+ various SO questions referencing it). Appears to have been removed in 2.x line. No reference to what replaced it.
- Play 2.x Akka integration. Seems very general-purpose. I'd imagine it's possible to use Akka for the above, but I'd prefer not to write a jobs/tasks framework if one already exists. Also, no info on how to separate the job runner machine(s) from your web server(s).
- This SO answer. Seems potentially promising for the "short to medium duration IO bound" case, e.g. analytics calls, but not necessarily for the "CPU bound" case (probably shouldn't tie up CPU on web server, prefer to ship off to different node), the "lots of network" case, or the "multiple hour" case (probably shouldn't leave that in the background on the web server, even if it isn't eating up too many resources).
- This SO question, and related questions. Similar to above, it seems to me that this covers only the cases where it would be appropriate to run on the same web server.
Some further clarification on use-cases (as per commenters' request). There are two main use-cases that I have experienced with something like resque or celery that I am trying to replicate here:
- Some event on the site (Most often, an incoming web request causes task to be enqueued.)
- Task should run periodically. (Most often, this is implemented as: periodically, enqueue task to be run as above.)
In the case of resque or celery, the tasks enqueued by both use-cases enter queues the same way and are treated the same way by the runner/worker process. Barring other Scala or Play-specific considerations, that would be my initial guess for how to approach this.
Some further clarification on why I do not believe the Akka scheduler fits my use case out-of-the-box (as per commenters' request):
While it is no doubt possible to construct a fitting solution using some combination of the Akka scheduler (for periodic jobs), akka-remote and akka-cluster (for communicating between the job caller and the job runner), that approach requires a certain amount of glue code which is almost a delayed job framework in and of itself. If it exists, I would prefer to use an existing out-of-the-box solution rather than reinvent the wheel.