How to manage/balance semi persistent jobs over service instances
I

3

17

I see a common pattern in the services we are trying to develop, and I wonder whether there are tools or libraries out there that would help. While the jobs usually discussed in microservice literature are of a REQUEST -> RESPONSE nature, our jobs are more or less assignments of semi-permanent tasks.

Examples of such tasks

  • Listen on the message queue for data from source X and Y, correlate the data that comes in and store it in Z.
  • Keep an in-memory buffer that calculates a running average of the past 15 minutes of data every time a new data entry comes in.
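To make the second example concrete, here is a minimal sketch of such a buffer. It is written in Python purely for illustration (the services in question are PHP), and the class name `RunningAverage` and its methods are hypothetical. It assumes entries arrive in non-decreasing timestamp order.

```python
from collections import deque


class RunningAverage:
    """Average of all values seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds=15 * 60):
        self.window_seconds = window_seconds
        self._entries = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Record a new entry and return the updated average."""
        self._entries.append((timestamp, value))
        self._total += value
        # Drop entries that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._entries and self._entries[0][0] <= cutoff:
            _, old_value = self._entries.popleft()
            self._total -= old_value
        return self._total / len(self._entries)
```

The state lives only in the worker's memory, which is exactly why such a job is tied to one process and has to be rebuilt or reassigned when that process dies.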

Currently our services are written in PHP. Due to the perceived overhead of PHP processes and connections to the message queue, we'd like a single service process to handle several of those jobs simultaneously.

A chart that hopefully illustrates the setup we have in mind: (image: services_setup)

  • Service Workers are currently daemonized PHP scripts
  • For the Service Registry we are looking at Zookeeper

While Zookeeper (and Curator) do load balancing, I did not find anything about distributing permanent jobs (that are updatable, removable, and must be reassigned when a worker dies).

Proposed responsibilities of a Job Manager

  • Knows about jobs
  • Knows about services that can do these jobs
  • Can assign jobs to services
  • Can send job updates to services
  • Can reassign jobs if a worker dies
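The responsibilities listed above can be sketched as a small in-memory model. This is an illustration of the bookkeeping only, not an existing library: the `JobManager` class and all of its method names are hypothetical, it is written in Python rather than PHP, and it assumes something else (for example the service registry) tells it when workers join or die.

```python
class JobManager:
    """Tracks jobs and workers, and which worker runs which job."""

    def __init__(self):
        self.jobs = {}         # job_id -> job specification
        self.assignments = {}  # job_id -> worker_id
        self.workers = set()

    def _least_loaded(self):
        """Return the worker with the fewest assigned jobs, or None."""
        if not self.workers:
            return None
        load = {worker: 0 for worker in self.workers}
        for worker in self.assignments.values():
            load[worker] += 1
        return min(sorted(load), key=load.get)

    def _assign(self, job_id):
        worker = self._least_loaded()
        if worker is not None:
            self.assignments[job_id] = worker
        return worker

    def add_worker(self, worker_id):
        self.workers.add(worker_id)
        # Hand out any jobs that were waiting for a worker.
        for job_id in sorted(self.jobs):
            if job_id not in self.assignments:
                self._assign(job_id)

    def put_job(self, job_id, spec):
        """Create or update a job; return the worker that must be told."""
        self.jobs[job_id] = spec
        if job_id in self.assignments:
            return self.assignments[job_id]
        return self._assign(job_id)

    def remove_job(self, job_id):
        """Delete a job; return the worker that must stop it, if any."""
        self.jobs.pop(job_id, None)
        return self.assignments.pop(job_id, None)

    def worker_died(self, worker_id):
        """Reassign every job of a dead worker; return the affected job ids."""
        self.workers.discard(worker_id)
        orphaned = sorted(
            job for job, worker in self.assignments.items() if worker == worker_id
        )
        for job_id in orphaned:
            del self.assignments[job_id]
        for job_id in orphaned:
            self._assign(job_id)
        return orphaned
```

The return values stand in for the "send job updates to services" step; a real manager would also need to persist this state and deliver those notifications over some transport.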

Are there any libraries or tools that can tackle such problems and thus function as the Job Manager? Or is this all one big anti-pattern, and should we do it some other way?

Impecunious answered 10/7, 2015 at 13:54 Comment(0)
A
10

You should have a look at Gearman.

It consists of a client that assigns the jobs, one or more workers that pick up and execute the jobs, and a server that maintains the list of functions (services) and pending jobs. It will reassign the jobs if a worker dies.

Arcograph answered 17/7, 2015 at 11:59 Comment(2)
Yes, a lot of the functionality I want is already in a message broker (like Gearman). I will try it with RabbitMQ, using messages as jobs and letting the worker instances fetch as many as they want, but without acking them (let's say 20 jobs per worker). I can then use a flag in memcache or similar to indicate that a certain job has been updated or cancelled, in which case the worker can finish (ACK) the job, and any replacement can be posted on the queue as a new job. - Impecunious
Sounds good. You might also want to check Celery. It can use RabbitMQ directly as its broker, though I'm not sure whether it has support for PHP. I found a list of other alternatives that might be interesting: queues.io - Arcograph
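A rough sketch of the RabbitMQ idea from the first comment, for readers who want to see its shape. It is in Python with the `pika` client rather than PHP, and the `HeldJobWorker` class, the `is_cancelled` callback (standing in for the memcache flag), the queue name `jobs`, and the JSON message layout with an `id` field are all assumptions made for illustration. Messages are consumed with manual acknowledgement and a prefetch limit of 20, held unacked while the job runs, and acked only once the job is flagged as cancelled or replaced.

```python
import json

PREFETCH = 20  # "20 jobs per worker", as suggested in the comment


class HeldJobWorker:
    """Holds deliveries unacked so the broker requeues them if we die."""

    def __init__(self, is_cancelled):
        # is_cancelled(job_id) -> bool; hypothetical lookup, e.g. a memcache flag
        self.is_cancelled = is_cancelled
        self.active = {}  # job_id -> delivery tag of the unacked message

    def on_message(self, channel, method, properties, body):
        """Consumer callback: start the job but do NOT ack the message."""
        job = json.loads(body)
        self.active[job["id"]] = method.delivery_tag

    def release_cancelled(self, channel):
        """Ack (and thereby drop) every job that has been flagged."""
        for job_id, tag in list(self.active.items()):
            if self.is_cancelled(job_id):
                channel.basic_ack(delivery_tag=tag)
                del self.active[job_id]


def run(worker, queue="jobs", host="localhost"):
    import pika  # third-party client; imported here so the class works without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_qos(prefetch_count=PREFETCH)
    channel.basic_consume(
        queue=queue, on_message_callback=worker.on_message, auto_ack=False
    )
    while True:
        connection.process_data_events(time_limit=1)
        worker.release_cancelled(channel)
```

If the worker process dies, its connection closes and the broker requeues every unacked message, which gives the "reassign when a worker dies" behaviour without a separate job manager.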
P
5

Your workers sound like (API-less) services themselves. So, your requirements can be reformulated as:

  • Knows about deployed services
  • Knows about nodes that can host these services
  • Can deploy services to nodes
  • Can [send job updates to services] = redeploy services/invoke some API on deployed services
  • Can redeploy service if service or node dies

Look at Docker to deploy, run, and manage isolated processes on a host.

Philpott answered 17/7, 2015 at 11:5 Comment(0)
F
0

RabbitMQ is a simple message queue that is fairly easy to get going with.

Footwall answered 22/7, 2015 at 3:8 Comment(2)
Um, that's odd; why did I get +100 for this? Just saying. I had planned to write a more thorough explanation, but then I saw the OP's comment about "I will try it with RabbitMQ" and figured, why bother, they have already figured it out. - Footwall
The other attempted answers are nowhere near being a good answer to this question. I honestly will spend all my SO points just to see a good answer to this question. The number of solutions proposed for this issue is so large that a decent survey would make sense and would help more people interested in this type of solution. - Phycomycete
