Distributed video encoding - Gearman vs Beanstalkd

I'm looking to build a distributed video encoding cluster of a few dozen machines. I've never worked with a message queue before, but the two I started playing around with are Gearman and Beanstalkd.

Beanstalk seems to be a lot simpler and easier to use than Gearman, but it's not as feature-rich.

One thing I don't understand is how you spawn new workers on all the servers. I plan to use PHP. Is it as simple as running worker.php from the CLI with "&" and just having it sit there waiting for work?

I noticed Gearman doesn't actually kill the process after a job is done, but Beanstalk does, so I have to restart the script after every job, on every server.

Currently I'm more inclined to use Beanstalk. The general flow I have planned is:

Run a cron job every minute on each server that checks whether a predefined number of workers is running. If there are fewer than there should be, spawn new worker processes. Each process will take roughly 2-30 minutes.
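As a rough sketch of that check (written in Python for brevity, though the same logic would fit in a PHP or shell script): the target of 4 workers, the `pgrep -f` matching on the script name, and the function names are all assumptions made for illustration.

```python
import subprocess

TARGET_WORKERS = 4                  # assumed number of workers per server
WORKER_CMD = ["php", "worker.php"]  # worker.php is the script from the question


def count_running(pattern: str) -> int:
    """Count processes whose command line matches the pattern (uses pgrep -f)."""
    result = subprocess.run(["pgrep", "-f", pattern],
                            capture_output=True, text=True)
    # pgrep prints one PID per line and prints nothing when there is no match.
    return len(result.stdout.split())


def missing_workers(running: int, target: int) -> int:
    """How many workers must be started to get back up to the target."""
    return max(0, target - running)


def top_up() -> int:
    """Spawn however many workers are missing; return the number started."""
    need = missing_workers(count_running("worker.php"), TARGET_WORKERS)
    for _ in range(need):
        # Detach each worker so it keeps running after this script exits.
        subprocess.Popen(WORKER_CMD,
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL,
                         start_new_session=True)
    return need
```

Cron would then call `top_up()` once a minute. Because each job runs for 2-30 minutes, the check only replaces workers that have exited; it never interrupts one that is still encoding.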

Maybe I have a flaw in my logic here? Let me know what a "better" or "proper" way of doing this would be.

Shaina answered 22/2, 2012 at 17:25 Comment(3)
Be sure to check out Pheanstalk: github.com/pda/pheanstalk - Endoderm
Pardon my naivete, but with either of those two messaging frameworks, where does the "server" get installed? On the remote machine that will handle the off-loaded process, or the machine that informs the remote machine of a new job? - Nell
Too expensive to use any of those services. - Shaina

To keep the terminology clear: there is a producer and a consumer. The producer generates jobs and puts them on a queue (i.e. the beanstalk service), and a consumer then reads them from that queue.

There are multiple ways to write a consumer. You can either run the task at a fixed interval via a cron job, or keep a consumer running in a while (1) loop in PHP (or whatever language you prefer).
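A minimal sketch of the long-running variant, in Python, with an in-memory `queue.Queue` standing in for the beanstalk service so that it runs without a server. The `encode` function and the file names are placeholders. Against a real beanstalkd the equivalent steps would be to reserve a job, do the work, and delete the job on success.

```python
import queue


def encode(job: str) -> str:
    """Placeholder for the real work, e.g. running an encoder on the file."""
    return f"encoded:{job}"


def consume(jobs: "queue.Queue[str]", idle_timeout: float = 0.1) -> list:
    """Process jobs one after another in a single long-lived process."""
    done = []
    while True:
        try:
            # Block until a job is available (or the timeout expires).
            job = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            # A real worker would simply keep waiting; this sketch stops
            # so that the example terminates.
            break
        try:
            done.append(encode(job))
        finally:
            # Tell the in-memory queue this item has been handled.
            jobs.task_done()
    return done


jobs = queue.Queue()
for name in ["a.mov", "b.mov"]:
    jobs.put(name)

results = consume(jobs)
print(results)  # ['encoded:a.mov', 'encoded:b.mov']
```

Because the loop goes back to waiting after each job, the process does not need to be restarted between jobs.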

Where to install the service really depends on what you are after. I normally install it either on one of the consumers or on its own separate box (the latter is sometimes overkill, depending on your needs).

If you want durability on the queue side, use Beanstalk's binlog parameter (-b, which takes the directory to write the binlog to). If something happens to your beanstalk service, this allows you to restart with minimal, if any, loss of the data in the queues. Durability on the producer side can come from having multiple queues to try against.
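The producer-side idea can be sketched as follows, in Python. Each queue is represented by a hypothetical `put` callable, and `ConnectionError` is assumed to be how an unreachable queue shows up; a real client library may signal failure differently.

```python
from typing import Callable, Sequence


def put_with_failover(job: str, queues: Sequence[Callable[[str], None]]) -> int:
    """Try each queue in order; return the index of the one that took the job."""
    last_error = None
    for index, put in enumerate(queues):
        try:
            put(job)
            return index
        except ConnectionError as exc:
            # This queue is unreachable, so fall through to the next one.
            last_error = exc
    raise RuntimeError("no queue accepted the job") from last_error


accepted = []


def down(job: str) -> None:
    """Stand-in for a queue server that cannot be reached."""
    raise ConnectionError("queue unreachable")


def up(job: str) -> None:
    """Stand-in for a healthy queue server."""
    accepted.append(job)


used = put_with_failover("video-42.mov", [down, up])
print(used, accepted)  # 1 ['video-42.mov']
```

For this to work, consumers have to read from every queue the producer might fall back to.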

Orogeny answered 5/3, 2012 at 22:35 Comment(0)
