Laravel Horizon Restrictions and Optimization

Is there any rule of thumb or any logical relation between maxProcesses, the number of supervisors and the total number of queues in Laravel Horizon?

What if I have 15 supervisors and 40 queues (each supervisor has multiple queues based on their category)? What is the maximum maxProcesses I can assign to each supervisor (assuming 'balance' => 'auto')?
I want to know whether there is a rule of thumb for tuning these numbers for better Horizon performance, for example whether the number of supervisors should not exceed the total number of queues, or whether maxProcesses should not exceed a certain number based on the specs of the machine running the processes.

Is there any logical relation between these numbers? Is there a good document about this issue? I have seen this document on supervisor and also the Laravel Horizon docs, but have not found the answer to my questions.

Needful answered 27/1, 2023 at 19:21 Comment(7)
There is. First of all, they're all related. The queues are the ones listed in 'queue' => ['default'], and the process limits apply to all of those queues combined. More processes kick in as more jobs become available across the queues. The supervisors are the ones defined under environments; in Horizon's default config that is supervisor-1. You can repeat the setup above for other queues that need different settings, e.g. a special queue for long-running jobs will need settings different from supervisor-1. - Weiss
Performance has little to do with Supervisor and much more with the jobs you are about to run. Supervisor just does what its name says: it supervises and doesn't do any real work. 40 queues is a lot; why would you have 40 queues? If you mean 40 processes serving 2 queues, that would make more sense. Think of a queue as a line of people at a mall waiting to check out. Different checkouts have different purposes: one is cash only, another card only, another takes at most 5 items, another handles items over 120 kilograms, etc. - Weiss
It's actually legacy code written by two other people that I am maintaining and developing new features on. With the specifications I mentioned it is very hard to change anything in the code; everything is tightly coupled, and I need to tune Horizon for the stability of the project and to reduce the need to run commands manually every now and then. After that we will have time to refactor the queues and reduce them. It's a big project, but I also believe there's no need for all these separate queues. I was wondering what happens if I increase maxProcesses for each supervisor to 30, for example? - Needful
OK, see it like this. A single supervisor instance is the main worker, e.g. supervisor-1 in Horizon's default example. The number of queues is irrelevant. What matters is the total number of jobs the queues produce relative to the number of processes each supervisor instance can spawn (fork). So if your 40 queues together generate 120 jobs (each job takes 10 seconds) and the maximum number of processes is 20, then 20 jobs are handled every 10 seconds and it takes 1 minute to process all jobs. If you scale maxProcesses to 120, it takes 10 seconds. That's the difference. - Weiss
As for your question about the maximum number of processes (workers) you can assign: that really depends on how much memory and CPU power you have available to run all these processes (workers) simultaneously. To give you an indication of the scale at which I've used queues with Redis: one of the biggest projects I've built ran about 16 different queues with 4 different supervisor configurations, with a total of 800 workers rising to a maximum of 3200 workers at peak load, simultaneously. When idle it would have around 256 workers (2x 16 core, 256 GB memory, RAID SSD). - Weiss
So is it safe to increase the maxProcesses of a supervisor that has a heavy queue due to a big CSV import? Currently the load is light on the other supervisors (and queues), and I want to see if it's safe to increase the maxProcesses of that specific supervisor to a much bigger number than it is now. Above all, I want to understand the calculations you make and the formula behind them, in order to be confident about changing those numbers. Do you know any resource where I can study them? The relation between CPU, RAM, number of workers, and things like that. Thanks in advance. - Needful
Yeah, well, then I need to write an answer, because these comments are limited to about 500 chars :D. OK, give me 5 minutes. - Weiss

I need to explain things in some detail to make the relation between all these things clear.

A Supervisor program is defined by a few simple settings. The most important ones are these:

[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /home/forge/app.com/artisan queue:work
autostart=true
autorestart=true
numprocs=8

The most important setting here is numprocs=8. The Supervisor manual says:

Supervisor will start as many instances of this program as named by numprocs. Note that if numprocs > 1, the process_name expression must include %(process_num)s (or any other valid Python string expression that includes process_num) within it.

This configuration of supervisor running a program called artisan queue:work will create 8 instances (processes, workers, the same thing) of artisan queue:work. This means that 8 jobs can be processed simultaneously, nothing more, nothing less.

With Horizon you don't set numprocs: Supervisor runs a single artisan horizon process, and Horizon starts and stops the workers itself. The only important Supervisor setting is stopwaitsecs=3600. It should always be well above the longest time any job in your application can run; with 3600, the absolute maximum job runtime is 60 minutes.
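A minimal sketch of what such a Supervisor program for Horizon could look like. The program name and the config file name are assumptions, and the path simply reuses the one from the example above:

```ini
; /etc/supervisor/conf.d/horizon.conf (file name and paths are assumptions)
[program:horizon]
process_name=%(program_name)s
command=php /home/forge/app.com/artisan horizon
autostart=true
autorestart=true
; Wait up to 60 minutes for running jobs to finish before Supervisor force-kills the process.
stopwaitsecs=3600
```

There is no numprocs line here, so Supervisor starts exactly one Horizon master process.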

Now, Horizon comes with a balancing strategy for which you can define the strategy and the minimum and maximum number of processes (workers):

'balance' => 'auto',
'minProcesses' => 1,
'maxProcesses' => 10,

What Horizon offers to do here is scale up or down the amount of processes (workers) according to the amount of workload present in the queue(s).

If you define a supervisor configuration like the following:

'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default', 'events', 'xls', 'whatever'],
            'balance' => 'auto',
            'minProcesses' => 10,
            'maxProcesses' => 40,
            'balanceMaxShift' => 1,
            'balanceCooldown' => 3,
            'tries' => 3,
        ],
    ],
],

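Then all four queues (default, events, xls and whatever) run under the same conditions and share a total of at most 40 workers. So each queue does not get 40 workers of its own; all four combined have 40 workers (processes) available. Note that, according to the Horizon documentation, minProcesses is counted per queue, whereas maxProcesses is the total across all queues of the supervisor.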

The key to scaling each queue well is to divide the queues into categories by job duration, e.g.

  • short-load -> each job takes about 1 to 5 seconds.
  • medium-load -> each job takes about 5 to 30 seconds.
  • long-load -> each job takes up to 5 minutes.
  • extreme-load -> each job takes longer than 5 minutes, up to an hour.

If you end up with only two scenarios, such as short-load and long-load, then you will have two Horizon supervisor configurations. They differ in how fast Horizon reacts by spawning new workers and in how many times a failed job is retried (you really don't want to retry, three times over, a job that fails after 59 minutes each time).

'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default', 'events'],
            'balance' => 'auto',
            'minProcesses' => 10,
            'maxProcesses' => 40,
            'balanceMaxShift' => 10,
            'balanceCooldown' => 1,
            'tries' => 3,
        ],
        'supervisor-long-run' => [
            'connection' => 'redis',
            'queue' => ['xls', 'whatever'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'maxProcesses' => 10,
            'balanceMaxShift' => 1,
            'balanceCooldown' => 3,
            'tries' => 1,
        ],
    ],
],
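For the long-running supervisor it can also help to set the worker timeout explicitly. This is a hedged sketch, not part of the configuration above: Horizon supervisors accept a timeout option (in seconds), and the value 3600 is an assumption chosen to match the one-hour upper bound of the extreme-load category. The timeout should stay below the retry_after value of the Redis connection in config/queue.php, otherwise a job that is still running may be picked up a second time.

```php
// config/horizon.php (fragment; values are illustrative assumptions)
'supervisor-long-run' => [
    'connection' => 'redis',
    'queue' => ['xls', 'whatever'],
    'balance' => 'auto',
    'minProcesses' => 1,
    'maxProcesses' => 10,
    'balanceMaxShift' => 1,
    'balanceCooldown' => 3,
    'tries' => 1,
    // Stop a worker whose job runs longer than one hour (assumed upper bound).
    'timeout' => 3600,
],
```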

In one of your last comments you asked

I want to understand all those calculations you make, what's the formula for it

The formula is this: one supervisor instance can serve many queues, and all of these queues share that supervisor's maximum number of workers. The queues themselves are not that important; what matters is the number of jobs (and the kind of jobs) placed on these queues in a given amount of time.

Example:

Say 4 queues together produce 120 jobs each minute and need x workers to process them. If you scale the number of workers (processes) up or down, the time it takes to process all these jobs until the queues are empty changes with the number of workers you make available.

  • If you have 10 workers available, then 10 jobs will be processed simultaneously.
  • If you have 120 workers available, then 120 jobs will be processed simultaneously.

Suppose 1 job takes 10 seconds to complete (as an example average) and an average of 120 jobs are queued each minute. If you want to process all of those jobs (clear the queue) within one minute, you need 120 jobs * 10 seconds per job / 60 seconds in a minute = 20 workers (processes).
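Written out as a worked equation. The symbols are introduced here for illustration and are not Horizon terminology: \lambda is the number of jobs arriving per minute, t the average job duration in seconds, W the number of workers, and N a one-off backlog of jobs.

```latex
% Workers needed to keep up with a steady arrival rate
\[
W_{\text{needed}} = \left\lceil \frac{\lambda \, t}{60} \right\rceil
                  = \left\lceil \frac{120 \times 10}{60} \right\rceil = 20
\]

% Approximate time to drain a one-off backlog of N = 120 jobs with W workers
\[
T_{\text{drain}} \approx \left\lceil \frac{N}{W} \right\rceil t
\qquad
W = 10:\ \frac{120}{10} \times 10\,\text{s} = 120\,\text{s},
\qquad
W = 120:\ \frac{120}{120} \times 10\,\text{s} = 10\,\text{s}
\]
```

With only 10 workers the backlog takes two minutes to clear while another 120 jobs arrive every minute, so the queue keeps growing. At 20 workers the supervisor just keeps pace, and anything above that is headroom.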

So yes, you can scale the number of workers up to 64, 512 or 24890. It all comes back to the question of how much load your hardware can handle.
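As a rough way to sanity-check the hardware side, you can bound the total worker count by memory. This is only a sketch under stated assumptions: m is the peak memory one worker may use and M the RAM you are willing to give to workers. The 128 MB figure is an assumed per-worker budget, not a measurement, so profile your own jobs before relying on it. The example uses the numbers from the question (15 supervisors, maxProcesses of 30 each).

```latex
\[
\sum_{s} \text{maxProcesses}_s \times m \;\le\; M
\]
\[
15 \times 30 = 450 \ \text{workers}, \qquad
450 \times 128\,\text{MB} = 57\,600\,\text{MB} \approx 56\,\text{GB}
\]
```

CPU-bound jobs hit a second limit at roughly the number of cores, while jobs that mostly wait on I/O can run many more workers than cores.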

Hope it made sense.

I'll clean up the text tomorrow using only workers, processes or instances .. it's such a mess ;)

Weiss answered 28/1, 2023 at 0:41 Comment(0)
