Does it make sense to run more PHP-FPM children than number of CPU cores?

Assuming that I have a CPU with 4 cores and 4 threads, does it make sense to run, e.g., 8 PHP-FPM workers by setting the pm.max_children = 8 option? As far as I understand, a CPU with 4 threads can only run up to 4 processes in "real" parallel. Wouldn't that cause overhead, with CPU time lost to context switching between these 8 processes?

In contrast, the Node.js cluster mode documentation recommends running at most as many workers (child processes) as there are CPU cores. Doesn't the same recommendation apply here?

Unaccomplished answered 10/4, 2022 at 9:22 Comment(0)

The general answer is yes: although you can't run that many threads in parallel, you can run them concurrently.

The key thing to understand is that in most real applications, a lot of the time spent processing a request is not spent using the local CPU - it's spent waiting for database queries, external APIs, even disk access. If you have one thread per CPU core, the CPU is simply sitting idle all that time. Allow additional threads, and one can be using the CPU while another is waiting for external data.

Only if your application is highly unusual and spending 100% of its time using the CPU would limiting to one thread per core make sense.
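To make this concrete, here is a deliberately idealized Python model. The function name and the millisecond figures are hypothetical, and the model ignores context-switch overhead and queueing effects. Each request needs `cpu_ms` of CPU time and then waits `wait_ms` for I/O; throughput is capped either by how fast the workers can cycle through requests or by how much CPU time the cores can supply.

```python
def requests_per_second(workers: int, cores: int, cpu_ms: float, wait_ms: float) -> float:
    """Idealized upper bound on throughput for a pool of blocking workers.

    Each request uses cpu_ms of CPU time, then waits wait_ms on I/O
    (database, external API, disk) without using the CPU at all.
    """
    # Limit 1: a worker finishes at most one request per (cpu_ms + wait_ms).
    worker_limit = workers * 1000.0 / (cpu_ms + wait_ms)
    # Limit 2: the cores supply at most `cores` seconds of CPU per second.
    cpu_limit = cores * 1000.0 / cpu_ms
    return min(worker_limit, cpu_limit)


if __name__ == "__main__":
    cores = 4
    # I/O-heavy request: 10 ms of CPU work, 90 ms waiting on the database.
    for workers in (4, 8, 20, 40, 80):
        print(workers, requests_per_second(workers, cores, cpu_ms=10, wait_ms=90))
    # CPU-only request: nothing to wait for.
    for workers in (4, 8):
        print(workers, requests_per_second(workers, cores, cpu_ms=10, wait_ms=0))
```

With 4 cores and requests that spend 10 ms on the CPU and 90 ms waiting, 4 workers top out at 40 requests/s, while about 40 workers are needed to reach the CPU limit of 400 requests/s. When `wait_ms` is 0, going beyond 4 workers gains nothing, which is the unusual CPU-only case.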

The reason this doesn't apply to Node.js is that it implements concurrency within a single thread using asynchronous code: you can tell the current thread "start doing this, and while waiting for the result, get on with processing a different request". This isn't possible with native PHP, which uses a "shared nothing" approach (each request gets its own thread or process), but there are projects such as Swoole and Amp that add support for this asynchronous approach.
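A minimal sketch of that single-threaded approach, using Python's `asyncio` as a stand-in for the Node.js event loop (the request names and wait times are made up): while request A waits on its simulated database call, the same thread starts and finishes request B.

```python
import asyncio

events: list[str] = []


async def handle_request(name: str, db_wait_s: float) -> None:
    events.append(f"{name}: start")
    # Stand-in for a database query or API call. While this request waits,
    # the event loop is free to run other requests on the same thread.
    await asyncio.sleep(db_wait_s)
    events.append(f"{name}: done")


async def main() -> None:
    # Two "requests" handled concurrently by one thread.
    await asyncio.gather(handle_request("A", 0.2), handle_request("B", 0.1))


asyncio.run(main())
print(events)  # ['A: start', 'B: start', 'B: done', 'A: done']
```

Request B starts before A has finished, even though only one thread exists; that is the concurrency native PHP lacks.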

Superior answered 10/4, 2022 at 10:19 Comment(2)
Thank you. It seems much clearer now, especially the PHP-FPM part. However, I am still not sure what would happen if I ran, let's say, 2 Node.js processes per core. It is mentioned almost everywhere that this could decrease performance, as the processes would have to "fight" for CPU time. How would that be different from running multiple PHP-FPM workers per core, i.e. don't PHP-FPM processes have to "fight" for CPU time while being attached to the same core? - Unaccomplished
@Unaccomplished If the thread is sitting idle waiting for your database to return some results, it doesn't need to "fight" for anything; that will happen a lot of the time with PHP-FPM, but basically never in Node.js, because the Node.js thread will pick up another request while it's waiting and carry on making use of the CPU. Also, "fight" maybe gives the wrong impression, because they're not actively trying to replace each other; it's more like "repeatedly being told to wait in a queue while the CPU tries to spend its time fairly". - Superior

Yes, it makes sense, and you should probably always do so. Let me explain why.

A PHP process does not use threading, so it runs on a single core at a time. PHP-FPM spawns many worker processes so that your requests can run on multiple cores.

It's important to understand how the OS uses process context switching to handle multiple processes concurrently. Even with only one core, you can run multiple processes on your machine at the same time, and the reason is process context switching: the OS switches the single core between processes on the fly, running each process a bit at a time depending on various factors, such as whether a process is waiting for I/O, how long it has been running, and whether another process has a higher priority. The important part is that each context switch takes some time, and the single core is shared among all those processes.
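The following is a simplified Python simulation of one core time-slicing CPU-bound processes in round-robin order. Real Linux scheduling is more sophisticated (no fixed time slice, priorities, I/O wakeups), and the `switch_ms` cost used here is an arbitrary, exaggerated value chosen only to make the overhead visible.

```python
from collections import deque


def round_robin(
    jobs_ms: dict[str, int], quantum_ms: int, switch_ms: float
) -> tuple[dict[str, float], float]:
    """Simulate one core time-slicing CPU-bound processes, round-robin.

    Returns each process's finish time and the total time spent switching.
    """
    run_queue = deque(jobs_ms.items())
    clock = 0.0
    switching = 0.0
    finished: dict[str, float] = {}
    previous = None
    while run_queue:
        name, remaining = run_queue.popleft()
        if previous is not None and previous != name:
            # Saving one process's state and restoring another's is not free.
            clock += switch_ms
            switching += switch_ms
        ran = min(quantum_ms, remaining)  # run until done or the slice expires
        clock += ran
        remaining -= ran
        previous = name
        if remaining > 0:
            run_queue.append((name, remaining))  # back of the run queue
        else:
            finished[name] = clock
    return finished, switching


if __name__ == "__main__":
    finished, switching = round_robin({"A": 30, "B": 30}, quantum_ms=10, switch_ms=1.0)
    print(finished, switching)  # {'A': 54.0, 'B': 65.0} 5.0
```

Two 30 ms jobs with a 10 ms slice need five switches, so they finish at 54 and 65 ms instead of after the 60 ms of pure work; a single job on its own would finish at 30 ms with no switching at all.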

If you have multiple cores, processes can execute in parallel, one per core. However, you most likely still have more running processes than cores, so process context switching still happens, just at a lower rate.

The reason it's recommended to set pm.max_children to a value higher than your number of CPU cores is that in most cases your PHP process is not doing CPU-intensive work, but mostly waiting for I/O, such as a SQL result, a curl response, or a disk read or write. These operations are blocking I/O, and they usually account for most of the time spent on a request. By setting pm.max_children higher than the number of cores (sometimes even 5-10 times the number of cores), you benefit from the context switching your OS does while a process is in a blocked/idle state: another process gets to use the core in the meantime.
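The 5-10x figure can be reasoned about with a common rule of thumb: while one worker is blocked on I/O, other workers can use its core, so each core needs roughly `1 + wait/compute` workers to stay busy. The helper below is hypothetical (it is not a PHP-FPM setting or formula), the timings would have to come from measuring your own requests, and it ignores other limits such as per-worker memory.

```python
import math


def suggested_max_children(cores: int, cpu_ms: float, wait_ms: float) -> int:
    """Rule of thumb: workers needed to keep every core busy.

    While one worker is blocked for wait_ms, roughly wait_ms / cpu_ms other
    workers can use that core, so each core needs about 1 + wait/cpu workers.
    Ignores memory per worker and context-switch overhead.
    """
    return math.ceil(cores * (1 + wait_ms / cpu_ms))


if __name__ == "__main__":
    print(suggested_max_children(8, cpu_ms=10, wait_ms=40))  # 40 (5x the cores)
    print(suggested_max_children(8, cpu_ms=10, wait_ms=90))  # 80 (10x the cores)
    print(suggested_max_children(8, cpu_ms=10, wait_ms=0))   # 8 (CPU-bound)
```

A purely CPU-bound workload (`wait_ms=0`) comes out at exactly one worker per core, while requests that are blocked 80-90% of the time land in the 5-10x range.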

It's quite possible to have over 20 PHP processes running that are just waiting for I/O. If you set pm.max_children to the number of cores, let's say 8, the cores might not be doing much while a lot of requests pile up, and response times would be very slow.
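A small Python experiment illustrates the pile-up. Threads stand in for PHP-FPM worker processes here (an assumption for brevity; `time.sleep` releases the GIL, so they behave like blocked processes), and each simulated request just waits 50 ms, like a process waiting for a SQL result.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(request_id: int) -> int:
    # Almost no CPU work: the worker just blocks, like a PHP process
    # waiting for a SQL result or a curl response.
    time.sleep(0.05)
    return request_id


def serve(requests: int, workers: int) -> float:
    """Return the wall-clock seconds needed to handle all requests."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_request, range(requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"8 workers:  {serve(40, 8):.2f} s")   # 40 requests = 5 rounds of 50 ms
    print(f"40 workers: {serve(40, 40):.2f} s")  # all 40 wait at the same time
```

With 8 workers the 40 requests need at least 250 ms even though the CPU is almost idle the whole time; with 40 workers they all wait at once.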

If you are certain that your PHP processes have no blocking I/O and only perform calculations, for example, you might actually benefit from setting pm.max_children to exactly the number of cores, because process context switching slows things down and more running processes use up more resources. However, this scenario is unusual, and most likely your processes do have blocking I/O and idle time.

There is a good article that goes in depth about process context switching on Linux here.

There is also something called coroutines, which the Swoole PHP extension uses. Coroutines also rely on context switching to achieve concurrency, but the switching is done by the program itself in user space, which consumes far fewer resources and is much faster than OS context switching. If you use Swoole, there is no need for PHP-FPM, since Swoole is faster, but it has other issues you need to take care of. With Swoole, however, it's recommended to set as many workers as there are cores, to avoid OS context switching. You can have thousands of coroutines without affecting performance much.
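To illustrate the last point, here is a Python `asyncio` sketch used as a stand-in for Swoole coroutines (this is not Swoole's API): 5,000 coroutines each wait 100 ms on simulated I/O, all inside one OS thread, and the batch completes in a fraction of a second rather than the 500 s a sequential run would need.

```python
import asyncio
import threading
import time


async def handle(request_id: int, threads_seen: set[int]) -> int:
    # Record which OS thread runs this coroutine.
    threads_seen.add(threading.get_ident())
    # Simulated 100 ms I/O wait; the event loop runs other coroutines meanwhile.
    await asyncio.sleep(0.1)
    return request_id


async def serve(n: int) -> tuple[float, int, int]:
    threads_seen: set[int] = set()
    start = time.perf_counter()
    results = await asyncio.gather(*(handle(i, threads_seen) for i in range(n)))
    elapsed = time.perf_counter() - start
    return elapsed, len(results), len(threads_seen)


if __name__ == "__main__":
    elapsed, done, threads = asyncio.run(serve(5000))
    print(f"{done} requests on {threads} thread(s) in {elapsed:.2f} s")
```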

Node.js uses an event loop, which is similar to Swoole's coroutines. The reason it's recommended to match the number of workers to your cores is to avoid OS context switching and rely on the built-in context switching instead, since it's much faster and lighter.
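A minimal sketch of that built-in context switching: a toy round-robin scheduler over Python generators. Each `yield` is a switch point where a task hands control back to the loop inside a single OS thread, so the kernel is not involved. The names `worker` and `run` are hypothetical; real event loops (libuv in Node.js, Swoole's scheduler) also wake tasks when their I/O completes, which this toy omits.

```python
from collections import deque
from collections.abc import Generator

Task = Generator[None, None, None]


def worker(name: str, steps: int, log: list[str]) -> Task:
    for i in range(steps):
        log.append(f"{name}{i}")
        # Switch point: suspend here and hand control back to run().
        # This happens inside one OS thread; the kernel is not involved.
        yield


def run(tasks: list[Task]) -> None:
    """Toy event loop: resume ready tasks one slice at a time, round-robin."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # not finished yet: back of the queue
        except StopIteration:
            pass                # task returned; drop it


if __name__ == "__main__":
    log: list[str] = []
    run([worker("a", 2, log), worker("b", 3, log)])
    print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```

The tasks interleave exactly as OS-scheduled processes would, but each switch is just a generator suspending and resuming, which is why one such loop per core can juggle many requests cheaply.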

Ectomere answered 29/5, 2022 at 12:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.