Python multiprocessing's Pool process limit
Asked Answered
C

4

57

In using the Pool object from the multiprocessing module, is the number of processes limited by the number of CPU cores? E.g. if I have 4 cores, even if I create a Pool with 8 processes, only 4 will be running at one time?

Cretic answered 18/11, 2013 at 3:37 Comment(2)
There is no limit on how many you can create, but beyond a certain number adding more processes turns into a penalty rather than a benefit. How big that number is depends a lot on what the processes will do; if only the processor is at work, you shouldn't start more processes than your processor supports hardware threads (e.g. a 4-core i7 supports 8 threads).Sapp
The Linux (user/system) process limit is defined in /etc/security/limits.conf.Dashtilut
V
62

You can ask for as many processes as you like. Any limit that may exist will be imposed by your operating system, not by multiprocessing. For example,

 p = multiprocessing.Pool(1000000)

is likely to suffer an ugly death on any machine. I'm trying it on my box as I type this, and the OS is grinding my disk to dust swapping out RAM madly - finally killed it after it had created about 3000 processes ;-)

As to how many will run "at one time", Python has no say in that. It depends on:

  1. How many your hardware is capable of running simultaneously; and,
  2. How your operating system decides to give hardware resources to all the processes on your machine currently running.

For CPU-bound tasks, it doesn't make sense to create more Pool processes than you have cores to run them on. If you're trying to use your machine for other things too, then you should create fewer processes than cores.

For I/O-bound tasks, it may make sense to create quite a few more Pool processes than cores, since the processes will probably spend most of their time blocked (waiting for I/O to complete).

Vendetta answered 18/11, 2013 at 4:4 Comment(3)
Keep in mind that the processes result from os.fork, and so will involve copies of the parent process's memory footprint. This might be copy-on-write in some operating systems, but in general, when you want a large number of 'units' that each perform a task which can be handled asynchronously to avoid blocking, threads are often a better 'unit' of asynchrony than processes (but not always). And even with the GIL, threads that perform GIL-releasing operations can benefit from this.Rubenrubens
Can using more processes cause memory leaks? multiprocessing.cpu_count() returns 16 on my system; I am using 8 (cpu_count() // 2).Revers
Your operating system requires RAM for every process you create. That's not "a leak": when you create a process, you're forcing the OS to eat that RAM. 8 processes should be close to trivial on any modern box.Vendetta
L
39

Yes. Theoretically there is no limit on the processes you can create, but starting an insane number of processes at once will bring the system down by running it out of memory. Note that processes have a much larger footprint than threads, since each process gets its own address space rather than sharing one.

So the best programming practice is to use a semaphore restricted to the number of processors in your system, like:

sem = multiprocessing.Semaphore(4)  # number of CPUs in your system

If you are not aware of the number of cores of your system, or if you want the code to run on many systems, generic code like the following will do:

sem = multiprocessing.Semaphore(multiprocessing.cpu_count())
# detects the number of cores in your system and creates a semaphore with that value
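As a sketch of what the semaphore buys you: a Semaphore created with value N hands out at most N slots, so workers that acquire it before doing work can never run more than N at once. The snippet below only demonstrates that slot-counting behaviour in-process (an illustration, not part of the answer above):

```python
import multiprocessing

# A semaphore sized to the CPU count: each worker would call
# sem.acquire() before starting work and sem.release() when done,
# so at most cpu_count() workers run at any moment.
sem = multiprocessing.Semaphore(multiprocessing.cpu_count())

# Non-blocking acquires succeed exactly cpu_count() times,
# then fail until a slot is released.
slots = 0
while sem.acquire(block=False):
    slots += 1
print(slots)  # equals multiprocessing.cpu_count()

for _ in range(slots):
    sem.release()  # give the slots back
```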

P.S. But it is good to always use the number of cores minus one.

Hope this helps :)

Luby answered 18/11, 2013 at 4:15 Comment(0)
F
17

While there is no limit you can set, if you are looking for a convenient number to use for CPU-bound processes (which I suspect you are looking for here), you can run the following:

>>> import multiprocessing
>>> multiprocessing.cpu_count()
1

Some good notes on limitations (especially on Linux) are in the answer here:

Forwent answered 26/8, 2016 at 20:23 Comment(0)
D
11

That is correct. If you have 4 cores then 4 processes can be running at once. Remember that the system has work of its own to do, so it is reasonable to set the process count to number_of_cores - 1. This is a preference, not mandatory. Each process you create carries overhead, so you are actually using more memory to do this. But if RAM isn't a problem, go for it. If you are running CUDA or some other GPU-based library then you have a different paradigm, but that's for another question.

Dashtilut answered 18/11, 2013 at 3:59 Comment(0)
