These two functions are very different, and NUM_WORKERS = os.sched_getaffinity(0) - 1
would fail instantly with a TypeError
because it tries to subtract an integer from a set. While os.cpu_count()
tells you how many cores the system has, os.sched_getaffinity(pid)
tells you on which cores a certain thread/process is allowed to run.
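If the - 1 was meant to keep one core free for the main process, a working version would take the length of the returned set first (a sketch; note os.sched_getaffinity is only available on some platforms, e.g. Linux):

import os

# os.sched_getaffinity(0) returns the *set* of core-ids the current process
# is allowed to run on, so take its length before doing arithmetic;
# max() guards against ending up with zero workers.
NUM_WORKERS = max(1, len(os.sched_getaffinity(0)) - 1)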
os.cpu_count()
os.cpu_count()
shows the number of available cores as known to the OS (virtual cores). Most likely you have half this number of physical cores. Whether it makes sense to use more processes than you have physical cores, or even more than virtual cores, depends very much on what you are doing. The tighter the computational loop (little diversity in instructions, few cache misses, ...), the less likely you are to benefit from using more cores (via more worker-processes), and the more likely you are to see performance degradation instead.
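To see both numbers for your machine, the third-party psutil package can report the physical count (a sketch; psutil is an assumption here, it is not part of the standard library):

import os

import psutil  # third-party: pip install psutil

print(f"virtual (logical) cores: {os.cpu_count()}")
print(f"physical cores:          {psutil.cpu_count(logical=False)}")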
Obviously it also depends on what else your system is running, because your system tries to give every thread (as the actual execution unit of a process) a fair share of run-time on the available cores. So there is no possible generalization in terms of how many workers you should use. But if, for instance, you have a tight loop and your system is otherwise idling, a good starting point for optimizing is
os.cpu_count() // 2 # same as mp.cpu_count() // 2
...and increasing from there.
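A minimal way of "increasing from there" is to just time your real workload over a range of worker counts (a sketch; work here is a stand-in for your actual task, and 32 chunks is an arbitrary choice):

import multiprocessing as mp
import os
import time

def work(_):
    for _ in range(int(20e6)):  # tight dummy loop; replace with your real task
        pass

if __name__ == '__main__':
    # try worker counts from half the virtual cores up to all of them
    for n_workers in range(os.cpu_count() // 2, os.cpu_count() + 1):
        start = time.perf_counter()
        with mp.Pool(n_workers) as pool:
            pool.map(work, range(32))
        print(f"{n_workers:2d} workers: {time.perf_counter() - start:.2f}s")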
As @Frank Yellin already mentioned, multiprocessing.Pool
uses os.cpu_count()
as the default number of workers.
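That is, the two spellings below should behave the same (a sketch illustrating the default; Pool falls back to os.cpu_count() when processes is not given):

import multiprocessing as mp
import os

if __name__ == '__main__':
    # mp.Pool() with no argument starts os.cpu_count() worker-processes,
    # i.e. the same as mp.Pool(processes=os.cpu_count())
    with mp.Pool() as pool:
        print(pool.map(abs, [-1, -2, -3]))  # -> [1, 2, 3]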
os.sched_getaffinity(pid)
os.sched_getaffinity(pid)
Return the set of CPUs the process with PID pid (or the current
process if zero) is restricted to.
Now core/cpu/processor/-affinity is about which concrete (virtual) cores your thread (within your worker-process) is allowed to run on. Your OS gives every core an id, from 0 to (number-of-cores - 1), and changing affinity allows restricting ("pinning") on which actual core(s) a certain thread is allowed to run at all.
At least on Linux I found this to mean that if none of the allowed cores is currently available, the thread of a child-process won't run, even if other, non-allowed cores would be idle. So "affinity" is a bit misleading here.
The goal when fiddling with affinity is to minimize cache invalidations from context-switches and core-migrations. Your OS here usually has the better insight and already tries to keep caches "hot" with its scheduling-policy, so unless you know what you're doing, you can't expect easy gains from interfering.
By default the affinity is set to all cores, and for multiprocessing.Pool
it doesn't make much sense to bother changing that, at least if your system is otherwise idle.
Note that despite the fact the docs here speak of "process", setting affinity really is a per-thread thing. So, for example, setting affinity in a "child"-thread for the "current process if zero" does not change the affinity of the main-thread or other threads within the process. But child-threads inherit their affinity from the main-thread, and child-processes (through their main-thread) inherit affinity from the parent process's main-thread. This affects all possible start-methods ("spawn", "fork", "forkserver"). The example below demonstrates this and how to modify affinity with multiprocessing.Pool.
import multiprocessing as mp
import threading
import os


def _location():
    return f"{mp.current_process().name} {threading.current_thread().name}"


def thread_foo():
    print(f"{_location()}, affinity before change: {os.sched_getaffinity(0)}")
    os.sched_setaffinity(0, {4})
    print(f"{_location()}, affinity after change: {os.sched_getaffinity(0)}")


def foo(_, iterations=200e6):
    print(f"{_location()}, affinity before thread_foo:"
          f" {os.sched_getaffinity(0)}")
    for _ in range(int(iterations)):  # some dummy computation
        pass
    t = threading.Thread(target=thread_foo)
    t.start()
    t.join()
    print(f"{_location()}, affinity before exit is unchanged: "
          f"{os.sched_getaffinity(0)}")
    return _


if __name__ == '__main__':
    mp.set_start_method("spawn")  # alternatives on Unix: "fork", "forkserver"
    # for current process, exclude cores 0,1 from affinity-mask
    print(f"parent affinity before change: {os.sched_getaffinity(0)}")
    excluded_cores = {0, 1}
    os.sched_setaffinity(0, os.sched_getaffinity(0).difference(excluded_cores))
    print(f"parent affinity after change: {os.sched_getaffinity(0)}")

    with mp.Pool(2) as pool:
        pool.map(foo, range(5))
Output:
parent affinity before change: {0, 1, 2, 3, 4, 5, 6, 7}
parent affinity after change: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-1 MainThread, affinity before thread_foo: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 MainThread, affinity before thread_foo: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-1 Thread-1, affinity before change: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-1 Thread-1, affinity after change: {4}
SpawnPoolWorker-1 MainThread, affinity before exit is unchanged: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-1 MainThread, affinity before thread_foo: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 Thread-1, affinity before change: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 Thread-1, affinity after change: {4}
SpawnPoolWorker-2 MainThread, affinity before exit is unchanged: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 MainThread, affinity before thread_foo: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 Thread-2, affinity before change: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 Thread-2, affinity after change: {4}
SpawnPoolWorker-2 MainThread, affinity before exit is unchanged: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 MainThread, affinity before thread_foo: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-1 Thread-2, affinity before change: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-1 Thread-2, affinity after change: {4}
SpawnPoolWorker-1 MainThread, affinity before exit is unchanged: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 Thread-3, affinity before change: {2, 3, 4, 5, 6, 7}
SpawnPoolWorker-2 Thread-3, affinity after change: {4}
SpawnPoolWorker-2 MainThread, affinity before exit is unchanged: {2, 3, 4, 5, 6, 7}
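Note in the output how both pool-workers' MainThreads inherit the parent's reduced mask {2, 3, 4, 5, 6, 7}, while the {4}-pinning done inside the child-threads never propagates back to the MainThread of their process, consistent with the per-thread behavior described above.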