Why does multiprocessing use only a single core after I import numpy?

I am not sure whether this counts more as an OS issue, but I thought I would ask here in case anyone has some insight from the Python end of things.

I've been trying to parallelise a CPU-heavy for loop using joblib, but I find that instead of each worker process being assigned to a different core, I end up with all of them being assigned to the same core and no performance gain.

Here's a very trivial example...

from joblib import Parallel, delayed
import numpy as np

def testfunc(data):
    # some very boneheaded CPU work
    for nn in xrange(1000):
        for ii in data[0, :]:
            for jj in data[1, :]:
                ii * jj

def run(niter=10):
    data = (np.random.randn(2, 100) for ii in xrange(niter))
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(dd) for dd in data)

if __name__ == '__main__':
    run()

...and here's what I see in htop while this script is running:

[htop screenshot: all worker processes assigned to the same core]

I'm running Ubuntu 12.10 (3.5.0-26) on a laptop with 4 cores. Clearly joblib.Parallel is spawning separate processes for the different workers, but is there any way that I can make these processes execute on different cores?

Hendershot answered 26/3, 2013 at 14:37 Comment(5)
#15168514 - no answers there I am afraid, but it sounds like the same issue.Quarrelsome
Also #6905764Quarrelsome
And #12592518Quarrelsome
Is this still an issue? I'm attempting to recreate this with Python 3.7 and importing numpy with multiprocessing.Pool(), and it's using all the threads (as it should). Just want to ensure that this has been fixed.Betsybetta
Is this the same issue? joblib.readthedocs.io/en/latest/… "Some third-party libraries – e.g. the BLAS runtime used by numpy – internally manage a thread-pool to perform their computations. … joblib tells supported third-party libraries to use a limited number of threads in workers managed by the 'loky' backend … Since joblib 0.14, it is also possible to programmatically override the default number of threads using the inner_max_num_threads argument of the parallel_backend function "Tera
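The knob that comment mentions can be sketched as follows (a hedged example assuming joblib >= 0.14; squaring with the built-in pow stands in for real BLAS-heavy work, and the ImportError guard is only there so the sketch runs without joblib installed):

```python
# Cap the thread pools (e.g. OpenBLAS's) inside workers managed by
# joblib's loky backend, so each worker stays on its own core.
try:
    from joblib import Parallel, delayed, parallel_backend

    with parallel_backend("loky", inner_max_num_threads=1):
        squares = Parallel(n_jobs=2)(delayed(pow)(i, 2) for i in range(4))
except ImportError:
    squares = None  # joblib not installed; nothing to demonstrate
print(squares)
```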

After some more googling I found the answer here.

It turns out that certain Python modules (numpy, scipy, tables, pandas, skimage...) mess with core affinity on import. As far as I can tell, this problem seems to be specifically caused by them linking against multithreaded OpenBLAS libraries.

A workaround is to reset the task affinity using

os.system("taskset -p 0xff %d" % os.getpid())

With this line pasted in after the module imports, my example now runs on all cores:

[htop screenshot: worker processes spread across all four cores]

My experience so far is that this doesn't noticeably hurt numpy's performance, although that is probably machine- and task-specific.
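For completeness, here is how the workaround slots into a script: a minimal sketch, assuming a Linux machine with util-linux's taskset installed, and a 0xff mask (CPUs 0-7) that you should adjust to your core count. The numpy import is represented by a comment so the sketch stands alone:

```python
import os

# numpy/scipy imports go here; on an affected OpenBLAS build they
# would pin this process to a single core.

# Reset the affinity mask afterwards. 0xff selects CPUs 0-7;
# widen or narrow the mask to match your machine.
mask = 0xff
cmd = "taskset -p 0x%x %d" % (mask, os.getpid())
status = os.system(cmd)  # needs util-linux's taskset (Linux only)
```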

Update:

There are also two ways to disable the CPU affinity-resetting behaviour of OpenBLAS itself. At run-time you can use the environment variable OPENBLAS_MAIN_FREE (or GOTOBLAS_MAIN_FREE), for example

OPENBLAS_MAIN_FREE=1 python myscript.py
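The same run-time switch can be flipped from inside Python, as long as it happens before the first numpy import (a sketch; whether it actually takes effect depends on how your OpenBLAS was built, and the ImportError guard just lets the snippet run where numpy is absent):

```python
import os

# Must be set before OpenBLAS is loaded, i.e. before `import numpy`.
os.environ["OPENBLAS_MAIN_FREE"] = "1"

try:
    import numpy  # OpenBLAS now leaves the CPU affinity mask alone
except ImportError:
    pass  # numpy not installed; the variable itself is harmless
```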

Or alternatively, if you're compiling OpenBLAS from source you can permanently disable it at build-time by editing the Makefile.rule to contain the line

NO_AFFINITY=1
Hendershot answered 26/3, 2013 at 15:36 Comment(10)
Thanks, your solution solved the problem. One question: I have the same code, but it runs differently on two different machines. Both are Ubuntu 12.04 LTS with Python 2.7, yet only one has this issue. Do you have any idea why?Prague
Both machines have OpenBLAS (built with OpenMPI).Prague
Old thread, but in case anyone else finds this issue, I had the exact problem and it was indeed related to the OpenBLAS libraries. See here for two possible workarounds and some related discussion.Naomanaomi
Another way to set cpu affinity is to use psutil.Vamoose
I see this is for Python 2.7, is this fixed in Python 3.4?Lulululuabourg
@JHG It's an issue with OpenBLAS rather than Python, so I can't see any reason why the Python version would make a differenceHendershot
@Hendershot thank you, I see in Python issue version 2.7 then I need to be sure about it, thank you very muchLulululuabourg
I had a similar issue, but I had to call os.system("taskset -p 0xff %d" % os.getpid()) inside the equivalent of testfunc to utilize more than a single core.Evildoer
I think it's better to add the -a flag to apply the affinity to all threads: os.system("taskset -ap 0xff %d" % os.getpid())Barbiebarbieri
How can we do this ( os.system("taskset -p 0xff %d" % os.getpid()) ) in C++?Rothschild

Python 3 now exposes methods to set the affinity directly:

>>> import os
>>> os.sched_getaffinity(0)
{0, 1, 2, 3}
>>> os.sched_setaffinity(0, {1, 3})
>>> os.sched_getaffinity(0)
{1, 3}
>>> x = {i for i in range(10)}
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> os.sched_setaffinity(0, x)
>>> os.sched_getaffinity(0)
{0, 1, 2, 3}
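Applied to the original problem, this becomes a one-liner pasted in after the imports. A sketch, guarded because the sched_* calls are only available on some Unix platforms; here the current mask is simply re-applied (a no-op) to show the API, whereas after an affected numpy import you would pass the full CPU set:

```python
import os

# Undo any affinity clobbering done by library imports by
# re-applying an explicit CPU mask (Linux only).
if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # set of CPU ids we may run on
    os.sched_setaffinity(0, allowed)
    print(sorted(allowed))
```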
Aloeswood answered 12/7, 2015 at 17:56 Comment(4)
Error > AttributeError: module 'os' has no attribute 'sched_getaffinity' , Python 3.6Mingrelian
@Paddy From the linked documentation: They are only available on some Unix platforms.Pitchblack
I have the same problem, and I added this same line at the top, os.system("taskset -p 0xff %d" % os.getpid()), but it still doesn't use all CPUs.Pettifogging
I had the same problem on a cluster. Any Python process run on a compute node would use only one core, even though my code could in principle use more and I had requested ~20 cores. For me, adding import os and os.sched_setaffinity(0, range(1000)) to my Python code solved the problem.Snowberry

This appears to be a common problem with Python on Ubuntu, and is not specific to joblib; see the similar questions linked in the comments above.

I would suggest experimenting with CPU affinity (taskset).
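One way to experiment, assuming util-linux's taskset is available: launch the interpreter under an explicit CPU list and confirm the mask from inside Python (a python3 -c one-liner stands in here for your own script):

```shell
# Restrict the whole process tree to CPU 0, then print the
# resulting affinity set from inside Python (Linux only).
taskset -c 0 python3 -c "import os; print(sorted(os.sched_getaffinity(0)))"
```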

Quarrelsome answered 26/3, 2013 at 14:47 Comment(1)
"Python on Ubuntu" implies it's working without trouble on Windows and other OSes. Is it?Selectman
