I am running a Python script on a Windows HPC cluster. A function in the script uses `starmap` from the `multiprocessing` package to parallelize a computationally intensive process.
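For reference, the parallel section looks roughly like this (a simplified sketch; `work` and the argument pairs are placeholders for the real computation):

```python
from multiprocessing import Pool

def work(a, b):
    # Placeholder for the computationally intensive function.
    return a * b

def run_parallel(pairs):
    # Pool() defaults to one worker process per core. On Windows, the
    # parallel call must be reachable only under the
    # `if __name__ == "__main__":` guard, because each child process
    # re-imports the main module on startup.
    with Pool() as pool:
        return pool.starmap(work, pairs)

if __name__ == "__main__":
    print(run_parallel([(1, 2), (3, 4)]))  # prints [2, 12]
```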
When I run the script on a single non-cluster machine, I obtain the expected speed boost. When I log into a node and run the script locally, I also obtain the expected speed boost. However, when the job manager runs the script, the speed boost from multiprocessing is either completely lost or the script sometimes even runs 2x slower. We have noticed that memory paging occurs when `starmap` is called. We believe this has something to do with the nature of Python's `multiprocessing`, i.e. the fact that a separate Python interpreter is started for each core.
Since we had success running from the console on a single node, we tried running the script with `HPC_CREATECONSOLE=True`, to no avail.
Is there some kind of setting within the job manager that we should use when running Python scripts that use `multiprocessing`? Is `multiprocessing` just not appropriate for an HPC cluster?