Why does importing a module in '__main__' not allow multiprocessing to use the module?

I've already solved my problem by moving the import up to the top-level declarations, but it left me wondering: why can't I use a module that was imported in '__main__' in functions that are the targets of multiprocessing?

For example:

import os
import multiprocessing as mp

def run(in_file, out_dir, out_q):
    arcpy.RasterToPolygon_conversion(in_file, out_dir, "NO_SIMPLIFY", "Value")
    status = "Done with " + os.path.basename(in_file)
    out_q.put(status, block=False)

if __name__ == '__main__':
    raw_input("Program may hang, press Enter to import ArcPy...")
    import arcpy

    q = mp.Queue()
    _file = "path/to/file"
    _dir = "path/to/dir"
    # There are actually lots of files in a loop to build
    # processes but I just do one for context here
    p = mp.Process(target=run, args=(_file, _dir, q))
    p.start()

# I do stuff with the Queue below to report status to the user

When you run this in IDLE it doesn't error at all... it just keeps doing a Queue check (which is good, so not the problem). The problem is that when you run it from the CMD terminal (either the OS one or Python's) it raises a NameError saying that arcpy is not defined!
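
For reference, here is a minimal sketch of the fix I mentioned above, moving the import up to the top declarations (the rest of the script is unchanged):

import os
import multiprocessing as mp
import arcpy  # moved to module level

def run(in_file, out_dir, out_q):
    arcpy.RasterToPolygon_conversion(in_file, out_dir, "NO_SIMPLIFY", "Value")
    out_q.put("Done with " + os.path.basename(in_file), block=False)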

Just a curious topic.

Serve answered 21/4, 2017 at 14:24 Comment(3)
Are you running on Linux or Windows?Lymanlymann
@Lymanlymann Windows, that's why I am using the if __name__ statement.Serve
On Windows, multiprocessing effectively imports the main script into each Python subprocess it spawns, so the if __name__ == '__main__' test will be False in those cases. In your script, that means that the module arcpy won't have been imported when run() is executed, because the process it's in executes in a completely separate memory space.Ephraim

The situation differs between unix-like systems and Windows. On the unixy systems, multiprocessing uses fork to create child processes that share a copy-on-write view of the parent's memory space. The child sees the imports from the parent, including anything the parent imported under if __name__ == "__main__":.

On Windows, there is no fork; a new process has to be executed. But simply rerunning the parent process doesn't work, because that would run the whole program again. Instead, multiprocessing runs its own Python program that imports the parent's main script and then pickles/unpickles a view of the parent's object space (the Process object, its target, and its arguments) that is, hopefully, sufficient for the child process.

That program is the __main__ for the child process, and the __main__ of the parent script doesn't run; the main script was just imported like any other module. The reason is simple: running the parent's __main__ would just run the full parent program again, which mp must avoid.
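
Incidentally, on Python 3.4+ you can reproduce the Windows behavior on a unixy system by forcing the spawn start method. A short sketch (not part of the question's setup):

import multiprocessing as mp

def worker():
    print("hello from the child")

if __name__ == "__main__":
    # "spawn" is the Windows default; forcing it on Linux makes the
    # child re-import this script instead of inheriting a fork.
    mp.set_start_method("spawn")
    p = mp.Process(target=worker)
    p.start()
    p.join()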

Here is a test that shows what is going on: a main module called testmp.py, and a second module, test2.py, that is imported by the first.

testmp.py

import os
import multiprocessing as mp

print("importing test2")
import test2

def worker():
    print('worker pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))

if __name__ == "__main__":
    print('main pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))
    print("running process")
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()

test2.py

import os

print('test2 pid: {}, module name: {}, file name: {}'.format(os.getpid(),
        __name__, __file__))

When run on Linux, test2 is imported once and the worker runs in the main module.

importing test2
test2 pid: 17840, module name: test2, file name: /media/td/USB20FD/tmp/test2.py
main pid: 17840, module name: __main__, file name: testmp.py
running process
worker pid: 17841, module name: __main__, file name: testmp.py

Under Windows, notice that "importing test2" is printed twice: testmp.py was run two times. But "main pid" was only printed once, so its __main__ wasn't run. That's because multiprocessing changed the module name to __mp_main__ during the import, as the output below shows (a small detection sketch follows the output).

E:\tmp>py testmp.py
importing test2
test2 pid: 7536, module name: test2, file name: E:\tmp\test2.py
main pid: 7536, module name: __main__, file name: testmp.py
running process
importing test2
test2 pid: 7544, module name: test2, file name: E:\tmp\test2.py
worker pid: 7544, module name: __mp_main__, file name: E:\tmp\testmp.py
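
If a script ever needs to detect that re-import, it can test for the module name shown above. A small sketch; note that __mp_main__ is an implementation detail of multiprocessing's spawn support, so don't rely on it for anything important:

import multiprocessing as mp

# Module-level code runs in the parent and in every spawned child.
if __name__ == "__mp_main__":
    print("re-imported by a multiprocessing child")
elif __name__ == "__main__":
    print("running as the parent script")
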
Lymanlymann answered 21/4, 2017 at 14:35 Comment(7)
My mp.Process() is not rerunning __main__ each time, is it? I just want each child process to run the code in def run().Serve
No, but it is re-importing your main module and calling it __mp_main__. That's why you hide stuff you don't want rerun under if __name__ == "__main__":. I've updated the answer with a demo.Lymanlymann
Child startup is significantly more expensive on Windows - a new copy of Python is executed and modules are imported.Lymanlymann
Excellent explanation! Unfortunately, arcpy can't be multithreaded, and the processes can only run in parallel with multiprocessing. It is very expensive; do you have any suggestions to make it less so?Serve
Start long-lived subprocesses early and keep them around a long time. Maybe a Pool and use apply when running a work item. Modules not needed by the parent can be imported in the worker process itself (see the sketch after these comments). Passing large datasets from parent to child is expensive too... have the child read the original files from disk if possible.Lymanlymann
Alternately, write a completely separate child process that communicates over some sort of RPC - maybe Python's xml-rpc or (more to my liking) ZeroMQ. Once again, keeping the payload between parent and child lean really helps.Lymanlymann
Thanks for that great explanation of the functions _fixup_main_from_name and _fixup_main_from_path from the module multiprocessing.spawn.Hydrotaxis
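
A minimal sketch of the Pool suggestion from the comments above. The arcpy call comes from the question; the file list and output path are hypothetical placeholders. The point being illustrated is paying the expensive import once per long-lived worker via an initializer:

import multiprocessing as mp

def init_worker():
    # Expensive import happens once per worker process, not once per task.
    global arcpy
    import arcpy

def run(in_file, out_dir):
    arcpy.RasterToPolygon_conversion(in_file, out_dir, "NO_SIMPLIFY", "Value")
    return "Done with " + in_file

if __name__ == "__main__":
    files = ["raster1.tif", "raster2.tif"]  # hypothetical inputs
    pool = mp.Pool(processes=2, initializer=init_worker)
    results = [pool.apply_async(run, (f, "out/dir")) for f in files]
    for r in results:
        print(r.get())
    pool.close()
    pool.join()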
