Compulsory usage of if __name__ == "__main__" on Windows while using multiprocessing [duplicate]

2

35

When using multiprocessing in Python on Windows, you are expected to protect the entry point of the program. The documentation says "Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)". Can anyone explain what exactly this means?

Bolding answered 3/12, 2013 at 20:9 Comment(2)
This has been already asked and answered here several times...Suspense
I found the explanations here clearer than the ones at the linked previous question.Quagga
41

Expanding a bit on the good answer you already got, it helps if you understand what Linux-y systems do. They spawn new processes using fork(), which has two good consequences:

  1. All data structures existing in the main program are visible to the child processes. They actually work on copies of the data.
  2. The child processes start executing at the instruction immediately following the fork() in the main program - so any module-level code already executed in the module will not be executed again.

fork() isn't available on Windows, so on Windows each child process starts in a fresh interpreter and imports each module anew. So:

  1. On Windows, no data structures existing in the main program are visible to the child processes; and,
  2. All module-level code is executed in each child process.
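The second point can be illustrated without starting any processes. The sketch below is a simulation, not what multiprocessing literally exposes: it assumes that CPython 3 re-runs the main script in each spawned child under the name `__mp_main__`, and it mimics that with `runpy.run_path`. The script body and the `events` list are hypothetical names chosen for illustration.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical script body; "events" is an illustrative name only.
SCRIPT = textwrap.dedent('''
    events = []
    events.append("module-level code ran")
    if __name__ == "__main__":
        events.append("guarded code ran")
''')


def run_as(run_name):
    """Write SCRIPT to a temporary file and execute it under the given __name__."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the globals of the executed script.
        return runpy.run_path(path, run_name=run_name)["events"]


# What the main program sees when the script is run directly.
parent_events = run_as("__main__")
# Assumption: a spawned child re-runs the script as "__mp_main__".
child_events = run_as("__mp_main__")

print(parent_events)
print(child_events)
```

Under that assumption, the unguarded line runs in both cases, while the guarded line runs only when the script is the real entry point.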

So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want code that creates child processes to run only in the main program, so that should be protected by __name__ == '__main__'. For a subtler example, consider code that builds a gigantic list, which you intend to pass out to worker processes to crawl over. You probably want to protect that too, because in this case there's no point in making each worker process waste RAM and time building its own useless copy of the gigantic list.
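A minimal sketch of both examples follows. The helper names `count_evens`, `split`, and `main`, the list size, and the worker count are all assumptions made for illustration. Both the large list and the pool are created inside `main()`, which is called only under the guard, so a child that re-imports the module defines the functions but builds nothing.

```python
import multiprocessing


def count_evens(chunk):
    """Work done inside a worker process: count the even numbers in a chunk."""
    return sum(1 for n in chunk if n % 2 == 0)


def split(items, parts):
    """Split a list into at most `parts` consecutive chunks of similar size."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def main():
    # Built only in the main program; each worker receives just its own chunk.
    big_list = list(range(100_000))
    with multiprocessing.Pool(processes=4) as pool:
        counts = pool.map(count_evens, split(big_list, 4))
    print(sum(counts))


if __name__ == "__main__":
    # Process creation happens only when this file is the entry point,
    # not when a child process imports it.
    main()
```

Keeping the worker function at module level matters here: child processes need to be able to find it by name after importing the module.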

Note that it's a Good Idea to use __name__ == "__main__" appropriately even on Linux-y systems, because it makes the intended division of work clearer. Parallel programs can be confusing - every little bit helps ;-)

Floatable answered 3/12, 2013 at 20:29 Comment(1)
Great answer. Also see docs here and here.Gemmiparous
31

The multiprocessing module works by creating new Python processes that will import your module. If you did not add the __name__ == '__main__' protection, then you would enter a never-ending loop of new process creation. It goes like this:

  • Your module is imported and executes code during the import that causes multiprocessing to spawn 4 new processes.
  • Those 4 new processes in turn import the module and execute code during the import that causes multiprocessing to spawn 16 new processes.
  • Those 16 new processes in turn import the module and execute code during the import that causes multiprocessing to spawn 64 new processes.
  • Well, hopefully you get the picture.
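The counts in the list above grow geometrically. As a small arithmetic sketch (the function name and its parameters are hypothetical, and no processes are actually started), assume every unguarded import spawns 4 new processes:

```python
def processes_spawned(workers_per_import, generation):
    """Number of new processes created in a given generation, assuming
    every process in the previous generation spawns `workers_per_import`."""
    return workers_per_import ** generation


counts = [processes_spawned(4, g) for g in range(1, 4)]
print(counts)  # [4, 16, 64]
```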

So the idea is to make sure that the process spawning happens only once. That is achieved most easily with the __name__ == '__main__' idiom.

Clown answered 3/12, 2013 at 20:16 Comment(6)
@PiotrDobrogost In that case you should vote to close the question as a duplicate. Is my answer incorrect or unhelpful in any way?Clown
Duplicating answers over and over again is unhelpful compared to linking to already existing ones. If you have something to add to already existing answers then you are free to answer already existing question.Suspense
@Piotr So help then. Find a duplicate and cast your close vote. If I agree, I'll vote too. Instead of complaining take some positive action. Why did you pick on me?Clown
@PiotrDobrogost You misunderstand. I did not create a duplicate. I just wrote an answer. The duplicate is the question. Perhaps you should be addressing your comments to the asker.Clown
@PiotrDobrogost Feel free to use your energy to submit a close vote.Clown
Thank you very much for this explanation!Vierra
