Multiprocessing causes Python to crash with the error "may have been in progress in another thread when fork() was called"
I am relatively new to Python and am trying to use the multiprocessing module to parallelize a for loop.

I have an array of image URLs stored in img_urls, which I need to download and run through Google Vision.

if __name__ == '__main__':
    img_urls = [ALL_MY_Image_URLS]
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))

This is my runAll() function:

def runAll(img_urls):
    num_cores = multiprocessing.cpu_count()

    print("Image URLs: {}".format(len(img_urls)))
    if len(img_urls) > 2:
        numberOfImages = 0
    else:
        numberOfImages = 1

    start_timeProcess = time.time()

    pool = multiprocessing.Pool()
    pool.map(annotate, img_urls)
    pool.close()
    pool.join()
    end_timeProcess = time.time()
    print('\n Time to complete ', end_timeProcess - start_timeProcess)

    print(full_matching_pages)


def annotate(img_path):
    """Returns web annotations given the path to an image."""
    content = requests.get(img_path).content
    print("file is", content)
    print('Process working under', os.getpid())
    image = types.Image(content=content)
    web_detection = vision_client.web_detection(image=image).web_detection
    report(web_detection)

This is the warning I get when I run it, and then Python crashes:

objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
(the same pair of lines repeats for each worker process: objc[67567], objc[67568], objc[67569], objc[67571], objc[67572])
Zuzana answered 4/5, 2018 at 6:36 Comment(8)
Are you on OSX? Then perhaps this bug report gives you some hints. – Volva
Oh yeah, I am on OSX; thank you for pointing me to the link. – Zuzana
Still no luck; I tried setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES as mentioned, but I still get the same error. @Volva – Zuzana
Unfortunately, I have no specific knowledge on this topic. All I can do is use Google to find related issues, e.g. this possible workaround. – Volva
This is due to Apple changing macOS fork() behavior since High Sierra. The OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES variable turns off the immediate-crash behavior that their newer Objective-C framework now enforces by default. This can affect any language that does multithreading / multiprocessing using fork() on macOS >= 10.13, especially when "native extensions" / C code extensions are used. – Epicanthus
There are also some Python-specific issues w.r.t. multithreading and multiprocessing that you might want to be aware of. It's common to run into deadlock and performance issues with Python threads due to the way Python is designed, specifically the "GIL" / Global Interpreter Lock. – Epicanthus
Another good discussion of the issue, thanks to Reddit user "snatchery". – Epicanthus
Also a summary, thanks to Reddit user "Nwallins". – Epicanthus
This error occurs because of a fork-safety check that Apple added in macOS High Sierra and later versions. I know this answer is a bit late, but I solved the problem using the following method:

Set an environment variable in .bash_profile (or .zshrc for recent macOS) to allow multiprocessing applications or scripts under the new macOS High Sierra security rules.

Open a terminal:

$ nano .bash_profile

Add the following line to the end of the file:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Save, exit, close the terminal, and re-open it. Check that the environment variable is now set:

$ env

You will see output similar to:

TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/pn/vasdlj3ojO#OOas4dasdffJq/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.E7qLFJDSo/Render
TERM_PROGRAM_VERSION=404
TERM_SESSION_ID=NONE
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

You should now be able to run your Python script with multithreading.

Drubbing answered 7/9, 2018 at 22:44 Comment(13)
This actually solved it for me. I wanted to iterate a large pandas dataframe across multiple threads, and ran into the same issue described by the OP. The only difference is that I set the env variable when running the script: OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES python my-script.py – Behoove
Thanks so much! For those interested, this worked for me on macOS Mojave. – Physicality
This solved my issue, but my script was using multiprocessing. – Schwarzwald
It worked on my machine with macOS Mojave, but then my pytest tests don't run in parallel anymore. Before, it was crashing, but at least it was fast... – Herrera
This environment variable solved my issue of running Ansible locally on my Mac (Catalina). – Koto
This worked for me on Catalina after passing the env var to my virtual env (using tox): passenv = OBJC_DISABLE_INITIALIZE_FORK_SAFETY – Episodic
Works for me, running on Catalina. Note that in Catalina (in my case at least), my profile was in ~/.zprofile and not ~/.bash_profile. – Antisocial
This also worked for me on macOS Catalina by editing ~/.zshrc :) – Amphiaster
It did not work on my machine: macOS 11.1, Python 3.8. – Babe
It doesn't work on my Ubuntu machine with Python 3.8. – Thorazine
This worked for me on macOS Big Sur. – Daye
I had to add 'export' in front of OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES. – Cioban
Does not work on Python 3.9 & macOS Ventura. – Sateen

Running macOS with z-shell, I had to add the following to my ~/.zshrc file:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

and then in the command line:

source ~/.zshrc

Then it worked.

Cioban answered 16/11, 2021 at 14:24 Comment(1)
You could also run the export command in the terminal you are working in, or set it as an environment variable in case you use an IDE (e.g. PyCharm). In that case you will use this variable for specific projects only, not for every Python tool running on your machine. – Lyndialyndon

The other answers tell you to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, but don't do this! You're just putting sticky tape over the warning light. You may need it on a case-by-case basis for some legacy software, but certainly do not set it in your .bash_profile!

This is fixed in https://bugs.python.org/issue33725 (Python 3.8+), but it's best practice to use the "spawn" start method explicitly:

with multiprocessing.get_context("spawn").Pool() as pool:
    pool.map(annotate, img_urls)
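For context, here is a minimal, self-contained sketch of the spawn-based approach. The worker annotate_stub is a hypothetical stand-in for the asker's annotate; a real version would download the URL with requests and call the Vision client.

```python
import multiprocessing

def annotate_stub(url):
    # Stand-in for annotate(): a real version would fetch
    # requests.get(url).content and call the Vision API.
    return "annotated:" + url

def run_all(urls):
    # "spawn" starts each worker as a fresh interpreter instead of fork()ing,
    # so children never inherit a half-initialized Objective-C runtime.
    with multiprocessing.get_context("spawn").Pool() as pool:
        return pool.map(annotate_stub, urls)

if __name__ == "__main__":
    print(run_all(["http://example.com/a.jpg", "http://example.com/b.jpg"]))
```

Note that with "spawn" the worker function must be defined at module top level (so the child can import it), and the entry point must sit behind an if __name__ == "__main__": guard, otherwise each child would re-execute the script on import.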
Estreat answered 1/10, 2021 at 11:36 Comment(1)
This is certainly a valid answer, but readers should be aware that a lot of multiprocessing code based on "fork" mode will not work with "spawn". "spawn" requires the code to be designed from the start to work this way. You need to be careful about shared objects, and you don't have the memory state of the parent. Fork gives copy-on-write access to all variables, which is really useful! – Magma

The OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES solution didn't work for me. Another potential solution is setting no_proxy=* in your script's environment, as described here.

Besides the causes covered by others, this error message can also be networking-related. My script has a TCP server. I don't even use a pool, just os.fork and a multiprocessing.Queue for message passing. The forks worked fine until I added the queue.

Setting no_proxy by itself fixed it in my case. If your script has networking components, try this fix, perhaps in combination with OBJC_DISABLE_INITIALIZE_FORK_SAFETY.
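A minimal sketch of this workaround: setting the no_proxy variable from inside the script before any network call or fork(). The intent (per the discussion above) is to keep proxy auto-discovery, which goes through macOS system frameworks, from running in the parent process.

```python
import os

# Must happen before any network call or fork() so that proxy
# auto-discovery is skipped entirely for all hosts.
os.environ["no_proxy"] = "*"
```

Placing this at the very top of the script, before other imports that might perform network setup, is the safest ordering.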

Voidance answered 10/6, 2022 at 13:34 Comment(1)
I had one error before I added OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, then a different one, and then it worked after I added no_proxy=*. So I think I needed both. – Roturier

The solution that works for me without the OBJC_DISABLE_INITIALIZE_FORK_SAFETY flag in the environment involves initializing the multiprocessing.Pool class right after the main() program starts.

This is most likely not the fastest solution possible, and I am not sure whether it works in all situations; however, pre-warming the worker processes early enough, before my program starts, does not result in any ... may have been in progress in another thread when fork() was called errors, and I do get a significant performance boost compared to non-parallelized code.

I have created a convenience class Parallelizer which I am starting very early and then using throughout the lifecycle of my program. The full version can be found here.

# entry point to my program
def main():
    parallelizer = Parallelizer()
    ...

Then whenever you want to have parallelization:

# this function is parallelized. it is run by each child process.
def processing_function(input):
    ...
    return output

...
inputs = [...]
results = parallelizer.map(
    inputs,
    processing_function
)

And the parallelizer class:

class Parallelizer:
    def __init__(self):
        self.input_queue = multiprocessing.Queue()
        self.output_queue = multiprocessing.Queue()
        # The workers start (fork) immediately: _run is passed as the pool's
        # initializer and never returns, so each worker just serves the queues.
        self.pool = multiprocessing.Pool(multiprocessing.cpu_count(),
                                         Parallelizer._run,
                                         (self.input_queue, self.output_queue,))

    def map(self, contents, processing_func):
        size = 0
        for content in contents:
            self.input_queue.put((content, processing_func))
            size += 1
        results = []
        while size > 0:
            result = self.output_queue.get(block=True)
            results.append(result)
            size -= 1
        return results

    @staticmethod
    def _run(input_queue, output_queue):
        while True:
            content, processing_func = input_queue.get(block=True)
            result = processing_func(content)
            output_queue.put(result)

One caveat: parallelized code can be difficult to debug, so I have also prepared a non-parallelizing version of my class, which I enable when something goes wrong in the child processes:

class NullParallelizer:
    @staticmethod
    def map(contents, processing_func):
        results = []
        for content in contents:
            results.append(processing_func(content))
        return results
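Since the fallback class is just a serial map, it is easy to sanity-check on its own. A condensed, runnable version (equivalent to the class above, with the loop written as a comprehension):

```python
class NullParallelizer:
    """Drop-in serial replacement: runs processing_func in the parent process."""
    @staticmethod
    def map(contents, processing_func):
        # Same contract as Parallelizer.map, but no child processes involved.
        return [processing_func(content) for content in contents]

squares = NullParallelizer.map([1, 2, 3], lambda x: x * x)
print(squares)  # [1, 4, 9]
```

Because both classes expose the same map signature, switching between them is a one-line change at the point where the parallelizer is constructed.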
Burned answered 14/11, 2020 at 18:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.