AttributeError: Can't pickle local object in Multiprocessing

I am very new to Python and I am encountering this error.

CODE 1:

import multiprocessing as mp
import os
 
def calc(num1, num2):
    def addi(num1, num2):
        print(num1+num2)
    m = mp.Process(target = addi, args = (num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()
  
if __name__ == "__main__":
    # creating processes
   calc(5, 6)

ERROR 1:

    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'calc.<locals>.addi'

After reading around a little, I understand that pickle cannot serialize local functions, so I also tried the solution below, adding global addi, which gave another error.

CODE 2:

import multiprocessing as mp
import os
   
def calc(num1, num2):
    global addi  # the line I added
    def addi(num1, num2):
        print(num1+num2)
    m = mp.Process(target = addi, args = (num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()
  
if __name__ == "__main__":
    # creating processes
   calc(5, 6)
ERROR 2:

    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'addi' on <module '__mp_main__' from '/Users

Could someone please help me out with this? I am clueless about what to do next! The Python version I am using is 3.8.9.

Thank you so much!

Endodontics answered 27/6, 2022 at 1:53 Comment(2)
Is there a reason why you're defining addi inside calc? Also, what OS are you on? – Pensionary
@Charchit, this is an extremely simplified example of the code I am working with. I am actually trying to move some scripts from Python 2.7 to Python 3.8.9. I am facing the same issue with my actual code and I am kinda lost about what to do next. I am on macOS Monterey 12.4. – Endodontics

Basically, the reason you are getting this error is that multiprocessing uses pickle, which in general can only serialize functions defined at the top level of a module. The function addi is not a module-level function, so pickle cannot refer to it by name, which is what ERROR 1 is about. Adding global addi does not really help either: at best it binds the name addi at module level in the parent process once calc has run, but the child process (started with the 'spawn' method, the default on macOS since Python 3.8) re-imports your module fresh and never executes calc, so there is no addi for it to find, which is what ERROR 2 is complaining about. You have three ways to fix this.
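
You can reproduce the first error with pickle alone, without multiprocessing at all; a minimal sketch (outer and inner are just illustrative names):

import pickle

def outer():
    def inner():
        pass
    # Raises AttributeError: Can't pickle local object 'outer.<locals>.inner'
    pickle.dumps(inner)

outer()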

Method 1

You can define addi at the module level, outside calc:

import multiprocessing as mp
import os


def addi(num1, num2):
    print(num1 + num2)

def calc(num1, num2):

    m = mp.Process(target=addi, args=(num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()


if __name__ == "__main__":
    # creating processes
    calc(5, 6)

Output

here is main 9924
11

Method 2

You can switch to the third-party multiprocess library, a fork of multiprocessing that uses dill instead of pickle and can therefore serialize such functions.

import multiprocess as mp  # Note that we are importing "multiprocess", no "ing"!
import os

def calc(num1, num2):

    def addi(num1, num2):
        print(num1 + num2)

    m = mp.Process(target=addi, args=(num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()


if __name__ == "__main__":
    # creating processes
    calc(5, 6)

Output

here is main 67632
11
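
Under the hood, the difference is the serializer: dill can pickle a closure by value, whereas the standard pickle only pickles functions by reference to a module-level name. A quick sketch (with made-up names) showing the difference:

import dill

def outer():
    def inner(a, b):
        return a + b
    return dill.dumps(inner)  # works: dill serializes the function by value

restored = dill.loads(outer())
print(restored(2, 3))  # 5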

Method 2b

While it's a useful library, there are a few valid reasons why you may not want to use multiprocess. A big one is the fact that the standard library's multiprocessing and this fork are not compatible with each other (especially if you use anything from within the subpackage multiprocessing.managers). This means that if you are using this fork in your own project, but also use third-party libraries which themselves use the standard library's multiprocessing instead, you may see unexpected behaviour.

Anyway, in cases where you want to stick with the standard library's multiprocessing and not use the fork, you can use dill yourself to serialize Python closures like the function addi, by subclassing the Process class and adding a little logic of our own: we store the dill-serialized target as bytes (which the standard pickler has no trouble sending to the child process) and only deserialize it inside run(), in the child. An example is given below:

import dill
from multiprocessing import Process  # Use the standard library only
import os

class DillProcess(Process):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._target = dill.dumps(self._target)  # Save the target function as bytes, using dill

    def run(self):
        if self._target:
            self._target = dill.loads(self._target)    # Unpickle the target function before executing
            self._target(*self._args, **self._kwargs)  # Execute the target function


def calc(num1, num2):

    def addi(num1, num2):
        print(num1 + num2)

    m = DillProcess(target=addi, args=(num1, num2))  # Note how we use DillProcess, and not multiprocessing.Process
    m.start()

    print("here is main", os.getpid())
    m.join()


if __name__ == "__main__":
    # creating processes
    calc(5, 6)

Output

here is main 23360
11

Method 3

This method is for those who cannot use any third-party libraries in their code. I recommend making sure the above methods do not work for you before resorting to this one, because it is a little hacky and you do need to restructure some of your code.

Anyway, this method works by making your local functions reachable from the top module scope, so that pickle can find them. To do this dynamically, we create a placeholder class and add all the local functions as its class attributes. We also need to make sure that each function's __qualname__ attribute is altered to point to its new location, and that all of this happens on every run, outside the if __name__ ... block (otherwise newly started processes won't see the attributes). Consider a slightly modified version of your code here:

import multiprocessing as mp
import os

def calc(num1, num2):

    def addi(num1, num2):
        print(num1 + num2)

    # Another local function you might have
    def addi2():
        print('hahahaha')

    m = mp.Process(target=addi, args=(num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()


if __name__ == "__main__":
    # creating processes
    calc(5, 6)

Below is how you can make it work using the method detailed above:

import multiprocessing as mp
import os


# This is our placeholder class; all local functions will be added as its attributes
class _LocalFunctions:
    @classmethod
    def add_functions(cls, *args):
        for function in args:
            setattr(cls, function.__name__, function)
            function.__qualname__ = cls.__qualname__ + '.' + function.__name__


def calc(num1, num2, _init=False):
    # The _init parameter is to initialize all local functions outside __main__ block without actually running the 
    # whole function. Basically, you shift all local function definitions to the top and add them to our 
    # _LocalFunctions class. Now, if the _init parameter is True, then this means that the function call was just to 
    # initialize the local functions and you SHOULD NOT do anything else. This means that after they are initialized,
    # you simply return (check below)

    def addi(num1, num2):
        print(num1 + num2)

    # Another local function you might have
    def addi2():
        print('hahahaha')

    # Add all functions to _LocalFunctions class, separating each with a comma:
    _LocalFunctions.add_functions(addi, addi2)

    # IMPORTANT: return and don't actually execute the logic of the function if _init is True!
    if _init is True:
        return

    # Beyond here is where you put the function's actual logic including any assertions, etc.
    m = mp.Process(target=addi, args=(num1, num2))
    m.start()

    print("here is main", os.getpid())
    m.join()


# All factory functions must be initialized BEFORE the "if __name__ ..." clause. If they require any parameters,
# substitute bogus ones, and make sure to pass _init=True!
calc(0, 0, _init=True)

if __name__ == '__main__':
    a = calc(5, 6)

So there are a few things you need to change in your code: all local functions must be defined at the top of their enclosing function, and every such factory function must be initialized (which is why it needs to accept the _init parameter) outside the if __name__ ... clause. But this is probably the best you can do if you can't use dill.
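
If you want to convince yourself that the trick works, you can check (in the same module, after the initialization call) that the relocated function is now picklable with the standard library; a small sketch:

import pickle

calc(0, 0, _init=True)
restored = pickle.loads(pickle.dumps(_LocalFunctions.addi))
restored(5, 6)  # prints 11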

Pensionary answered 27/6, 2022 at 17:18 Comment(2)
I find it hard to believe we need to use a third-party library just to call other functions within the function we're using to spawn another process. Is there something I'm missing? – Myungmyxedema
@BenArnao Concurrency in Python has always been a bit of an afterthought and has to go through pickling. Because inner functions aren't accessible from the outer scope (this isn't unique to Python; it's the same with anonymous functions in JavaScript, for example), they cannot be pickled through traditional means. That said, I have never really looked into how dill specifically pickles such functions (but it's probably an acceptable workaround for this situation that trades performance for utility). – Pensionary

I think I can add to this question; I just solved a very similar problem. Sometimes it is not possible (or inefficient) to create a global function for every case. An example is the best way to explain what I mean: suppose you have a function foo that takes several arguments, and inside some driver function baz two of them are fixed and will not change, while only one varies over the values you want to process in a pool.

In code it would look like this:

from multiprocessing import Pool

def foo(x, y, z):
    # do whatever here
    return x + y + z

def baz():
    x = 5
    y = 25
    zs = [1, 2, 3, 4, 5]
    unary = lambda z: foo(x, y, z)
    with Pool() as pool:
        results = pool.imap_unordered(unary, zs)
        for result in results:
            print(result)  # whatever you do with each result goes here

This will not work, however, because unary is defined locally (and a lambda cannot be pickled anyway). Instead, we should build it with partial from functools:

from multiprocessing import Pool
from functools import partial

def foo(x, y, z):
    # do whatever here
    return x + y + z

def baz():
    x = 5
    y = 25
    zs = [1, 2, 3, 4, 5]
    unary = partial(foo, x, y)
    with Pool() as pool:
        results = pool.imap_unordered(unary, zs)
        for result in results:
            print(result)  # whatever you do with each result goes here

if __name__ == "__main__":
    baz()

This will work and will solve the issue.
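
The reason partial helps is that a functools.partial object pickles as a reference to the module-level foo plus the frozen arguments, while a lambda cannot be pickled at all. A quick sketch of the difference:

import pickle
from functools import partial

def foo(x, y, z):
    return x + y + z

unary = partial(foo, 5, 25)
print(pickle.loads(pickle.dumps(unary))(3))  # 33

# pickle.dumps(lambda z: foo(5, 25, z))  # would raise a pickling error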

Chase answered 10/8, 2023 at 20:44 Comment(1)
This!!! I had like 10 arguments, but 2 were one more level deep. This is much better than having to replicate x and y many times just to match zs. – Aerotherapeutics

One more possibility is to use a class that implements __call__.

For example,

class Heuristic:
    def __init__(self, goal_value: int):
        self.goal_value = goal_value

    def __call__(self, v: int) -> float:
        return abs(v - self.goal_value)

Then, in your main code:

import os
from multiprocessing import Pool

pool = Pool(processes=max(os.cpu_count() - 1, 1))
value_goodness = pool.starmap(Heuristic(10), [[i] for i in range(10)])

This gets rid of the pickle serialization problems with local functions, and of having to deal with the if __name__ ... initialization trick described in the accepted answer (Method 3).
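
This works because instances of a module-level class pickle cleanly as long as their attributes do; a quick check, assuming the Heuristic class above:

import pickle

h = pickle.loads(pickle.dumps(Heuristic(10)))
print(h(3))  # abs(3 - 10) -> 7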

Patmore answered 25/2 at 20:58 Comment(0)

Call set_start_method('fork') in your main module, before creating any processes.
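
With the 'fork' start method the child process inherits the parent's memory directly, so the target function is never pickled and the original code runs unchanged; a sketch (note that 'fork' is only available on POSIX systems, and macOS has defaulted to 'spawn' since Python 3.8 because fork can be unsafe there):

import multiprocessing as mp

def calc(num1, num2):
    def addi(num1, num2):
        print(num1 + num2)
    m = mp.Process(target=addi, args=(num1, num2))
    m.start()
    m.join()

if __name__ == "__main__":
    mp.set_start_method('fork')  # must be called before any processes are started
    calc(5, 6)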

Karlise answered 30/8, 2022 at 12:21 Comment(3)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Huysmans
More details in another answer. – Mezereon
This will only work on platforms that support fork. I think it's a Linux-only option. – Superstratum
