Pickle class definition in module with dill

My module contains a class that should be picklable, both its instances and its definition. I have the following structure:

MyModule
|-Submodule
  |-MyClass

From other questions on SO I learned that dill can pickle class definitions, and sure enough it works when I copy the definition of MyClass into a separate script and pickle it there, like this:

import dill as pickle

class MyClass(object):
    ...

instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)

However, it does not work when importing the class:

Pickling:

from MyModule.Submodule import MyClass
import dill as pickle

instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)

Loading:

import dill as pickle

with open(..., 'rb') as file:
    instance = pickle.load(file)

>>> ModuleNotFoundError: No module named 'MyModule'

I think the class definition is saved by reference, although with dill's default settings it should not be. The definition is pickled correctly when MyClass is known as __main__.MyClass, which happens when the class is defined in the main script.
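The by-reference behavior described above can be reproduced with the standard library's pickle alone. The sketch below does not use dill; it only illustrates why loading fails once the defining module is missing. The module name `fake_submodule` is a hypothetical stand-in for MyModule.Submodule, built in memory for the demonstration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for MyModule.Submodule, created in memory.
mod = types.ModuleType("fake_submodule")
exec("class MyClass:\n    def __init__(self, x):\n        self.x = x\n", mod.__dict__)
sys.modules["fake_submodule"] = mod

payload = pickle.dumps(mod.MyClass(1))

# The payload names the module and the class; it does not contain the class body.
references_module = b"fake_submodule" in payload

# Simulate the other machine: the module can no longer be imported.
del sys.modules["fake_submodule"]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc
```

Here `load_error` ends up holding the same kind of ModuleNotFoundError shown in the traceback above, because the unpickler tries to import the module named in the payload.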

I am wondering, is there any way to detach MyClass from MyModule? Any way to make it act like a top level import (__main__.MyClass) so dill knows how to load it on my other machine?

Relevant question: Why dill dumps external classes by reference no matter what

Nadene answered 19/9, 2018 at 9:38 Comment(0)

Dill indeed only stores the definitions of objects that live in __main__, not of those defined in modules, so one way around this problem is to redefine those objects in __main__:

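def mainify(obj):
    """Redefine obj in __main__ so that dill pickles its definition by value."""
    import __main__
    import inspect
    import ast

    s = inspect.getsource(obj)           # source text of the class or function
    m = ast.parse(s)                     # parse the source into an AST
    co = compile(m, '<string>', 'exec')  # compile the AST to a code object
    exec(co, __main__.__dict__)          # execute it in __main__'s namespace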

And then:

from MyModule.Submodule import MyClass
import dill as pickle

mainify(MyClass)
instance = MyClass(...)
with open(..., 'wb') as file:
    pickle.dump(instance, file)

Now you should be able to load the pickle from anywhere, even where the MyModule.Submodule is not available.

Redpoll answered 9/11, 2020 at 20:12 Comment(0)

I'm the dill author. This is a duplicate of the question you refer to above. The relevant GitHub feature request is: https://github.com/uqfoundation/dill/issues/128.

I think the larger issue is that you want to pickle an object defined in another file that is not installed. This is currently not possible, I believe.

As a workaround, I believe you may be able to pickle with dill.source by extracting the source code of the class (or module) and pickling that dynamically, or extracting the source code and compiling a new object in __main__.
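The "extract the source and pickle that" idea can be sketched with the standard library only. This is not dill's implementation and does not use dill.source; it uses `inspect.getsource` instead, and the helpers `dump_with_source` and `load_with_source`, the module `demo_source_module`, and the namespace name `__restored__` are all hypothetical. It assumes the class source is self-contained (no module-level imports or helpers) and that the instance attributes are picklable on their own.

```python
import importlib
import inspect
import os
import pickle
import sys
import tempfile

# Hypothetical module on disk, standing in for MyModule.Submodule.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_source_module.py"), "w") as f:
    f.write("class MyClass:\n    def __init__(self, x):\n        self.x = x\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
from demo_source_module import MyClass


def dump_with_source(obj):
    """Pickle the class source, class name, and instance state together."""
    cls = type(obj)
    bundle = {
        "source": inspect.getsource(cls),
        "name": cls.__name__,
        "state": dict(vars(obj)),
    }
    return pickle.dumps(bundle)


def load_with_source(data):
    """Rebuild the class from its source, then restore the instance state."""
    bundle = pickle.loads(data)
    namespace = {"__name__": "__restored__"}
    exec(bundle["source"], namespace)  # executes stored code: trusted data only
    cls = namespace[bundle["name"]]
    obj = cls.__new__(cls)             # skip __init__, as the state is restored below
    obj.__dict__.update(bundle["state"])
    return obj


data = dump_with_source(MyClass(42))

# Simulate a machine where the module is not available.
sys.path.remove(tmpdir)
del sys.modules["demo_source_module"]

restored = load_with_source(data)
```

The bundle contains only strings and plain values, so loading it never tries to import the original module. The trade-off is that loading executes stored source code, which is only acceptable for data you trust.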

Accordant answered 19/9, 2018 at 16:58 Comment(5)
I'll potentially update this answer with a test case, if I build one that demonstrates my hypothesis. - Accordant
Since you've never updated your answer, I take it your hypothesis didn't pan out. - Ruelas
@martineau: I still think the hypothesis is valid... I think I just never went back and built a test case. It's been a few years... - Accordant
The question is still relevant. See for example the recent "Is there a way to serialize a class such that it can be unserialized independent of its original script?" - Ruelas
Sure, I get asked about this particular feature fairly regularly. I only meant that the above post is a bit stale, not that the issue itself is irrelevant. Adding a file to __main__ does work, as you have done. Also, if the file is imported, it works. Problems arise if the file is not in the PYTHONPATH, or if a non-installed module uses local imports. See stackoverflow.com/questions/31884640 and github.com/uqfoundation/dill/issues/123#issue-99914949. As I said above, I believe that getsource should also work in dire cases, but I should verify that... - Accordant

I managed to save the instance and definition of my class using the following dirty hack:

class MyClass(object):
    def save(self, path):
        import __main__

        # Re-execute this file with __main__'s globals, so that the
        # `if __name__ == '__main__'` block below runs.
        with open(__file__) as f:
            code = compile(f.read(), "somefile.py", 'exec')
            main_globals = __main__.__dict__
            exec_locals = {'instance': self, 'savepath': path}
            exec(code, main_globals, exec_locals)

if __name__ == '__main__':
    # Script is loaded in top level, MyClass is now available under the qualname '__main__.MyClass'
    import dill as pickle

    # copy the attributes of the 'MyModule.Submodule.MyClass' instance to a new 'MyClass' instance.
    new_instance = MyClass.__new__(MyClass)
    new_instance.__dict__ = locals()['instance'].__dict__

    with open(locals()['savepath'], 'wb') as f:
        pickle.dump(new_instance, f)

Using exec, the file is executed within the __main__ namespace, so the class definition is saved as well. This script should not be run directly as the main script; the __main__ block is only meant to be reached through the save method.

Nadene answered 20/9, 2018 at 12:29 Comment(0)