Gracefully-degrading pickling in Python

(You may read this question for some background)

I would like to have a gracefully-degrading way to pickle objects in Python.

When pickling an object, let's call it the main object, sometimes the Pickler raises an exception because it can't pickle a certain sub-object of the main object. For example, an error I've been getting a lot is "can’t pickle module objects." That is because I am referencing a module from the main object.

I know I can write a small helper that replaces that module with a facade containing the module's attributes, but that would have its own issues (1).

So what I would like is a pickling function that automatically replaces modules (and any other hard-to-pickle objects) with facades that contain their attributes. That may not produce a perfect pickling, but in many cases it would be sufficient.

Is there anything like this? Does anyone have an idea how to approach this?


(1) One issue would be that the module may be referencing other modules from within it.

Nipper answered 28/8, 2009 at 17:13 Comment(1)
Java Beans .. Python Pickles .. I'd like to throttle the nerds who come up with this cutesy stuff - Mons

You can decide and implement how any previously-unpicklable type gets pickled and unpickled: see standard library module copy_reg (renamed to copyreg in Python 3.*).

Essentially, you need to provide a function which, given an instance of the type, reduces it to a tuple, following the same protocol as the __reduce__ special method (except that __reduce__ takes no arguments, since when provided it's called directly on the object, while the function you provide takes the object as its only argument).

Typically, the tuple you return has 2 items: a callable, and a tuple of arguments to pass to it. The callable must be registered as a "safe constructor" or, equivalently, have an attribute __safe_for_unpickling__ with a true value. Those items will be pickled, and at unpickling time the callable will be called with the given arguments and must return the unpickled object.
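To make the tuple protocol concrete, here is a minimal Python 3 sketch. The Celsius class and the reduce_celsius function are hypothetical names invented for illustration, and the last lines imitate by hand what unpickling does with the returned tuple. A real type would need only one of the two forms, the method or the registered function.

```python
import copyreg

class Celsius:
    """Hypothetical example type, not part of any library."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __reduce__(self):
        # Method form: called on the object, so it takes no extra arguments.
        return Celsius, (self.degrees,)

def reduce_celsius(obj):
    # Function form: same return value, but the object is passed in.
    return Celsius, (obj.degrees,)

# Register the function form for this type.
copyreg.pickle(Celsius, reduce_celsius)

# What unpickling does, in effect: call the callable with the arguments.
factory, args = reduce_celsius(Celsius(21.5))
clone = factory(*args)
print(clone.degrees)
```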

For example, suppose that you want to just pickle modules by name, so that unpickling them just means re-importing them (i.e. suppose for simplicity that you don't care about dynamically modified modules, nested packages, etc, just plain top-level modules). Then:

>>> import sys, pickle, copy_reg
>>> def savemodule(module):
...   return __import__, (module.__name__,)
... 
>>> copy_reg.pickle(type(sys), savemodule)
>>> s = pickle.dumps(sys)
>>> s
"c__builtin__\n__import__\np0\n(S'sys'\np1\ntp2\nRp3\n."
>>> z = pickle.loads(s)
>>> z
<module 'sys' (built-in)>

I'm using the old-fashioned ASCII form of pickle so that s, the string containing the pickle, is easy to examine: it instructs unpickling to call the built-in __import__ function, with the string 'sys' as its sole argument. And z shows that this does indeed give us back the built-in sys module as the result of the unpickling, as desired.
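On Python 3 the module is called copyreg and pickles are bytes, so the session above will not run unchanged. A rough equivalent might look like the sketch below. It assumes importlib.import_module as the reconstructing callable instead of __import__, a substitution made here because import_module returns the named submodule for dotted names; save_module is just an illustrative name.

```python
import copyreg
import importlib
import pickle
import sys
import types

def save_module(module):
    # Reduce a module to "call importlib.import_module with its name".
    return importlib.import_module, (module.__name__,)

# Registration is process-wide: it affects every later pickle.dumps call.
copyreg.pickle(types.ModuleType, save_module)

data = pickle.dumps(sys)
restored = pickle.loads(data)
print(restored is sys)
```

As in the original session, only the module's name is stored, so any changes made to the module at runtime are not carried across.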

Now, you'll have to make things a bit more complex than just __import__ (you'll have to deal with saving and restoring dynamic changes, navigate a nested namespace, etc). That means writing your own function to perform this work and passing it to copy_reg.constructor before you register, via copy_reg.pickle, the module-saving function that returns it (and, if unpickling happens in a separate run, also before you unpickle the pickles made with said function). But I hope this simple case helps to show that there's really nothing much to it that's at all "intrinsically" complicated!-)
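As a sketch of one of those extra steps, navigating a nested namespace, the following Python 3 code registers a custom constructor first and the module-saving function second. The names load_module and save_module are hypothetical, and saving and restoring dynamic changes is not attempted here.

```python
import copyreg
import types
import xml.dom

def load_module(dotted_name):
    # __import__("a.b") returns the top-level package "a",
    # so walk down the remaining attribute names by hand.
    module = __import__(dotted_name)
    for part in dotted_name.split(".")[1:]:
        module = getattr(module, part)
    return module

def save_module(module):
    # Reduce a module to "call load_module with its dotted name".
    return load_module, (module.__name__,)

# Constructor first, then the reduction function that refers to it.
copyreg.constructor(load_module)
copyreg.pickle(types.ModuleType, save_module)

print(load_module("xml.dom") is xml.dom)
```

In Python 3, copyreg.constructor only checks that its argument is callable. For the resulting pickles to load, load_module has to be importable under the same module and name in the process that unpickles them.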

Mathre answered 28/8, 2009 at 20:41 Comment(4)
@Alex Martelli: When I use copy_reg.pickle, what is the scope in which this change will be relevant? I want people to be able to import my work without it changing any system values that might wreck their program. - Nipper
copy_reg is global. But if they're not currently pickling modules (which is impossible by system defaults), it can't "wreck their program" to make modules picklable. - Mathre
@Alex Martelli: But if they ran into the same issue and defined module pickling a different way, we would have a problem. I believe in being polite and not changing the system state. I believe that when you import some module in Python you shouldn't have to worry about it messing with your system's globals, and that it's important to have tools that allow you to avoid this kind of "impoliteness" in your modules. - Nipper
@cool-RR, I see your point, but wouldn't stretch it farther than verifying that modules can't be pickled (on startup, try/except around an attempt to pickle a module) and allowing potential third-party reusers of your code to turn your schema on and off. Allowing reuse should not come at the expense of interfering with clean and maintainable use, when it's so unlikely that the documented and designed ways to allow module pickling will cause the bad effect you're worrying about (and I've NEVER seen ANY framework that pickles modules... and I've seen WAY MANY of them!-). - Mathre
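If touching the global registry is the concern, Python 3.3 and later offer a narrower option: a Pickler instance can carry its own dispatch_table, so a rule applies only to dumps made through that instance. A minimal sketch, in which dumps_with_modules and save_module are hypothetical helper names:

```python
import copyreg
import importlib
import io
import pickle
import types

def save_module(module):
    # Reduce a module to "call importlib.import_module with its name".
    return importlib.import_module, (module.__name__,)

def dumps_with_modules(obj):
    """Pickle obj, treating modules as picklable for this call only."""
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # Start from a copy of the global table so existing registrations
    # still apply, then add the module rule to the private copy.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[types.ModuleType] = save_module
    pickler.dump(obj)
    return buffer.getvalue()

payload = dumps_with_modules({"helper": importlib})
restored = pickle.loads(payload)
print(restored["helper"] is importlib)
```

Nothing is written to copyreg.dispatch_table here, so code elsewhere in the process that defines module pickling differently is left alone.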

How about the following: a wrapper you can use to wrap some modules (maybe any module) in something that's pickle-able. You could then subclass the Pickler object to check whether the target object is a module and, if so, wrap it. Does this accomplish what you desire?

import pickle

class PickleableModuleWrapper(object):
    def __init__(self, module):
        # make a copy of the module's namespace in this instance
        self.__dict__ = dict(module.__dict__)
        # remove anything that's going to give us trouble during pickling
        self.remove_unpickleable_attributes()

    def remove_unpickleable_attributes(self):
        # iterate over a snapshot, since entries are deleted during the loop
        # (Python 3 raises RuntimeError if a dict changes size mid-iteration)
        for name, value in list(self.__dict__.items()):
            try:
                pickle.dumps(value)
            except Exception:
                del self.__dict__[name]

# round-trip the pickle module itself through the wrapper
p = pickle.dumps(PickleableModuleWrapper(pickle))
wrapped_mod = pickle.loads(p)
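
The Pickler subclass mentioned above could be sketched as follows for Python 3.8 or later, using the reducer_override hook. This is only one possible reading of the idea: FacadePickler is a hypothetical name, and types.SimpleNamespace stands in for the wrapper class as the facade. Dunder attributes and anything that plain pickle rejects (including nested modules) are dropped.

```python
import io
import pickle
import types

class FacadePickler(pickle.Pickler):
    """Pickler that replaces modules with attribute-only facades."""

    def reducer_override(self, obj):
        if not isinstance(obj, types.ModuleType):
            return NotImplemented  # fall back to normal pickling
        attrs = {"__name__": obj.__name__}
        for name, value in vars(obj).items():
            if name.startswith("__"):
                continue  # skip __builtins__, __spec__ and similar
            try:
                pickle.dumps(value)  # trial run; costs a second pickling
            except Exception:
                continue  # drop whatever cannot be pickled
            attrs[name] = value
        # Rebuild as an empty SimpleNamespace, then restore attrs as its state.
        return types.SimpleNamespace, (), attrs

# Demo on a throwaway module object so the result is easy to inspect.
settings = types.ModuleType("demo_settings")
settings.retries = 3
settings.io_module = io  # a nested module, which gets dropped

buffer = io.BytesIO()
FacadePickler(buffer).dump({"settings": settings})
restored = pickle.loads(buffer.getvalue())
print(restored["settings"].retries)
```

The unpickled value is a plain namespace rather than a module, which is the degraded-but-usable result the question asks for.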
Canonry answered 28/8, 2009 at 19:13 Comment(0)

Hmmm, something like this?

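import types

def replace_modules(someobject):
    # build a dict of someobject's attributes, with modules swapped for facades
    state = {}
    for name in dir(someobject):
        value = getattr(someobject, name)  # dir() only yields names, so fetch the value
        if isinstance(value, types.ModuleType):  # is a module
            # put in a facade: either recurse into the module and do the same thing,
            # or just store something like 'modulename_module'
            state[name] = '%s_module' % value.__name__
        else:
            # leave the value alone so the normal pickle handles it
            state[name] = value
    return state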

Obviously, this would go into a subclass of pickle.Pickler with a reimplemented dump method...

Trepang answered 28/8, 2009 at 19:14 Comment(0)
