How to use dill to serialize a class definition?

In the answer to "Python pickle: dealing with updated class definitions", the author of the dill package writes:

"Ok, I have added this feature to dill in the latest revision on github. Implemented with far less trickery than I thought... just serialize the class definition with the pickle, and voila."

Having installed dill and tinkered with it, it's not obvious to me how to actually use this functionality in dill. Could someone provide an explicit example? I would like to pickle the class instance and also serialize the class definition.

(I am new to Python, and this functionality seems extremely important. When pickling an object, it would be great to get as close as possible to a guarantee that you could still look at the object (which could be the result of a simulation) in the future, after the class definition may have changed and you haven't kept track of all the changes in an easily accessible way.)

Strange asked 1/9, 2014 at 22:23

I think you are looking for one of the following functionalities…

Here I build a class and an instance, and then change the class definition. The pickled class and instance can still be unpickled, because by default dill pickles the class definition itself (for a class defined in __main__) rather than just a reference to it… and it copes with several classes having the same name in the namespace (it does this simply by managing the pointer references to the class definitions).

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x+self.y       
...   y = 1
... 
>>> f = Foo()
>>> _Foo = dill.dumps(Foo)
>>> _f = dill.dumps(f)
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return x*self.z  
...   z = -1 
... 
>>> f_ = dill.loads(_f, ignore=True)
>>> f_.y
1
>>> f_.bar(1)
2
>>> Foo_ = dill.loads(_Foo)
>>> g = Foo_()
>>> g.bar(1)
2

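Pickle would blow up on the above, because it stores classes by reference and would rebuild f using the new Foo, which has no attribute y. If you don't want dill to serialize the class explicitly, and would rather it do what pickle does, you can ask dill to pickle by reference with dill.dumps(Foo, byref=True). Alternately, you can decide at load time which definition to use. With ignore=False (the default), an unpickled instance whose class was defined in __main__ is re-pointed at the class currently in __main__. With ignore=True (as used above), the newly-defined class is ignored and the instance keeps the class stored in the pickle.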

Now, in the case below, we work with the new class definition: we extract its source with dill.source.getsource and save it to a file. Alternatively, dill.temp.dump_source dumps the source to a file (here a temporary file in the current directory) so it can be imported later.

>>> sFoo = dill.source.getsource(Foo)
>>> print sFoo
class Foo(object):
  def bar(self, x):
    return x*self.z
  z = -1

>>> open('myFoo.py', 'w').write(sFoo)    
>>>
>>> f = dill.temp.dump_source(Foo, dir='.')
>>> f.name
'/Users/mmckerns/dev/tmpM1dzYN.py'
>>> from tmpM1dzYN import Foo as _Foo_
>>> h = _Foo_()
>>> h.bar(2)
-2
>>> from myFoo import Foo as _SFoo_
>>> _SFoo_.z
-1
>>> 
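
Outside an interactive session, the same "save the source, import it later" idea can be sketched as a script. This assumes dill is installed and the file is run from disk, so that dill.source.getsource can read the class source from the file. The scratch directory created with tempfile.mkdtemp and the module name myFoo are illustrative choices, and the directory is not cleaned up here.

```python
import importlib
import os
import sys
import tempfile

import dill.source

class Foo(object):
    def bar(self, x):
        return x * self.z
    z = -1

# Extract the class source as text (read from this script's file).
sFoo = dill.source.getsource(Foo)

# Save the source as a module in a scratch directory and make it importable.
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "myFoo.py"), "w") as fh:
    fh.write(sFoo)
sys.path.insert(0, moddir)
importlib.invalidate_caches()

# Importing the saved source gives an independent copy of the class.
myFoo = importlib.import_module("myFoo")
h = myFoo.Foo()
print(h.bar(2))   # 2 * -1
```

Unlike a pickle, the saved file is human-readable, so it also serves as a record of what the class looked like when the data was written.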

I hope that helps.

Ultramontanism answered 2/9, 2014 at 12:32

Comments:
Concerning your first code block, I ran it and it does blow up exactly like pickle would: AttributeError: 'Foo' object has no attribute 'y', circa line 25. Too bad, because it sounded like an instance of magic, and it would make storing dill-ed objects in a database more maintainable. So there is a huge use case. Dill 0.2.7.1, dill settings at default (byref False). I hope it's operator error, but cut and paste is solidly within my skillset. – Locomotion
The error is specific to IPython. Big sigh of relief, but still puzzling... will investigate and/or open a ticket. – Locomotion
@piccolbo: Weird. I can repeat the behavior in IPython, so they must be mucking with something unexpected. This seems like it's new behavior. Hmm. I saw your ticket, thanks. – Ultramontanism
As detailed in your ticket, I've updated dill.settings so the behavior you wanted can be toggled explicitly. I've edited my answer above to show that. – Ultramontanism
I don't see a link to the issue, so here it is: github.com/uqfoundation/dill/issues/243 (thanks, Mike). – Locomotion
I could have saved so much time if they'd just explained that ignore option properly in the docs. As it is, you get an upvote from me :) – Corrugation

If this functionality were that important, it would be in the language core by now. :-) So, no, this is not critical to using Python in any advanced form. If you have a project that relies on being able to re-instantiate objects based on old models (which is possible), you have to think about it carefully, and probably keep the old models around in explicit code instead of having them serialized.

My advice is just to "leave that aside" until you think you actually need it and have compared it with other solutions, like a strong model migration policy.
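
One common form of such a migration policy is to store a version number in each pickled object's state and upgrade old states in __setstate__. The sketch below uses only the standard library's pickle and is written for Python 3. The class Result, its fields and the VERSION attribute are hypothetical, and this is a general pickle technique, not something dill provides.

```python
import pickle

class Result(object):
    """Hypothetical simulation result; VERSION is bumped when fields change."""
    VERSION = 2

    def __init__(self, value, unit="m"):
        self.value = value
        self.unit = unit                      # field introduced in version 2

    def __getstate__(self):
        # Record which layout this state was written with.
        state = self.__dict__.copy()
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_version", 1)    # untagged states count as version 1
        if version < 2:
            state.setdefault("unit", "m")     # version 1 had no unit field
        self.__dict__.update(state)

# A current object round-trips unchanged.
r = pickle.loads(pickle.dumps(Result(3.5, "km")))
print(r.value, r.unit)

# An old version-1 state (no unit, no version tag) is upgraded on load.
old = Result.__new__(Result)
old.__setstate__({"value": 1.0})
print(old.unit)
```

Here the upgrade path lives in explicit code next to the current model, which is what keeps old data readable after the class changes.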

That said, I've tried dill, and it works as advertised. It can serialize a class with the "dump" and "dumps" calls, just as pickle does with ordinary objects, and rebuild the class object with "load" and "loads".

What is probably confusing you is that serializing an object (through pickle or dill alike) does not include its source code (i.e. the actual lines of textual Python code used to define the class), and loading it back does not bind its name in your namespace.

So, if a class named "A" is serialized and you need that name after "undilling" it, you have to reassign that name to it in the global namespace. Its original name is preserved in its __name__ attribute (and for your purpose of having multiple versions of the same model living together, that would lead to a lot of conflict).

Thus:

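import dill

class A(object):
    pass  # class body goes here

# pickles are binary data, so open the file in binary mode
with open("myfile", "wb") as fh:
    dill.dump(A, fh)

del A
# ... later, possibly in a new session ...
with open("myfile", "rb") as fh:
    someclass = dill.load(fh)
print(someclass.__name__)
globals()[someclass.__name__] = someclass
# at this point you have the "A" class back in the global namespace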
Verbenia answered 2/9, 2014 at 11:47
