The filtering part is indeed tricky. Using simple tricks, you can easily get the pickle to work. However, you might end up filtering out too much and losing information that you could keep when the filter looks a little bit deeper. But the vast possibility of things that can end up in the .namespace
makes building a good filter difficult.
However, we could leverage pieces that are already part of Python, such as deepcopy
in the copy
module.
I made a copy of the stock copy
module, and did the following things:
- create a new type named
LostObject
to represent object that will be lost in pickling.
- change
_deepcopy_atomic
to make sure x
is picklable. If it's not, return an instance of LostObject
- objects can define methods
__reduce__
and/or __reduce_ex__
to provide hint about whether and how to pickle it. We make sure these methods will not throw exception to provide hint that it cannot be pickled.
- to avoid making unnecessary copy of big object (a la actual deepcopy), we recursively check whether an object is picklable, and only make unpicklable part. For instance, for a tuple of a picklable list and and an unpickable object, we will make a copy of the tuple - just the container - but not its member list.
The following is the diff:
[~/Development/scratch/] $ diff -uN /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py mcopy.py
--- /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py 2010-01-09 00:18:38.000000000 -0800
+++ mcopy.py 2010-11-10 08:50:26.000000000 -0800
@@ -157,6 +157,13 @@
cls = type(x)
+ # if x is picklable, there is no need to make a new copy, just ref it
+ try:
+ dumps(x)
+ return x
+ except TypeError:
+ pass
+
copier = _deepcopy_dispatch.get(cls)
if copier:
y = copier(x, memo)
@@ -179,10 +186,18 @@
reductor = getattr(x, "__reduce_ex__", None)
if reductor:
rv = reductor(2)
+ try:
+ x.__reduce_ex__()
+ except TypeError:
+ rv = LostObject, tuple()
else:
reductor = getattr(x, "__reduce__", None)
if reductor:
rv = reductor()
+ try:
+ x.__reduce__()
+ except TypeError:
+ rv = LostObject, tuple()
else:
raise Error(
"un(deep)copyable object of type %s" % cls)
@@ -194,7 +209,12 @@
_deepcopy_dispatch = d = {}
+from pickle import dumps
+class LostObject(object): pass
def _deepcopy_atomic(x, memo):
+ try:
+ dumps(x)
+ except TypeError: return LostObject()
return x
d[type(None)] = _deepcopy_atomic
d[type(Ellipsis)] = _deepcopy_atomic
Now back to the pickling part. You simply make a deepcopy using this new deepcopy
function and then pickle the copy. The unpicklable parts have been removed during the copying process.
x = dict(a=1)
xx = dict(x=x)
x['xx'] = xx
x['f'] = file('/tmp/1', 'w')
class List():
def __init__(self, *args, **kwargs):
print 'making a copy of a list'
self.data = list(*args, **kwargs)
x['large'] = List(range(1000))
# now x contains a loop and a unpickable file object
# the following line will throw
from pickle import dumps, loads
try:
dumps(x)
except TypeError:
print 'yes, it throws'
def check_picklable(x):
try:
dumps(x)
except TypeError:
return False
return True
class LostObject(object): pass
from mcopy import deepcopy
# though x has a big List object, this deepcopy will not make a new copy of it
c = deepcopy(x)
dumps(c)
cc = loads(dumps(c))
# check loop refrence
if cc['xx']['x'] == cc:
print 'yes, loop reference is preserved'
# check unpickable part
if isinstance(cc['f'], LostObject):
print 'unpicklable part is now an instance of LostObject'
# check large object
if loads(dumps(c))['large'].data[999] == x['large'].data[999]:
print 'large object is ok'
Here is the output:
making a copy of a list
yes, it throws
yes, loop reference is preserved
unpicklable part is now an instance of LostObject
large object is ok
You see that 1) mutual pointers (between x
and xx
) are preserved and we do not run into infinite loop; 2) the unpicklable file object is converted to a LostObject
instance; and 3) not new copy of the large object is created since it is picklable.
function
objects. – Toms