Recursively Find Pickle Failure
Inspired by wump's comment on Python: can't pickle module objects error.
Here is some quick code that helped me find the culprit recursively.
It checks the object in question to see if it fails pickling, then recurses into the values of its __dict__ and returns only the entries that also fail to pickle, so you can follow the chain down to the unpicklable attribute.
Code Snippet
import pickle


def pickle_trick(obj, max_depth=10):
    """Return a nested dict describing why obj (or its attributes) fail to pickle."""
    output = {}

    if max_depth <= 0:
        return output

    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as e:
        # The object itself fails to pickle; recurse into its attributes
        # to find which of them are responsible.
        failing_children = []

        if hasattr(obj, "__dict__"):
            for k, v in obj.__dict__.items():
                result = pickle_trick(v, max_depth=max_depth - 1)
                if result:
                    failing_children.append(result)

        output = {
            "fail": obj,
            "err": e,
            "depth": max_depth,
            "failing_children": failing_children
        }

    return output
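If you want a quick, dependency-free sanity check of the snippet, here is a minimal sketch (Holder is a made-up example class, and it assumes pickle_trick from above is in the same module): the instance holds a threading.Lock, which cannot be pickled, so pickle_trick reports the lock as the failing child.

import threading
from pprint import pprint


class Holder:
    def __init__(self):
        self.name = "example"          # pickles fine, so it is not reported
        self.lock = threading.Lock()   # _thread.lock cannot be pickled


pprint(pickle_trick(Holder()))
# The printed dict has the Holder instance under 'fail' and a single
# entry for the lock object under 'failing_children'.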
Example Program
import redis
import pickle
from pprint import pformat as pf


def pickle_trick(obj, max_depth=10):
    output = {}

    if max_depth <= 0:
        return output

    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as e:
        failing_children = []

        if hasattr(obj, "__dict__"):
            for k, v in obj.__dict__.items():
                result = pickle_trick(v, max_depth=max_depth - 1)
                if result:
                    failing_children.append(result)

        output = {
            "fail": obj,
            "err": e,
            "depth": max_depth,
            "failing_children": failing_children
        }

    return output


if __name__ == "__main__":
    r = redis.Redis()
    print(pf(pickle_trick(r)))
Example Output
$ python3 pickle-trick.py
{'depth': 10,
 'err': TypeError("can't pickle _thread.lock objects"),
 'fail': Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>,
 'failing_children': [{'depth': 9,
                       'err': TypeError("can't pickle _thread.lock objects"),
                       'fail': ConnectionPool<Connection<host=localhost,port=6379,db=0>>,
                       'failing_children': [{'depth': 8,
                                             'err': TypeError("can't pickle _thread.lock objects"),
                                             'fail': <unlocked _thread.lock object at 0x10bb58300>,
                                             'failing_children': []},
                                            {'depth': 8,
                                             'err': TypeError("can't pickle _thread.RLock objects"),
                                             'fail': <unlocked _thread.RLock object owner=0 count=0 at 0x10bb58150>,
                                             'failing_children': []}]},
                      {'depth': 9,
                       'err': PicklingError("Can't pickle <function Redis.<lambda> at 0x10c1e8710>: attribute lookup Redis.<lambda> on redis.client failed"),
                       'fail': {'ACL CAT': <function Redis.<lambda> at 0x10c1e89e0>,
                                'ACL DELUSER': <class 'int'>,
                                .........
                                'ZSCORE': <function float_or_none at 0x10c1e5d40>},
                       'failing_children': []}]}
Root Cause - Redis can't pickle _thread.lock
In my case, saving an instance of Redis as an attribute of another object broke pickling of that object.
When you create an instance of Redis, it also creates a connection_pool, and the thread locks inside that pool can not be pickled.
I had to create and clean up the Redis client within the multiprocessing.Process itself, so it was never part of what got pickled.
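A minimal sketch of that workaround (the worker function and the connection parameters below are illustrative, not my original code): the Redis client is built inside the child process, so only plain arguments ever cross the pickle boundary.

import multiprocessing

import redis


def worker(host, port):
    # Create the Redis client inside the child process so that it
    # (and its connection_pool with its thread locks) is never pickled.
    r = redis.Redis(host=host, port=port)
    try:
        r.ping()  # ... real work goes here ...
    finally:
        r.connection_pool.disconnect()  # clean up before the process exits


if __name__ == "__main__":
    p = multiprocessing.Process(target=worker, args=("localhost", 6379))
    p.start()
    p.join()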
Testing
In my case, the class I was trying to pickle must stay picklable, so I added a unit test that creates an instance of the class and pickles it. That way, if anyone modifies the class so it can no longer be pickled, breaking its ability to be used in multiprocessing (and pyspark), we will detect that regression and know straight away.
import pickle

# MyClassThatMustPickle is the class under test, imported from your own code.

def test_can_pickle():
    # Given
    obj = MyClassThatMustPickle()
    # When / Then
    pkl = pickle.dumps(obj)
    # pickle.dumps raises if the class can no longer be pickled,
    # which fails the test and flags the regression straight away.