Pickle/dill cannot handle circular references if __hash__ is overridden

Consider the following MWE:

#import dill as pickle      # Dill exhibits similar behavior
import pickle

class B:
    def __init__(self):
        self.links = set()

class A:
    def __init__(self, base: B):
        self.base = base
        base.links.add(self)

    def __hash__(self):
        return hash(self.base)

    def __eq__(self, other):
        return self.base == other.base

pickled = pickle.dumps(A(B()))  # Success
print(pickle.loads(pickled))    # Not so much

The above example fails with the following exception:

Traceback (most recent call last):
  File "./mwe.py", line 26, in <module>
    print(pickle.loads(pickled))
  File "./mwe.py", line 18, in __hash__
    return hash(self.base)
AttributeError: 'A' object has no attribute 'base'

As I understand the problem, pickle deserializes B.links before it has finished deserializing A. Rebuilding the set calls A.__hash__, and since the A instance has been created but its base attribute has not been restored yet, it cannot compute its own hash, making everyone sad.
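To check this explanation, a hypothetical instrumented copy of the MWE can record, on every __hash__ call, whether base has already been restored on the instance. The seen list and the error variable are illustrative additions, not part of the original code. If the diagnosis is right, the only hash call made while loading should see an A whose __dict__ is still empty, so seen should end up as [False] and loading should fail with the same AttributeError.

```python
import pickle


class B:
    def __init__(self):
        self.links = set()


class A:
    # Hypothetical instrumentation (not in the original MWE): one entry per
    # __hash__ call, True if 'base' was already in the instance __dict__.
    seen = []

    def __init__(self, base):
        self.base = base
        base.links.add(self)

    def __hash__(self):
        # Look at the instance dict before touching self.base.
        A.seen.append("base" in self.__dict__)
        return hash(self.base)

    def __eq__(self, other):
        return self.base == other.base


data = pickle.dumps(A(B()))
A.seen.clear()  # forget the hash call made by A.__init__

error = None
try:
    pickle.loads(data)
except AttributeError as exc:
    error = exc

print(A.seen)  # hash calls made during loads
print(error)
```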

How do I get around this without breaking the circular references? (Breaking the cycles would be a lot of work, because the object I'm trying to serialize is hilariously complex.)

Madel answered 3/7, 2017 at 13:54

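I think you've correctly identified the cause of the problem. Each instance depends on the other, and pickle does not restore them in an order that works here: the set in B.links is rebuilt, and A.__hash__ is called, before the attributes of the A instance have been restored. This could be considered a bug, but luckily there's an easy workaround.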

Pickle lets us customize how objects are pickled through the __getstate__ and __setstate__ methods. We can use them to set the missing base attribute of the A instance manually before it is hashed:

class B:
    def __init__(self):
        self.links = set()

    def __getstate__(self):
        # dump a tuple instead of a set so that A.__hash__ won't be called while unpickling
        return tuple(self.links)

    def __setstate__(self, state):
        self.links = set()
        for link in state:
            link.base = self      # set the missing attribute
            self.links.add(link)  # now it can be hashed
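
For completeness, here is a self-contained sketch that combines the question's class A with the B above and checks that the cycle survives a round trip. The print calls and the second round trip, which starts from the B side instead of the A side, are illustrative additions.

```python
import pickle


class B:
    def __init__(self):
        self.links = set()

    def __getstate__(self):
        # Dump a tuple instead of a set so A.__hash__ is not called on load.
        return tuple(self.links)

    def __setstate__(self, state):
        self.links = set()
        for link in state:
            link.base = self      # restore the attribute __hash__ needs
            self.links.add(link)  # now hashing succeeds


class A:
    def __init__(self, base):
        self.base = base
        base.links.add(self)

    def __hash__(self):
        return hash(self.base)

    def __eq__(self, other):
        return self.base == other.base


# Round trip starting from the A side, as in the question.
a = pickle.loads(pickle.dumps(A(B())))
print(a in a.base.links)             # the restored A is in its base's set
print(next(iter(a.base.links)) is a)  # and it is the very same object

# Round trip starting from the B side.
b = pickle.loads(pickle.dumps(A(B()).base))
print(len(b.links), next(iter(b.links)).base is b)
```

This relies on pickle protocol 2 or higher (the default in Python 3), where the empty tuple returned for a B without links is still passed to __setstate__. Under protocols 0 and 1, copyreg drops a false state, so __setstate__ would not run and links would be missing; returning a state that is never false, such as a one-element tuple wrapping the links, would avoid that.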

Mohammed answered 3/7, 2017 at 14:27
