What is the proper way to make an object with unpickable fields pickable?

What I currently do is detect which fields are unpicklable and convert them to strings. (I could delete them instead, but then it would falsely look as if the field never existed; I'd rather have the field exist as a string.) Is there a less hacky, more official way to do this?

Current code I use:

import argparse
from argparse import Namespace
from typing import Any

import dill


def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpickable objects as strings.

    note: implementation not tested against deep copying.
    ref:
        - https://mcmap.net/q/1767320/-what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    pickable_args = argparse.Namespace()
    # - go through fields in args, if they are not pickable make it a string else leave as is
    # The vars() function returns the __dict__ attribute of the given object.
    for field in vars(args):
        field_val: Any = getattr(args, field)
        if not dill.pickles(field_val):
            field_val = str(field_val)
        setattr(pickable_args, field, field_val)
    return pickable_args

Context: I mostly do this to get rid of the annoying TensorBoard object I carry around (though I probably won't need the .tb field anymore thanks to wandb / Weights & Biases). Not that it matters much, but context is always nice.

Edit:

I decided to move away from dill, since it sometimes cannot recover classes/objects (probably because it cannot save their code or something), and to use only pickle (which seems to be the recommended approach in PyTorch).

So what is the official (perhaps optimized) way to check whether something is picklable without dill, using only the standard pickle module?

Is this the best way?

def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

Thus, my current solution:

def make_args_pickable(args: Namespace) -> Namespace:
    """
    Returns a copy of the args namespace but with unpickable objects as strings.

    note: implementation not tested against deep copying.
    ref:
        - https://mcmap.net/q/1767320/-what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    pickable_args = argparse.Namespace()
    # - go through fields in args, if they are not pickable make it a string else leave as is
    # The vars() function returns the __dict__ attribute of the given object.
    for field in vars(args):
        field_val: Any = getattr(args, field)
        # - if current field value is not pickable, make it pickable by casting to string
        if not dill.pickles(field_val):
            field_val: str = str(field_val)
        elif not is_picklable(field_val):
            field_val: str = str(field_val)
        # - after this line the invariant is that it should be pickable, so set it in the new args obj
        setattr(pickable_args, field, field_val)
    return pickable_args


def make_opts_pickable(opts):
    """ Makes a namespace pickable """
    return make_args_pickable(opts)


def is_picklable(obj: Any) -> bool:
    """
    Checks if something is pickable.

    Ref:
        - https://mcmap.net/q/1767320/-what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
    """
    import pickle
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

Note: one of the reasons I want something "official"/tested is that PyCharm halts on the try/except; see "How to stop PyCharm's break/stop/halt feature on handled exceptions (i.e. only break on python unhandled exceptions)?". That is not what I want: I want it to halt only on unhandled exceptions.

Pascual answered 26/11, 2021 at 17:51 Comment(2)
I don't think this is possible. Pickles are recursive objects or containers. If you have multiple nested containers and one of the items inside is unpicklable, you can't know it other than by trying to pickle and failing. - Harken
@Harken Yeah, that is what I realized once one of my objects held a pointer/ref to another object that can't be pickled... so my main issue must be fixing PyCharm halting/breaking on my try/except block... - Pascual

What is the proper way to make an object with unpickable fields pickable?

I believe the answer to this belongs in the question you linked, "Python - How can I make this un-pickleable object pickleable?". I've added a new answer to that question explaining how you can make an unpicklable object picklable the proper way, without using __reduce__.

So what is the official (perhaps optimized) way to check for pickables without dill or with the official pickle?

Objects that are picklable are defined in the docs as follows:

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).

The tricky parts are (1) knowing how functions/classes are defined (you can probably use the inspect module for that) and (2) recursing through objects, checking against the rules above.
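As a rough illustration of point (1), here is a minimal sketch of a heuristic for guessing whether a Python function was defined at the top level of a module. The helper name looks_top_level is hypothetical, and the check only inspects the function's name and qualified name; it is an assumption-laden shortcut, not how pickle itself decides.

```python
import inspect


def looks_top_level(func) -> bool:
    """Heuristic: True if func looks like a module-level `def` (hypothetical helper)."""
    if not inspect.isfunction(func):
        # Built-ins, classes and other callables are out of scope for this sketch.
        return False
    # Lambdas are always named "<lambda>"; functions defined inside another
    # function carry "<locals>" in their qualified name.
    return func.__name__ != "<lambda>" and "<locals>" not in func.__qualname__


def top_level_function():
    return 1


def make_nested_function():
    def nested_function():
        return 2
    return nested_function


print(looks_top_level(top_level_function))
print(looks_top_level(lambda: 0))
print(looks_top_level(make_nested_function()))
```

Even with such a helper, point (2) remains: every container and every instance attribute would still have to be walked recursively.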

There are a lot of caveats to this, such as the pickle protocol version, and whether the object is an extension type (defined in a C extension like numpy, for example) or an instance of a 'user-defined' class. Usage of __slots__ can also impact whether an object is picklable (since __slots__ means there's no __dict__), although such objects can still be pickled by implementing __getstate__. Some objects may also be registered with a custom function for pickling, so you'd need to know whether that has happened as well.
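To illustrate the last caveat, here is a minimal sketch of how a registered reduction function can make an otherwise unpicklable object picklable. The Connection class and the helper names are hypothetical; the sketch uses a per-pickler dispatch_table (copied from copyreg.dispatch_table) so that no global state is changed.

```python
import copyreg
import io
import pickle
import threading


class Connection:
    """Hypothetical class that carries an unpicklable attribute (a lock)."""

    def __init__(self, url: str):
        self.url = url
        self.lock = threading.Lock()


def reduce_connection(conn: Connection):
    # Tell pickle to rebuild the object by calling Connection(url);
    # __init__ then creates a fresh lock when the data is loaded.
    return Connection, (conn.url,)


def dumps_with_reducer(obj) -> bytes:
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    # Per-pickler dispatch table: only this pickler knows about the reducer.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[Connection] = reduce_connection
    pickler.dump(obj)
    return buffer.getvalue()


def plain_dumps_works(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


conn = Connection("db://example")
restored = pickle.loads(dumps_with_reducer(conn))
print(plain_dumps_works(conn), restored.url)
```

A rule-based checker that only looked at the object's attributes would report this object as unpicklable, even though the pickler configured above handles it.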

Technically, you can implement a function to check for all of this in Python, but it will be quite slow by comparison. The easiest (and probably most performant, as pickle is implemented in C) way to do this is to simply attempt to pickle the object you want to check.

I tested this with PyCharm, pickling all kinds of things, and it doesn't halt with this method. The key is that you must anticipate pretty much any kind of exception (see footnote 3 in the docs). The warnings are optional; they're mostly explanatory for the context of this question.

import pickle
import warnings
from typing import Any


def is_picklable(obj: Any) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, pickle.PickleError, AttributeError, ImportError):
        # https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
        return False
    except RecursionError:
        warnings.warn(
            f"Could not determine if object of type {type(obj)!r} is picklable "
            "due to a RecursionError that was suppressed. "
            "Setting a higher recursion limit MAY allow this object to be pickled"
        )
        return False
    except Exception as e:
        # https://docs.python.org/3/library/pickle.html#id9
        warnings.warn(
            "An error occurred while attempting to pickle "
            f"object of type {type(obj)!r}. Assuming it's unpicklable. The exception was {e}"
        )
        return False
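To see why catching only pickle.PicklingError is not enough, the following sketch records which exception class pickle.dumps raises for a few sample objects. The helper name pickling_error_type is hypothetical, and the exact exception classes can differ between Python versions, so treat the printed names as illustrative.

```python
import pickle
import threading
from typing import Any, Optional, Type


def pickling_error_type(obj: Any) -> Optional[Type[BaseException]]:
    """Return the exception class raised by pickle.dumps(obj), or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return type(exc)
    return None


samples = {
    "list of ints": [1, 2, 3],
    "lambda": lambda x: x,
    "lock": threading.Lock(),
    "generator": (n for n in range(3)),
    "lock nested in containers": {"outer": [{"inner": threading.Lock()}]},
}

for label, value in samples.items():
    error = pickling_error_type(value)
    print(f"{label}: {'picklable' if error is None else error.__name__}")
```

On CPython a lock typically fails with a TypeError rather than a pickle.PicklingError, which is why a handler for PicklingError alone lets the exception escape. The nested case also shows that the failure only surfaces once pickle recurses into the container.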

Using the example from my other answer linked above, you could make your object picklable by implementing __getstate__ and __setstate__ (or by subclassing and adding them, or by making a wrapper class), adapting your make_args_pickable:

class Unpicklable:
    """
    A simple marker class so we can distinguish when a deserialized object
    is a string because it was originally unpicklable 
    (and not simply a string to begin with)
    """
    def __init__(self, obj_str: str):
        self.obj_str = obj_str

    def __str__(self):
        return self.obj_str

    def __repr__(self):
        return f'Unpicklable(obj_str={self.obj_str!r})'


from argparse import Namespace


class PicklableNamespace(Namespace):
    def __getstate__(self):
        """For serialization"""

        # always make a copy so you don't accidentally modify state
        state = self.__dict__.copy()

        # Any unpicklables will be converted to a ``Unpicklable`` object
        # with its str format stored in the object
        for key, val in state.items():
            if not is_picklable(val):
                state[key] = Unpicklable(str(val))
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)  # or leave unimplemented

In action, I'll pickle a namespace whose attributes contain a file handle (normally not picklable) and then load the pickle data.

# Normally file handles are not picklable
p = PicklableNamespace(f=open('test.txt'))

data = pickle.dumps(p)
del p

loaded_p = pickle.loads(data)
# PicklableNamespace(f=Unpicklable(obj_str="<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>"))
Warnerwarning answered 22/1, 2022 at 19:45 Comment(0)

Yes, a try/except is the best way to go about this.

Per the docs, pickle pickles objects recursively: if you attempt to pickle a list, it pickles every object inside that list, so the list is picklable only if all of its contents are. This means that you cannot feasibly test whether an object is picklable without pickling it. Because of that, your structure of:

def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except pickle.PicklingError:
        return False
    return True

is the simplest and easiest way to go about checking this. If you are not working with recursive structures and/or you can safely assume that all recursive structures will only contain pickleable objects, you could check the type() value of the object against the list of pickleable objects:

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section Pickling Class Instances for details).

For such flat values, this is likely faster than the try/except you showed in your question.
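A minimal sketch of that idea, assuming you only want a shortcut for the non-container built-in types from the list and fall back to actually pickling everything else. The names _ATOMIC_TYPES, is_trivially_picklable and is_picklable_with_shortcut are hypothetical.

```python
import pickle
from typing import Any

# Leaf types from the list above that need no recursive inspection.
_ATOMIC_TYPES = (type(None), bool, int, float, complex, str, bytes, bytearray)


def is_trivially_picklable(obj: Any) -> bool:
    """True only when obj is exactly one of the atomic built-in types."""
    # Exact type match on purpose: subclasses may carry extra, unpicklable state.
    return type(obj) in _ATOMIC_TYPES


def is_picklable_with_shortcut(obj: Any) -> bool:
    """Skip the pickling attempt for atomic values, otherwise try to pickle."""
    if is_trivially_picklable(obj):
        return True
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


print(is_trivially_picklable(3.14), is_trivially_picklable([1, 2]))
print(is_picklable_with_shortcut([1, 2]), is_picklable_with_shortcut(lambda: 0))
```

A False from is_trivially_picklable only means "unknown", which is why containers and instances still go through the pickling attempt.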

Rozanneroze answered 19/1, 2022 at 22:30 Comment(0)

For my purposes, whatever the error is, I want my function to tell me the object is not picklable. This seems to work:

def is_picklable(obj: Any) -> bool:
    """
    Checks if something is pickable.

    Ref:
        - https://mcmap.net/q/1767320/-what-is-the-proper-way-to-make-an-object-with-unpickable-fields-pickable
        - pycharm halting all the time issue: https://mcmap.net/q/1780420/-how-to-stop-pycharm-39-s-break-stop-halt-feature-on-handled-exceptions-i-e-only-break-on-python-unhandled-exceptions
    """
    import pickle
    try:
        pickle.dumps(obj)
    except:
        # Deliberately broad; note that a bare except also swallows
        # KeyboardInterrupt and SystemExit.
        return False
    return True

As an added bonus, it doesn't freak PyCharm out; see "How to stop PyCharm's break/stop/halt feature on handled exceptions (i.e. only break on python unhandled exceptions)?" for details.

Pascual answered 26/1, 2022 at 0:11 Comment(0)
