How is dill different from Python's pickle module?

I have a large object in my Python 3 code which, when I try to pickle it with the pickle module, throws the following error:

TypeError: cannot serialize '_io.BufferedReader' object

However, dill.dump() and dill.load() are able to save and restore the object seamlessly.

  1. What causes the trouble for the pickle module?
  2. Now that dill saves and reconstructs the object without any error, is there any way to verify whether the pickling and unpickling with dill went well?
  3. How is it possible that pickle fails, but dill succeeds?
Quelpart answered 1/10, 2019 at 22:36
TL;DR: pickle doesn't handle functions or complex objects as well as dill. I use dill for all my data science pickling since the models and objects are very deep and complex. – Powe
dill is also built on top of pickle, but as noted above, it's made for complex objects where pickle can't succeed. – Essam

I'm the dill author.

1) The easiest thing to do is look at this file: https://github.com/uqfoundation/dill/blob/master/dill/_objects.py. It lists what pickle can serialize and what dill can serialize.
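For the specific error in the question, a minimal standard-library sketch reproduces it. A buffered file reader wraps live OS-level state (a file descriptor plus an internal buffer), so pickle refuses to serialize it rather than guess how to restore it. Wrapping an in-memory io.BytesIO in io.BufferedReader is just a convenient stand-in for a file opened with open(path, "rb"), and pickle_error is a hypothetical helper. The message wording has changed between Python versions (the question's "cannot serialize" vs. "cannot pickle" in more recent releases).

```python
import io
import pickle

def pickle_error(obj):
    """Hypothetical helper: return pickle's error message for obj, or None if it pickles."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None

# In-memory stand-in for a file opened with open(path, "rb").
reader = io.BufferedReader(io.BytesIO(b"some bytes"))

print(pickle_error({"plain": [1, 2, 3]}))  # None: ordinary data pickles fine
print(pickle_error(reader))                # a TypeError message naming BufferedReader
```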

2) You can try dill.copy, dill.check, and dill.pickles to check different levels of pickling and unpickling. dill also includes utilities for detecting and diagnosing serialization issues in dill.detect and dill.pointers.
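The idea behind checking with dill.copy, which is a dumps-then-loads round trip, can be sketched generically. The roundtrips helper below is hypothetical (not part of dill or pickle): it serializes and deserializes an object with whatever dumps/loads pair it is given, defaulting to the standard pickle module, and then compares the copy with the original. The comparison is only meaningful for types that define __eq__; objects that fall back to identity comparison never equal their copy, so pass a custom `same` function for those.

```python
import io
import operator
import pickle

def roundtrips(obj, dumps=pickle.dumps, loads=pickle.loads, same=operator.eq):
    """Hypothetical helper: True if obj survives dumps -> loads and `same`
    judges the copy equal to the original."""
    try:
        copy = loads(dumps(obj))
    except Exception:
        # Any failure to serialize or deserialize counts as "does not round-trip".
        return False
    return bool(same(obj, copy))

config = {"weights": [0.1, 0.2], "name": "model-a"}
print(roundtrips(config))                               # True
print(roundtrips(io.BufferedReader(io.BytesIO(b"x"))))  # False: pickle refuses it
```

With dill installed, the same check can run through dill by passing dumps=dill.dumps and loads=dill.loads; dill.pickles, mentioned above, performs a comparable check out of the box.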

3) dill is built on pickle and augments it by registering new serialization functions for types that pickle can't handle on its own.
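dill's own internals differ, but the standard library exposes a documented hook for the same idea of registering new serialization functions: a Pickler subclass with its own dispatch_table. The sketch below registers a hypothetical reducer for io.BufferedReader, the type from the question, that stores only the file's path and reopens it on load. That is one possible strategy chosen for illustration, not necessarily what dill does for file handles. The names reduce_reader, FileAwarePickler, and dumps_with_files are all made up for this example.

```python
import copyreg
import io
import os
import pickle
import tempfile

def reduce_reader(reader):
    # Hypothetical strategy: record only the file's path and reopen it on load.
    # File contents and the read position are NOT preserved, and this only
    # works for readers opened from a path (they have a .name attribute).
    return open, (reader.name, "rb")

class FileAwarePickler(pickle.Pickler):
    # A private copy of the global table, so the extra reducer does not
    # change how the plain pickle module behaves elsewhere.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[io.BufferedReader] = reduce_reader

def dumps_with_files(obj):
    """Serialize obj with the extended pickler; load it with plain pickle.loads."""
    buf = io.BytesIO()
    FileAwarePickler(buf).dump(obj)
    return buf.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as out:
        out.write(b"hello")
    with open(path, "rb") as fh:
        payload = dumps_with_files({"handle": fh})  # pickle.dumps would raise TypeError
    restored = pickle.loads(payload)["handle"]      # a fresh handle on the same path
    with restored:
        print(restored.read())                      # b'hello'
```

Only the pickling side needs the extra reducer: the payload records a call to open(path, "rb"), which plain pickle.loads can replay.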

4) dill also provides serialization variants that let the user choose among different strategies for serializing an object's dependencies (see dill.settings), including source-code extraction and object reconstitution with dill.source (an extension of the stdlib inspect module).
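Part of why those strategies matter: stock pickle saves a function by reference, as a module-qualified name that is looked up again at load time, and never saves the function's code. Anything without an importable name, such as a lambda, therefore fails. The standard-library sketch below shows both behaviors; dill's settings (for example byref and recurse) and its ability to serialize functions by value are aimed at exactly these cases.

```python
import math
import pickle

# A module-level function is stored by reference: essentially "math" + "sqrt".
data = pickle.dumps(math.sqrt)
print(b"math" in data and b"sqrt" in data)  # True
print(pickle.loads(data) is math.sqrt)      # True: the same object is looked up again

square = lambda x: x * x  # no importable name, so there is nothing to reference

try:
    pickle.dumps(square)
except (pickle.PicklingError, AttributeError) as exc:
    # The exact exception type and message vary across Python versions.
    print(type(exc).__name__, exc)
```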

Gorrian answered 2/10, 2019 at 0:25
Why isn't dill merged into pickle? – Dry
@tejasvi88: Primarily, because there's never been a PEP to do so. – Gorrian
dill is better; can't pickle while using multiprocessing. – Leafy
@Dee, why can't pickle be used with multiprocessing? – Eviaevict
I think Dee meant that multiprocessing uses pickle to pass objects across processes. A fork of multiprocessing (called multiprocess) replaces pickle with dill. – Gorrian
The PyPy support in dill doesn't seem great yet? – Weathered
How so? PyPy has been fully supported by dill for several years. So if you have issues with PyPy, please do open a GitHub ticket so I know what they are. – Gorrian
