How can I recover a corrupted, partially pickled file?

My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.

Is it possible to partially or fully recover the data? If so, how?

Here's what I've tried:

>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
    obj = pik.load()
EOFError: Ran out of input
>>> 

The file is not empty:

>>> os.stat(filename).st_size
31110059

Note: all data in the dictionary consisted of Python built-in types.

Skerry answered 11/3, 2018 at 17:17 Comment(0)
10

The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:

import io
import pickle

# Swap in the pure-Python unpickler before importing dill:
# the C implementation does not expose its internal state.
pickle.Unpickler = pickle._Unpickler

import dill

if __name__ == '__main__':
    obj = [1, 2, {3: 4, "5": ('6',)}]
    data = dill.dumps(obj)

    handle = io.BytesIO(data[:-5])  # cut it off

    unpickler = dill.Unpickler(handle)

    try:
        unpickler.load()
    except EOFError:
        pass

    print(unpickler.stack)

I get the following output:

[3, 4, '5', ('6',)]

The pickle data format isn't that complicated. Read through the Python module's source code and you can probably find a way to hook all of the load_ methods to give you more information.
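As a rough illustration of hooking the load_ methods, the sketch below wraps every handler in the pure-Python unpickler's dispatch table so that each opcode handler is logged before it runs. TracingUnpickler and its trace attribute are hypothetical names made up for this example, and pickle._Unpickler, dispatch and stack are private CPython details that may change between versions, so treat this as a starting point rather than a supported API.

```python
import io
import pickle


class TracingUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that records the name of each load_ handler it runs.

    Hypothetical helper for illustration; not part of pickle or dill.
    """

    def __init__(self, file, **kwargs):
        super().__init__(file, **kwargs)
        self.trace = []
        # Build a per-instance dispatch table so the shared class table
        # is left untouched.
        self.dispatch = {
            code: self._wrap(handler)
            for code, handler in pickle._Unpickler.dispatch.items()
        }

    def _wrap(self, handler):
        def traced(unpickler):
            self.trace.append(handler.__name__)
            return handler(unpickler)
        return traced


payload = pickle.dumps([1, 2, 3], protocol=2)

# Simulate an interrupted write by dropping the last two bytes.
tracer = TracingUnpickler(io.BytesIO(payload[:-2]))
try:
    tracer.load()
except Exception as exc:  # the exact exception depends on where the data ends
    print("stopped with", type(exc).__name__)

print(tracer.trace)  # handlers that ran before the failure
print(tracer.stack)  # items pushed since the last MARK opcode
```

The trace shows how far the unpickler got, which helps when deciding which part of the stack is trustworthy.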

Sandeesandeep answered 11/3, 2018 at 17:46 Comment(2)
I'm the dill author. Indeed, pickling is just dumping to a string, so you should be able to recover up to the last object dumped when it failed. pickle, and thus dill, pickles recursively, so be warned that the "last object" means "the last object that was the target of a dump". - Jaimiejain
With the newest version of dill it failed with: AttributeError: 'Unpickler' object has no attribute 'stack'. How can this be achieved in new dill versions? Which version did you use? - Hargrave
0

I can't comment on the above answer, but to extend Blender's answer:

unpickler.metastack worked for me with dill v0.3.5.1 (though, as far as I know, you could do it without dill). unpickler.stack did exist, but it was an empty list.

Also, with dill I got an UnpicklingError rather than an EOFError. This could be partly because of how my file got corrupted (the disk ran out of space).
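Here is a minimal sketch of the metastack approach without dill, assuming the data contains only built-in types. recover_frames is a hypothetical helper name; pickle._Unpickler, metastack and stack are private attributes of CPython's pure-Python unpickler and are not guaranteed to stay the same across versions. The except clause is deliberately broad because, depending on where the data ends, the failure can surface as EOFError, UnpicklingError or something else.

```python
import io
import pickle


def recover_frames(data):
    """Best-effort load of possibly truncated pickle data.

    Returns (frames, error). Hypothetical helper, not part of pickle or dill.
    """
    unpickler = pickle._Unpickler(io.BytesIO(data))  # private pure-Python class
    try:
        return [[unpickler.load()]], None
    except Exception as exc:
        # metastack holds the stacks saved at each MARK opcode, outermost first;
        # stack holds whatever was pushed after the most recent MARK.
        return unpickler.metastack + [unpickler.stack], exc


original = {"a": 1, "b": 2, "c": 3}
blob = pickle.dumps(original, protocol=2)

# Simulate an interrupted write by dropping the last two bytes
# (the SETITEMS and STOP opcodes for this particular pickle).
frames, error = recover_frames(blob[:-2])
print(type(error).__name__)
for depth, frame in enumerate(frames):
    print(depth, frame)

# For a flat dict, the innermost frame alternates keys and values.
innermost = frames[-1]
recovered = dict(zip(innermost[0::2], innermost[1::2]))
print(recovered)
```

Rebuilding the dict from alternating keys and values only works this neatly for a flat dict cut off between entries. For nested data you would have to inspect each frame and decide by hand what belongs where.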

Northampton answered 17/8, 2022 at 22:44 Comment(1)
Thanks for the tip about metastack. I wrote some example code (without dill): https://mcmap.net/q/1772692/-how-to-retrieve-data-from-a-corrupt-pandas_pickle_file-pkl-when-pandas-read_pickle-throws-quot-eoferror-ran-out-of-input-quot - Lillielilliputian
