Unpickling a Python 2 object with Python 3

2

145

I'm wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.

I've been running 2to3 on a large amount of company legacy code to get it up to date.

Having done this, when running the file I get the following error:

  File "H:\fixers - 3.4\addressfixer - 3.4\trunk\lib\address\address_generic.py", line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

The pickled object in question is a dict in a dict, containing keys and values of type str.

So my question is: is there a way to load an object, originally pickled in Python 2.4, with Python 3.4?

Bravery answered 29/1, 2015 at 15:32 Comment(4)
Does Python 2.4 have the json module? Perhaps you could write a 2.4 script that unpickles the object and saves it as a json object, and then write a 3.4 script that reads the json object and saves it as a 3.4-compatible pickle object. This would be a one-time operation that you run on all your pickle files. - Watkin
I was thinking along similar lines. Since these are dicts, I reckon I could just redirect sys.stdout to a file and print them out, but I want to see if I can load them first. - Bravery
Related question having to do with datetimes specifically: #24805605 - Sethsethi
Even when the question was asked, Python 2.4 was over 10 years old and the last patch release was over 6 years old. - Nierman
210

You'll have to tell pickle.load() how to convert Python 2 bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.

The default is to try and decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 allows you to import the data directly:

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1') 

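However, you'll need to verify that none of your strings were decoded with the wrong codec. Latin-1 never fails to decode, because it maps the byte values 0-255 directly to the first 256 Unicode code points, so text that was stored in another encoding (UTF-8, for example) loads without error but comes out garbled.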
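One way to check for mis-decoded text is sketched below. It assumes the original Python 2 strings were UTF-8 (the 0xe2 byte in the traceback is consistent with that, but does not prove it); the helper name fix_mojibake is hypothetical. It re-encodes a Latin-1-decoded string back to its original bytes and tries the assumed encoding instead.

```python
def fix_mojibake(text, real_encoding="utf-8"):
    """Undo a Latin-1 decode and re-decode with the assumed real encoding.

    Returns the text unchanged if it cannot be converted.
    """
    try:
        # Encoding to Latin-1 is lossless for strings that were decoded
        # from Latin-1, so this recovers the original Python 2 bytes.
        raw = text.encode("latin1")
        return raw.decode(real_encoding)
    except UnicodeError:
        # Not valid in the assumed encoding (or not Latin-1 text at all).
        return text


# b"\xe2\x80\x99" is the UTF-8 encoding of a right single quotation mark.
loaded = b"it\xe2\x80\x99s".decode("latin1")  # what encoding='latin1' would give
fixed = fix_mojibake(loaded)
```

Pure-ASCII strings pass through unchanged, so the helper can be applied to every key and value. Strings that really were Latin-1 are usually not valid UTF-8 and are also left alone, but that is a heuristic, not a guarantee.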

The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.
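A minimal sketch of that approach follows, assuming the data consists of nested dicts, lists and tuples of byte strings and that the strings are UTF-8. Both are assumptions about your data, and decode_bytes is a hypothetical helper name. The sample bytes are hand-written to mimic a Python 2 protocol 2 pickle of a small dict.

```python
import pickle


def decode_bytes(obj, encoding="utf-8"):
    """Recursively decode bytes in dicts, lists and tuples to str."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {
            decode_bytes(key, encoding): decode_bytes(value, encoding)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [decode_bytes(item, encoding) for item in obj]
    if isinstance(obj, tuple):
        return tuple(decode_bytes(item, encoding) for item in obj)
    return obj  # numbers, None, etc. are left untouched


# Hand-written stand-in for a Python 2 pickle of {'key': 'caf\xc3\xa9'}.
py2_pickle = b"\x80\x02}q\x00U\x03keyq\x01U\x05caf\xc3\xa9q\x02s."

raw = pickle.loads(py2_pickle, encoding="bytes")  # keys and values are bytes
decoded = decode_bytes(raw)                       # keys and values are str
```

If some values are genuinely binary (image data, for instance), they should be skipped rather than decoded; this sketch does not try to tell the two apart.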

Note that in Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling Python 2 datetime object data is broken unless you use encoding='bytes'.

Gasiform answered 29/1, 2015 at 15:38 Comment(5)
How could this be made backward compatible with Python 2? Apparently, the encoding argument isn't present in Python 2. - Verret
@EpicAdv: you don't need to make this code compatible with Python 2; this question is about how to load Python 2 pickles into Python 3. Drop the encoding keyword altogether for Python 2. - Gasiform
@EpicAdv: You can create a pickle_options dictionary that is either empty for Python 2 or has 'encoding': 'latin1' and send **pickle_options to pickle. This way it should run in both versions. - Neomineomycin
@Neomineomycin - Clever, but somewhere you have to detect which version you're using, so you could also more straightforwardly just do the call differently (one with and one without the extra argument) depending on the version. But at least you got the gist of EpicAdv's comment, which Martijn's comment doesn't address at all. - Sethsethi
I realize the datetime comment was not the main thrust of this answer, but for future readers, I'd like to point out that even the "fixed" versions of Python 3 still require encoding='latin-1' to unpickle Python 2 datetimes. If your pickled Python 2 data happens to include both datetimes and bytestrings encoded in something other than Latin-1, then you might still be better off using encoding='bytes' after all. - Sethsethi
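The pickle_options idea from the comments above could look roughly like this. It is a sketch, not a tested cross-version recipe; load_legacy_pickle is a hypothetical helper name, and the version check uses sys.version_info.

```python
import pickle
import sys

# Python 2's pickle.load() has no encoding argument, so only pass it
# when running under Python 3.
if sys.version_info[0] >= 3:
    pickle_options = {"encoding": "latin1"}
else:
    pickle_options = {}


def load_legacy_pickle(path):
    """Load a pickle file written by Python 2 under either major version."""
    with open(path, "rb") as f:
        return pickle.load(f, **pickle_options)
```

Calling load_legacy_pickle(mshelffile) would then replace the pickle.load() call from the question.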
19

Using encoding='latin1' can cause issues when your object contains numpy arrays.

Using encoding='bytes' will be better.

Please see this answer for a complete explanation of using encoding='bytes'.

Safeguard answered 19/12, 2017 at 8:39 Comment(6)
Which issues? What should I be careful of? Using bytes turns strings into bytes(), so I prefer latin1 if possible, but it is not clear to me what the problem is. - Bluish
@sreeragh-a-r: Could you give an example of the issues you encountered? I have a two-dimensional numpy.ndarray (numpy 1.14) pickled in Python 2.7 using cPickle.dumps(), and unpickling in Python 3 with pickle.loads(..., encoding='latin1') works fine. - Jacobs
@Jacobs I faced issues when I had to pickle images as image strings and unpickle them. The code can be found here: gist.github.com/sreeragh-ar/70205db3a43badbfa69f758faa898be3 - Safeguard
@Bluish Please see the above gist for the problem. Images were getting corrupted after unpickling. - Safeguard
If you are NOT using np.arrays, save yourself some hassle and keep encoding='latin1' so you don't have to decode all bytes to str. - Eyeless
@Eyeless or if your strings contain only ASCII characters; otherwise you'll have to use bytes. - Kitchenware
