Python pickle error: UnicodeDecodeError

5

136

I'm trying to do some text classification using TextBlob. I first train the model and serialize it using pickle, as shown below.

import pickle
from textblob.classifiers import NaiveBayesClassifier

with open('sample.csv', 'r') as fp:
     cl = NaiveBayesClassifier(fp, format="csv")

f = open('sample_classifier.pickle', 'wb')
pickle.dump(cl, f)
f.close()

And when I try to run this file:

import pickle
f = open('sample_classifier.pickle', encoding="utf8")
cl = pickle.load(f)    
f.close()

I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The following are the contents of my sample.csv:

My SQL is not working correctly at all. This was a wrong choice, SQL

I've issues. Please respond immediately, Support

Where am I going wrong here? Please help.

Encephalon answered 5/10, 2015 at 20:53 Comment(1)
Possible duplicate of "Using pickle.dump - TypeError: must be str, not bytes" – Beaufert
241

By choosing to open the file in mode wb, you are choosing to write in raw binary. There is no character encoding being applied.

Thus, to read this file, you should simply open it in mode rb.
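
A minimal round-trip sketch of this, using a plain dictionary as a stand-in for the TextBlob classifier (the data and the temporary file location are illustrative assumptions, not part of the question):

```python
import os
import pickle
import tempfile

# Stand-in for the trained classifier; any picklable object behaves the same way.
model = {"labels": ["SQL", "Support"], "threshold": 0.5}

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample_classifier.pickle")

# Pickle output is binary, so write it with mode "wb".
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read the same raw bytes back with mode "rb".
with open(path, "rb") as f:
    restored = pickle.load(f)

# Opening in text mode asks Python to decode the bytes as UTF-8 first,
# which is what raises the error from the question.
try:
    with open(path, encoding="utf8") as f:
        pickle.load(f)
    text_mode_error = None
except (UnicodeDecodeError, TypeError) as exc:
    text_mode_error = type(exc).__name__

print(restored == model)
print(text_mode_error)
```

The write mode and the read mode have to match: bytes in, bytes out.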

Nellie answered 5/10, 2015 at 21:3 Comment(3)
Is there a reason to use wb when saving the pickle? Or is there a mode one can use to save the pickle which would not require opening it with rb mode? – Disarrange
@Disarrange I use wb because some issue I've yet to fix prevents me from using w with pickle. It complains about writing bytes instead of strings. – Signature
I still have the error even though I tried this solution. – Callimachus
46

I think you should open the file as

import pickle

# Open in binary mode; the with block closes the file automatically.
with open('sample_classifier.pickle', 'rb') as f:
    cl = pickle.load(f)

You shouldn't have to decode it. pickle.load will give you an exact copy of whatever it is you saved. At this point, you should be able to work with cl as if you had just created it.
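
As a rough illustration of why decoding is the wrong step, the sketch below pickles a small stand-in dictionary (an assumption, not the classifier from the question) and inspects the first byte. Pickle protocol 2 and later begin with the PROTO opcode, byte 0x80, which can never start a UTF-8 sequence:

```python
import pickle

# Stand-in object; protocol 2 is chosen explicitly so the header is predictable.
payload = pickle.dumps({"label": "Support"}, protocol=2)

# The first byte of the stream is the PROTO opcode.
first_byte = payload[0]

# Treating the stream as UTF-8 text fails on that very first byte.
try:
    payload.decode("utf-8")
    decode_reason = None
    decode_position = None
except UnicodeDecodeError as exc:
    decode_reason = exc.reason
    decode_position = exc.start

# Loading straight from the bytes works and yields an equal object.
restored = pickle.loads(payload)

print(hex(first_byte), decode_reason, decode_position)
```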

Bobby answered 5/10, 2015 at 21:9 Comment(0)
1

Maybe the file was encoded using latin1:

f = open('sample_classifier.pickle', encoding="latin1")
Drawstring answered 16/4, 2021 at 17:10 Comment(0)
0

Since none of the suggested answers resolved the error for me, I switched to joblib instead:

import joblib
clf_loaded = joblib.load('classifier_file_name.joblib')

It worked great!

Lubbi answered 17/4, 2021 at 4:46 Comment(0)
-3

Try this code; it works for me:

with open('your pickle file name', 'rb') as f:
    classifier = pickle.load(f, encoding="latin1")

  • Note: the encoding argument only controls how Python 3 decodes 8-bit strings that were pickled by Python 2, and its default is "ASCII". If "latin1" does not fix the error, you can try "utf-8" or "bytes" instead.
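
To illustrate what the encoding argument of pickle.load and pickle.loads does, here is a small sketch. The byte string is a hand-written protocol-2 pickle of the kind Python 2 produces for a non-ASCII str; it is an illustrative assumption, not data from the question:

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string 'caf\xe9':
# PROTO 2, SHORT_BINSTRING with length 4, the four bytes, then STOP.
py2_pickle = b"\x80\x02U\x04caf\xe9."

# "latin1" maps every byte to a character, so decoding always succeeds.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# "bytes" skips decoding and returns the raw bytes unchanged.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

# The default encoding is "ASCII", which rejects the byte 0xe9.
try:
    pickle.loads(py2_pickle)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = type(exc).__name__

print(as_text, as_bytes, default_error)
```

For a pickle written by Python 3, as in the question, the argument makes no difference; opening the file with 'rb' is the part that matters.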
Overslaugh answered 18/5, 2021 at 13:33 Comment(1)
The accepted answer for this question already indicates and explains the use of the 'rb' fs option. Setting an encoding may be useful for people dealing with legacy systems, but it's worth noting that py2 has been not just deprecated but sunset for ~18 months at time of answering. – Carniola
