This problem commonly occurs when switching from py2 to py3. In py2, plaintext was both a string and a byte array: the type was flexible and could serve either role. In py3, plaintext is only a string. The method outfile.write() takes a byte array when outfile is opened in binary mode, so passing it a str raises an exception. Change the input to plaintext.encode('utf-8') to fix the problem. Read on if this bothers you.
In py2, the declaration for file.write made it seem like you passed in a string: file.write(str). Actually you were passing in a byte array, and you should have been reading the declaration like this: file.write(bytes). Read that way, the problem is simple: file.write(bytes) needs a bytes type, and in py3 to get bytes out of a str you convert it:
py3>> outfile.write(plaintext.encode('utf-8'))
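As a minimal sketch of the fix in py3, assuming a scratch file in a temporary directory (the path, file name, and sample text are illustrative, not taken from the question), the following shows the failing call and the corrected one side by side:

```python
import os
import tempfile

plaintext = "¡no problem"  # hypothetical sample text

# Hypothetical scratch file; any file opened in binary mode behaves the same way.
path = os.path.join(tempfile.mkdtemp(), "out.bin")

with open(path, "wb") as outfile:
    try:
        outfile.write(plaintext)  # str passed to a binary-mode file
        error = None
    except TypeError as exc:
        error = exc  # py3 will not guess an encoding for you
    # Encoding explicitly produces bytes, which binary mode accepts.
    written = outfile.write(plaintext.encode("utf-8"))

with open(path, "rb") as infile:
    data = infile.read()
```

The alternative is to open the file in text mode, for example open(path, 'w', encoding='utf-8'). In that case write() takes a str and the file object does the encoding itself.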
Why did the py2 docs declare that file.write took a string? Well, in py2 the distinction didn't matter because:
py2>> str==bytes #str and bytes aliased a single hybrid class in py2
True
The str-bytes class of py2 has methods and constructors that make it behave like a string class in some ways and a byte array class in others. Convenient for file.write, isn't it?
py2>> plaintext='my string literal'
py2>> type(plaintext)
str #is it a string or is it a byte array? it's both!
py2>> outfile.write(plaintext) #can use plaintext as a byte array
Why did py3 break this nice system? Well, because in py2 basic string functions didn't work for the rest of the world. Try measuring the length of a word with a non-ASCII character:
py2>> len('¡no') #length of string=3, length of UTF-8 byte array=4, since in this variable-length encoding each non-ASCII char takes 2-4 bytes
4 #always gives bytes.len not str.len
All this time you thought you were asking for the len of a string in py2, but you were getting the length of the byte array from the encoding. That ambiguity is the fundamental problem with double-duty classes: which version of any method call do you implement?
The good news then is that py3 fixes this problem. It disentangles the str and bytes classes: the str class has string-like methods, and the separate bytes class has byte array methods:
py3>> len('¡ok') #string
3
py3>> len('¡ok'.encode('utf-8')) #bytes
4
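The separation shows up in more than len(). The short py3 sketch below (variable names are illustrative only) indexes each type and converts back again, to show that str deals in characters while bytes deals in integers:

```python
text = "¡ok"                       # str: a sequence of characters
raw = text.encode("utf-8")         # bytes: the UTF-8 encoding of that text

same_class = str is bytes          # False in py3: two distinct classes
first_char = text[0]               # '¡', a one-character str
first_byte = raw[0]                # 194 (0xC2), an int: first byte of the encoded '¡'
round_trip = raw.decode("utf-8")   # decoding recovers the original str
```

Going from str to bytes always goes through encode(), and going back goes through decode(), so the encoding is named explicitly at each boundary instead of being assumed.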
Hopefully knowing this helps de-mystify the issue, and makes the migration pain a little easier to bear.