TypeError: 'str' does not support the buffer interface
plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wb") as outfile:
    outfile.write(plaintext) 

The above Python code gives me the following error:

Traceback (most recent call last):
  File "C:/Users/Ankur Gupta/Desktop/Python_works/gzip_work1.py", line 33, in <module>
    compress_string()
  File "C:/Users/Ankur Gupta/Desktop/Python_works/gzip_work1.py", line 15, in compress_string
    outfile.write(plaintext)
  File "C:\Python32\lib\gzip.py", line 312, in write
    self.crc = zlib.crc32(data, self.crc) & 0xffffffff
TypeError: 'str' does not support the buffer interface
Vietcong answered 29/3, 2011 at 10:36 Comment(0)
H
300

If you use Python 3.x, a string is not the same type as in Python 2.x; you must convert it to bytes (encode it).

import gzip

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
with gzip.open(filename + ".gz", "wb") as outfile:
    # A file opened in binary mode needs bytes, so encode the str first
    outfile.write(bytes(plaintext, 'UTF-8'))

Also, do not use variable names like string or file, since those are names of a module or a function.

EDIT @Tom

Yes, non-ASCII text is also compressed/decompressed. I use Polish letters with UTF-8 encoding:

import gzip

plaintext = 'Polish text: ąćęłńóśźżĄĆĘŁŃÓŚŹŻ'
filename = 'foo.gz'
with gzip.open(filename, 'wb') as outfile:
    outfile.write(bytes(plaintext, 'UTF-8'))
# Read the compressed bytes back and decode them to str
with gzip.open(filename, 'rb') as infile:
    outfile_content = infile.read().decode('UTF-8')
print(outfile_content)
Hegelianism answered 29/3, 2011 at 10:51 Comment(7)
It's odd that this fixed it; the original code worked for me under 3.1, and the sample code in the docs also does not encode explicitly. If you use it on non-ASCII text, does gunzip decompress it? I got an error. - Audiovisual
I typed my name in Unicode Hindi and it compressed it in gzip successfully. I am using Python 3.2. - Vietcong
@Tom Zych: Probably has something to do with the changes in 3.2: docs.python.org/dev/whatsnew/3.2.html#gzip-and-zipfile - Ormazd
I tested it with ActiveState Python 3.1 and 3.2. On my machine it works in both. - Middleton
I was trying to do this: gzip.write(input_file.read()). The equivalent to calling bytes(...) in this case is to open the input file in binary mode: input_file = open('filename', 'rb') - Mazonson
For file compression you should always open the input in binary mode: You need to be able to uncompress the file later and get exactly the same content. Converting to Unicode (str) and back is unnecessary, and risks decoding errors or mismatches between input and output. - Pleo
@Pleo In this case it's about the gzip.open() method, not the Python open() function. I'm sure the gzip part opens the file in binary mode; the question is what type for the uncompressed text it presents to the rest of the program. - Adallard
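The binary-mode advice from the comments above can be sketched roughly as follows. This is only an illustration: the helper name compress_file and the temporary file names are made up for the example, and shutil.copyfileobj is one of several ways to copy the bytes across.

```python
import gzip
import os
import shutil
import tempfile

def compress_file(src_path, dst_path):
    """Compress src_path into dst_path without ever decoding the bytes."""
    # Both files are opened in binary mode, so no str/bytes conversion happens.
    with open(src_path, "rb") as infile, gzip.open(dst_path, "wb") as outfile:
        shutil.copyfileobj(infile, outfile)

# Hypothetical demo data written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "input.txt")
    dst = src + ".gz"
    original = "ąćę\r\nline two\n".encode("utf-8")
    with open(src, "wb") as f:
        f.write(original)
    compress_file(src, dst)
    with gzip.open(dst, "rb") as f:
        restored = f.read()
```

Because nothing is decoded along the way, restored should be byte-for-byte identical to original, including the mixed line endings.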
B
97

There is an easier solution to this problem.

You just need to add a t to the mode so it becomes wt. This makes Python open the file in text mode rather than binary mode, so you can write a str to it directly.

The complete program becomes this:

import gzip

plaintext = input("Please enter the text you want to compress")
filename = input("Please enter the desired filename")
# "wt" opens the gzip file in text mode, so str can be written as-is
with gzip.open(filename + ".gz", "wt") as outfile:
    outfile.write(plaintext)
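A slightly more defensive variant of the text-mode approach might look like the sketch below. It assumes Python 3.3 or later, where gzip.open accepts text modes along with the encoding and newline arguments. The helper name roundtrip_text and the file name are invented for the example. Passing newline="" is a choice made here to keep line endings untranslated on every platform.

```python
import gzip
import os
import tempfile

def roundtrip_text(text, encoding="utf-8"):
    """Write text to a gzip file in text mode and read it back."""
    # The temporary directory and file name are illustrative only.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.txt.gz")
        # "wt" adds a text layer that encodes str to bytes before compressing.
        with gzip.open(path, "wt", encoding=encoding, newline="") as outfile:
            outfile.write(text)
        # "rt" decodes the decompressed bytes back to str.
        with gzip.open(path, "rt", encoding=encoding, newline="") as infile:
            return infile.read()

result = roundtrip_text("Polish text: ąćęłńóś\n")
```

Naming the encoding explicitly avoids depending on the platform default, which can differ between machines.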
Birth answered 21/7, 2014 at 8:7 Comment(3)
Does it work on Python 2 too? Could it be a way to make the code work on both Python 2 and Python 3? - Last
Wow, man you are good! Thanks! Let me vote you up. This should be the accepted answer :)) - Hemeralopia
Adding "t" can have side effects. On Windows, files written in text mode will have newlines ("\n") converted to CRLF ("\r\n"). - Dover
S
43

You cannot serialize a Python 3 'string' to bytes without explicit conversion to some encoding.

outfile.write(plaintext.encode('utf-8'))

is possibly what you want. This also works in both Python 2.x and 3.x (in Python 2, provided plaintext is ASCII-only or a unicode object).

Sharrisharron answered 29/3, 2011 at 10:44 Comment(0)
O
28

For Python 3.x you can convert your text to raw bytes through:

bytes("my data", "encoding")

For example:

bytes("attack at dawn", "utf-8")

The object returned will work with outfile.write.
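As a quick check, the bytes(...) constructor and the str.encode method should give the same result for the same encoding. The variable names below are only for illustration.

```python
text = "attack at dawn"

# Two ways to get bytes from a str in Python 3.
via_constructor = bytes(text, "utf-8")
via_method = text.encode("utf-8")

# Both produce the same bytes object contents.
same = via_constructor == via_method

# decode() reverses the conversion and gives back the original str.
back = via_method.decode("utf-8")
```

Either form can be passed to outfile.write when the file is opened in binary mode.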

Ormazd answered 29/3, 2011 at 10:45 Comment(0)
U
10

This problem commonly occurs when switching from py2 to py3. In py2, plaintext could act as both a string and a byte array; the type did double duty. In py3, plaintext is only a string, and outfile.write() takes a byte array when outfile is opened in binary mode, so an exception is raised. Change the argument to plaintext.encode('utf-8') to fix the problem. Read on if this bothers you.

In py2, the declaration for file.write made it seem like you passed in a string: file.write(str). Actually you were passing in a byte array, so you should have read the declaration as file.write(bytes). Read that way, the problem is simple: file.write(bytes) needs a bytes object, and in py3 you get bytes from a str by encoding it:

py3>> outfile.write(plaintext.encode('utf-8'))

Why did the py2 docs declare that file.write took a string? In py2 the distinction didn't matter because:

py2>> str==bytes         #str and bytes aliased a single hybrid class in py2
True

The str-bytes class of py2 has methods and constructors that make it behave like a string class in some ways and a byte array class in others. Convenient for file.write, isn't it?

py2>> plaintext='my string literal'
py2>> type(plaintext)
str                              #is it a string or is it a byte array? it's both!

py2>> outfile.write(plaintext)   #can use plaintext as a byte array

Why did py3 break this nice system? Because in py2, basic string functions didn't work for the rest of the world. Want to measure the length of a word with a non-ASCII character?

py2>> len('¡no')        #length of string=3, length of UTF-8 byte array=4, since UTF-8 encodes each non-ASCII char as 2-4 bytes
4                       #always gives bytes.len not str.len

All the time you thought you were asking for the length of a string in py2, you were actually getting the length of the byte array produced by the encoding. That ambiguity is the fundamental problem with double-duty classes: which version of any method call do you implement?

The good news then is that py3 fixes this problem. It disentangles the str and bytes classes. The str class has string-like methods, the separate bytes class has byte array methods:

py3>> len('¡ok')       #string
3
py3>> len('¡ok'.encode('utf-8'))     #bytes
4

Hopefully knowing this helps de-mystify the issue, and makes the migration pain a little easier to bear.
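To see the py3 behaviour end to end, here is a small sketch that writes to an in-memory gzip stream. Using io.BytesIO instead of a real file is a choice made for the example so that it leaves nothing on disk. The exact wording of the TypeError message differs between Python versions, so the code only checks the exception type.

```python
import gzip
import io

buffer = io.BytesIO()
plaintext = "¡ok"

with gzip.GzipFile(fileobj=buffer, mode="wb") as outfile:
    try:
        # Passing str to a binary-mode file is rejected in py3.
        outfile.write(plaintext)
        str_accepted = True
    except TypeError:
        str_accepted = False
    # Encoding to bytes first is what the binary stream expects.
    outfile.write(plaintext.encode("utf-8"))

# Decompress and decode to confirm the text survives the round trip.
restored = gzip.decompress(buffer.getvalue()).decode("utf-8")
```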

Use answered 9/1, 2016 at 0:43 Comment(0)
P
4
>>> s = bytes("s","utf-8")
>>> print(s)
b's'
>>> s = s.decode("utf-8")
>>> print(s)
s

This may be useful if you want to get rid of the annoying 'b' prefix. If anyone has a better idea, please suggest it or feel free to edit this answer. I'm just a newbie.

Prognostication answered 8/7, 2015 at 8:26 Comment(1)
You can also use s.encode('utf-8'), which is as Pythonic as s.decode('utf-8'), in place of s = bytes("s", "utf-8"). - Outpoint
R
4

For unit tests based on django.test.TestCase in Django, I changed my Python 2 syntax:

def test_view(self):
    response = self.client.get(reverse('myview'))
    self.assertIn(str(self.obj.id), response.content)
    ...

to use the Python 3 .decode('utf8') syntax:

def test_view(self):
    response = self.client.get(reverse('myview'))
    self.assertIn(str(self.obj.id), response.content.decode('utf8'))
    ...
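
The same str-versus-bytes mismatch can be reproduced without Django. In the sketch below, content is a made-up bytes value standing in for response.content, and obj_id stands in for self.obj.id; neither comes from a real view.

```python
# Hypothetical stand-ins for response.content (bytes) and self.obj.id.
content = b"<li>item 42</li>"
obj_id = 42

try:
    # In py3, testing whether a str is contained in bytes raises TypeError.
    found_raw = str(obj_id) in content
except TypeError:
    found_raw = None

# Option 1: decode the bytes and compare str with str.
found_decoded = str(obj_id) in content.decode("utf8")

# Option 2: encode the needle and compare bytes with bytes.
found_encoded = str(obj_id).encode("utf8") in content
```

Either option works; decoding tends to read more naturally when the rest of the test deals in str.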
Rust answered 17/8, 2015 at 17:47 Comment(0)
