I'm trying to use a python script to download files from a Chinese service provider (I'm not from China myself). The provider is giving me a .zip file which contains a file which seems to have Chinese characters in its name. This seems to be causing the zipfile module to barf.
Code:
import zipfile
f = "/path/to/zip_file.zip"
if zipfile.is_zipfile(f):
fz = zipfile.ZipFile(f, 'r')
The zipfile itself doesn't contain any non-ASCII characters but the file inside it does. When I run the above script i get the following exception:
Traceback (most recent call last): File "./temp.py", line 9, in <module>
fz = zipfile.ZipFile(f, 'r') File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
self._RealGetContents() File "/usr/lib/python2.7/zipfile.py", line 859, in _RealGetContents
x.filename = x._decodeFilename() File "/usr/lib/python2.7/zipfile.py", line 379, in _decodeFilename
return self.filename.decode('utf-8') File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True) UnicodeDecodeError: 'utf8' codec can't decode byte 0xbd in position 30: invalid start byte
I've tried looking through the answers to many similar questions:
- Read file with Chinese Characters
- Extract zip files with non-unicode filenames
- Extract files with invalid characters
Please correct me if I'm wrong, but it looks like an open issue with the zipfile module.
How do I get around this? Is there any alternative module for dealing with zipfiles that I should use? Or any other solution?
TIA.
Edit: I can access/unzip the same file perfectly with the linux command-line utility "unzip".
XXX
correspond to your language, check here for python 2.4 or here for python 3.x. – Decent