Python: Reading FTP file list with UTF-8?

Hi, I am using the ftplib module and list my files with this code:

files = ftp.nlst()  # names in the current remote directory

Then I write them to a text file with this code:

for item in files:
    filenames.write(item + '\n')

But there is an encoding problem: if a file name contains characters such as 'ı', 'ğ', or 'ş', they are not read correctly and are written to the file as '?' instead.

How can I read them properly?

Afrika answered 8/4, 2015 at 18:55 Comment(2)
Could you copy the exception it is raising? - Bespectacled
Related: Confirm that Python 2.6 ftplib does not support Unicode file names? Alternatives? - Lounge

In Python 3.x versions before 3.9, ftplib uses ISO-8859-1 (latin-1) as the default encoding for file names.

To use UTF-8 encoding for file name with the server, you need to add the following line:

import ftplib

ftpConnector = ftplib.FTP(host, user, password)  # connect and log in
ftpConnector.encoding = 'utf-8'  # force UTF-8 for file names instead of the default ISO-8859-1

Then you can use:

ftpConnector.storbinary('STOR ' + fileName, file)  # fileName is sent UTF-8 encoded; file must be open in binary mode
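
Applied to the listing case from the question, the same setting might be used as in the sketch below. It is only an illustration under two assumptions: the server really sends UTF-8 names, and the '?' characters may also come from the output file being opened with a non-UTF-8 default encoding. The helper name write_listing is hypothetical and not part of ftplib.

```python
def write_listing(ftp, out_path):
    """Write the names returned by ftp.nlst() to out_path, one per line."""
    # Decode directory listings as UTF-8 (assumes the server sends UTF-8 names).
    ftp.encoding = 'utf-8'
    names = ftp.nlst()
    # Open the output file as UTF-8 as well; the platform default encoding
    # may not be able to represent characters such as 'ı', 'ğ', 'ş'.
    with open(out_path, 'w', encoding='utf-8') as out:
        for name in names:
            out.write(name + '\n')
    return names
```

With a real connection this could be called as write_listing(ftplib.FTP(host, user, password), 'filenames.txt'), where host, user, and password are placeholders.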
Placidia answered 14/10, 2015 at 13:46 Comment(1)
This is a correct answer. Tested with Python 3 and FileZilla Server. - Marabout
ftp.encoding = 'utf-8'        # encode and decode file names as UTF-8 on the client side
ftp.sendcmd('OPTS UTF8 ON')   # ask the server to use UTF-8 as well
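
A slightly more defensive variant of these two lines is sketched here. It assumes the server answers the FEAT command and lists UTF8 there when it supports UTF-8 names; the helper name enable_utf8 is hypothetical, and treating a rejected OPTS command as harmless is a design choice of this sketch, not documented ftplib behaviour.

```python
import ftplib


def enable_utf8(ftp):
    """Switch the session to UTF-8 names if the server advertises UTF8."""
    try:
        features = ftp.sendcmd('FEAT')  # ask which extensions the server supports
    except ftplib.Error:
        return False                    # FEAT not supported: leave encoding unchanged
    if 'UTF8' not in features.upper():
        return False
    try:
        ftp.sendcmd('OPTS UTF8 ON')
    except ftplib.Error:
        pass                            # assumption: the FEAT entry alone is sufficient
    ftp.encoding = 'utf-8'
    return True
```

If the function returns False, the encoding attribute is left as it was, so the caller can decide how to handle the raw names.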
Soothsay answered 22/8, 2018 at 9:2 Comment(1)
While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. - Dipole

You need to convert the resulting items back to Unicode before writing them to the file. The ftplib module does not support Unicode strings. To do this, try:

import encodings.idna

for item in files:
    filenames.write(encodings.idna.ToUnicode(item) + '\n')
Entablature answered 8/4, 2015 at 19:30 Comment(2)
Thank you for replying. I tried your solution but I got a "<class 'UnicodeError'>" exception. - Afrika
Can you give more details please? If you rerun, paste the entire error message here please. - Entablature

Since Python 3.9 (bpo-39380) the default encoding of ftplib.FTP has been set to utf-8, and the FTP constructor offers an encoding parameter.

ftp = ftplib.FTP(..., encoding='utf-8')

However, this change is arguably incompatible and a flawed solution: it broke a lot of existing code, and since then nlst() and similar commands fail with UnicodeDecodeError as soon as any file on the server has a name that is not valid UTF-8. (You cannot robustly inspect an unknown directory with the new default. Also, the OPTS UTF8 ON command was never firmly established and is usually a no-op on Unix servers with 8-bit file names; in effect it only permits 8-bit instead of 7-bit ASCII transfer for commands, which happens, or happened, without the command as well.)

The formerly fixed latin-1 encoding attribute of FTP, which applies to the whole command traffic and not just to file names, was probably never meant as a default text encoding; no RFC ever recommended or preferred latin-1. Its purpose was rather to provide an unrestricted 8-bit API for the 8-bit FTP protocol, which on Python 3 takes the form of pseudo 8-bit strings: ordinary (Unicode) str objects that use only the lowest 256 code points.

For robust FTP it is currently necessary (in Python 3.9+) to switch the internal encoding of ftplib.FTP back to latin-1, that is, to pseudo 8-bit strings, do any inspection on those pseudo 8-bit strings, and do the (optionally error-tolerant) encoding and decoding outside ftplib, like this:

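ftp.encoding = 'latin-1'   # or ftp = FTP(..., encoding='latin-1')
...
fn_ftp_utf8 = fn.encode('utf-8').decode('latin-1')  # real name -> pseudo 8-bit string to pass to ftplib
...
fn = fn_ftp_utf8.encode('latin-1').decode('utf-8', 'backslashreplace')  # pseudo 8-bit string from ftplib -> real name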
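
The round trip above can be wrapped in two small helpers, sketched here to show what happens to a valid UTF-8 name and to a name that is not valid UTF-8. The function names to_wire and from_wire are hypothetical; the sketch assumes ftp.encoding is set to 'latin-1' so that every str passed to or returned by ftplib maps one-to-one to bytes.

```python
def to_wire(name):
    """Real file name -> pseudo 8-bit string whose code points are the UTF-8 bytes."""
    return name.encode('utf-8').decode('latin-1')


def from_wire(raw):
    """Pseudo 8-bit string from ftplib -> real name; invalid bytes become \\xNN escapes."""
    return raw.encode('latin-1').decode('utf-8', 'backslashreplace')


# A valid UTF-8 name survives the round trip unchanged.
assert from_wire(to_wire('ığş.txt')) == 'ığş.txt'

# A name stored on the server as latin-1 bytes is not valid UTF-8;
# instead of raising UnicodeDecodeError, the bad byte is escaped.
assert from_wire('caf\xe9') == 'caf\\xe9'
```

Because decoding never raises here, a directory with mixed or unknown name encodings can still be listed completely, and the escaped names remain recognisable for later handling.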

In the future there should probably be at least an errors parameter for ftplib.FTP, in addition to the encoding parameter, so that simple use cases can rely on the automatic encoding mode.

Downrange answered 23/3, 2022 at 16:35 Comment(0)
