Python: ftplib hangs at end of transfer

I've been searching on this for a couple of days and haven't found an answer yet.

I am trying to download video files from an FTP server. My script checks the server, compares the nlist() output to a list of already-downloaded files parsed from a text file, builds a list of new files to get, and iterates over it, downloading each file. It disconnects from the server and reconnects for each file (I thought a server timeout might be the issue, so I quit() the connection after each download).

This works for the first few files, but as soon as I hit a file that takes longer than 5 minutes, ftplib just hangs at the end of the transfer. I can see in Explorer that the file is the correct size, so the download has completed, but the script doesn't seem to get the message and move on to the next file.

Any help will be greatly appreciated. My code is below:

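import os
from download import downloadFile  # defined in download.py, shown below

# getFiles and validExtensions are built earlier in the script (not shown)
newPath = "Z:\\pathto\\downloads\\"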

for f in getFiles:
    print("Getting " + f)

for f in getFiles:

    fil = f.rstrip()
    ext = os.path.splitext(fil)[1]
    if ext in validExtensions:
        print("Downloading new file: " + fil)
        downloadFile(fil, newPath)

Here is download.py:

from ftplib import FTP
def downloadFile(filename, folder):
    myhost = 'host'
    myuser = 'user'
    passw = 'pass'
    #login
    ftp = FTP(myhost,myuser,passw)
    localfile = open(folder + filename, 'wb')
    ftp.retrbinary("RETR " + filename, localfile.write, 1024)
    print("Downloaded " + filename)
    localfile.close()
    ftp.quit()
Efficacious answered 30/10, 2013 at 20:1 Comment(0)

Without more information, I can't actually debug your problem, so I can only suggest the most general answer. This will probably not be necessary for you, but probably will be sufficient for anyone.

retrbinary will block until the entire file is done. If that's longer than 5 minutes, nothing will get sent over the control channel for the entire 5 minutes. Either your client is timing out the control channel, or the server is. So, when you try to hang up with ftp.quit(), it will either hang forever or raise an exception.

You can control your side's timeout with a timeout argument on the FTP constructor. Some servers support an IDLE command to allow you to set the server-side timeout. But, even if the appropriate one turns out to be doable, how do you pick an appropriate timeout in the first place?
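Whatever value you pick, a client-side timeout at least turns a silent hang into an exception you can handle. A minimal sketch, assuming the connection was created with something like FTP(host, user, passw, timeout=30): try a polite quit(), and fall back to close() if the control channel is already dead. The helper name close_politely is made up for illustration.

```python
import ftplib

def close_politely(ftp):
    """Try QUIT; if the control connection is dead or times out, just drop it.

    Returns True if the server acknowledged QUIT, False if we had to close().
    """
    try:
        ftp.quit()
        return True
    except ftplib.all_errors:
        # all_errors covers ftplib errors, OSError (including socket timeouts)
        # and EOFError (server hung up on us).
        ftp.close()
        return False
```

With no timeout set, quit() can still block indefinitely, so this only helps when the timeout argument is passed to the FTP constructor.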

What you really want to do is prevent the control socket from timing out while a transfer is happening on the data socket. But how? If you, e.g., ftp.voidcmd('NOOP') every so often in your callback function, that'll be enough to keep the connection alive… but it'll also force you to block until the server responds to the NOOP, which many servers will not do until the data transfer is complete, which means you'll just end up blocking forever (or until a different timeout) and not getting your data.

The standard techniques for handling two sockets without one blocking on the other are a multiplexer like select.select or threads. You can do that here, but you will have to give up the simple retrbinary interface and instead use transfercmd to get the data socket explicitly.

For example:

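import threading
from ftplib import FTP

def downloadFile(…):
    ftp = FTP(…)
    ftp.voidcmd('TYPE I')  # binary mode; retrbinary sets this itself, transfercmd does not
    sock = ftp.transfercmd('RETR ' + filename)
    def background():
        # Copy the data socket to disk until the server closes it.
        with open(…) as f:
            while True:
                block = sock.recv(1024*1024)
                if not block:
                    break
                f.write(block)
        sock.close()
    t = threading.Thread(target=background)
    t.start()
    while t.is_alive():
        t.join(60)           # wake up once a minute...
        ftp.voidcmd('NOOP')  # ...and keep the control channel busy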
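One possible problem with the loop above is that voidcmd('NOOP') waits for a reply, and the reply it reads may be the server's end-of-transfer message rather than the answer to the NOOP. A hedged variation is to send each NOOP without waiting (putcmd) and, once the data thread finishes, read one reply for the RETR plus one per NOOP sent. This is a sketch only: the function name, the ftp/fileobj parameters and the interval default are assumptions for illustration, and it has not been tested against any particular server.

```python
import threading
from ftplib import FTP  # the caller passes in a connected, logged-in FTP object

def download_with_keepalive(ftp, filename, fileobj, interval=60,
                            blocksize=1024 * 1024):
    """Download filename into fileobj, sending NOOP every `interval` seconds.

    Returns the number of NOOPs that were sent.
    """
    ftp.voidcmd('TYPE I')                       # binary mode, as retrbinary does
    sock = ftp.transfercmd('RETR ' + filename)  # data socket

    def background():
        # Copy the data connection into fileobj until the server closes it.
        while True:
            block = sock.recv(blocksize)
            if not block:
                break
            fileobj.write(block)
        sock.close()

    t = threading.Thread(target=background)
    t.start()
    noops_sent = 0
    while t.is_alive():
        t.join(interval)
        if t.is_alive():
            ftp.putcmd('NOOP')  # send only; do not block on the reply
            noops_sent += 1

    # The server owes one reply for RETR and one per NOOP. Read them all,
    # in whatever order they arrive, so the session stays in sync.
    for _ in range(1 + noops_sent):
        ftp.voidresp()
    return noops_sent
```

If the server never sends its final reply at all, the first voidresp() will still block, so combining this with a constructor timeout is probably wise.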

An alternative solution would be to read, say, 20MB at a time, then call ftp.abort(), and use the rest argument to resume the transfer with each new retrbinary until you reach the end of the file. However, ABOR could hang forever, just like that NOOP, so that doesn't guarantee anything—not to mention that servers don't have to respond to it.

What you could do is just close the whole connection down (not quit, but close). This is not very nice to the server, and may result in some wasted data being re-sent, and may also prevent TCP from doing its usual ramp up to full speed if you kill the sockets too quickly. But it should work.
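As a rough sketch of that close-and-resume idea: read a bounded chunk from the data socket, close both sockets, reconnect, and restart from the current offset via the rest argument. Here connect is a hypothetical zero-argument factory that returns a fresh, logged-in FTP object, and download_in_chunks and chunk_bytes are made-up names. It assumes the server supports REST and SIZE, which is not guaranteed.

```python
from ftplib import FTP  # connect() is expected to return a logged-in FTP

def download_in_chunks(connect, filename, local_path,
                       chunk_bytes=20 * 1024 * 1024):
    """Fetch filename in pieces, using a new connection for each piece.

    Returns the total number of bytes written to local_path.
    """
    offset = 0
    with open(local_path, 'wb') as f:
        while True:
            ftp = connect()            # hypothetical factory, see lead-in
            try:
                ftp.voidcmd('TYPE I')  # binary mode; SIZE may need it too
                size = ftp.size(filename)
                if size is not None and offset >= size:
                    return offset      # nothing left to fetch
                sock = ftp.transfercmd('RETR ' + filename,
                                       rest=offset or None)
                received = 0
                try:
                    while received < chunk_bytes:
                        block = sock.recv(min(64 * 1024,
                                              chunk_bytes - received))
                        if not block:
                            return offset  # server closed the data socket
                        f.write(block)
                        received += len(block)
                        offset += len(block)
                finally:
                    sock.close()
            finally:
                ftp.close()            # drop the connection, no QUIT
```

A call might look like download_in_chunks(lambda: FTP(myhost, myuser, passw, timeout=60), filename, folder + filename), with the chunk size tuned so each piece finishes well inside the server's idle timeout.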

See this answer—and notice that it requires a bit of testing against your particular broken server to figure out which, if any, variation works correctly and efficiently.

Paint answered 30/10, 2013 at 21:0 Comment(2)
If I could +1 this ten times I would. Most comprehensive answer I've had to one of my questions. - Efficacious
After dealing with this code in another question, I believe that while this code might work around some problems for a single transfer per session, it is actually wrong. I've covered this in detail in my answer. - Skipton

Based on abarnet's solution (which was still hanging at the end), I've written this, which finally works :-)

import ftplib
from tempfile import SpooledTemporaryFile

MEGABYTE = 1024 * 1024

def download(ftp_host, ftp_user, ftp_pass, ftp_path, filename):
    ftp = ftplib.FTP(ftp_host, ftp_user, ftp_pass, timeout=3600) # timeout: 1-hour
    ftp.cwd(ftp_path)

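    # Binary mode: transfercmd does not set it, and some servers reject SIZE in ASCII mode.
    ftp.voidcmd('TYPE I')
    filesize = ftp.size(filename) / MEGABYTE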
    print(f"Downloading: {filename}   SIZE: {filesize:.1f} MB")

    with SpooledTemporaryFile(max_size=MEGABYTE, mode="w+b") as ff:
        sock = ftp.transfercmd('RETR ' + filename)
        while True:
            buff = sock.recv(MEGABYTE)
            if not buff: break
            ff.write(buff)
        sock.close()
        ff.rollover()  # force saving to HDD of the final chunk!!
        ff.seek(0)     # prepare for data reading
        print("Reading the buffer...")
        # alldata = ff.read()
        # upload_file_to_adls(filename, alldata, account_name, account_key, container, adls_path)
    ftp.quit()
Wasteland answered 19/11, 2021 at 15:6 Comment(1)
If I understand your code correctly, the "fix" is that you are not waiting for the response to the RETR command (the absence of the voidresp call is the only difference from the official retrbinary code). But that's not a fix; it breaks the FTP protocol. The FTP session won't be usable after the download. - Skipton
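For comparison, here is roughly what retrbinary does, paraphrased from memory rather than copied from the standard library (TLS handling is left out, and the function name is made up). The final voidresp() is the step the comment above refers to: it reads the server's end-of-transfer reply so the control channel is ready for the next command.

```python
def retrbinary_equivalent(ftp, cmd, callback, blocksize=8192, rest=None):
    """Approximate stand-in for FTP.retrbinary, written against its public pieces."""
    ftp.voidcmd('TYPE I')              # switch to binary mode
    conn = ftp.transfercmd(cmd, rest)  # open the data connection
    try:
        while True:
            data = conn.recv(blocksize)
            if not data:
                break                  # server closed the data connection
            callback(data)
    finally:
        conn.close()
    return ftp.voidresp()              # wait for the final reply to cmd
```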

I do this. Note that tf is an open file handle that is passed in. I've redacted some stuff, but the general premise is to check how much data has been downloaded and abort the FTP transfer when the downloaded amount matches the file size.

In my case, the issue has been that the transfer basically hangs once all the data has been downloaded: the server never closes the connection.


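from ftplib import FTP

class FileCompleteException(Exception):
    """Raised by the callback to break out of retrbinary once all bytes have arrived."""

def download_file(filename, tf, size=None):
    def callback(data):
        tf.write(data)
        # Assumes tf started at position 0, so tell() equals the bytes received so far.
        if size == tf.tell():
            raise FileCompleteException('Done!')

    with FTP(host='ftp.example.com',
             user='user',
             passwd='xxx') as ftp:
        try:
            ftp.retrbinary(f'RETR {filename}',
                           callback)
        except FileCompleteException:
            pass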
Midrash answered 5/5, 2023 at 15:35 Comment(0)
