Python: File download using ftplib hangs forever after file is successfully downloaded

I have been trying to troubleshoot an issue that occurs when we download a file from FTP/FTPS. The file gets downloaded successfully, but no operation is performed after the download completes, and no error occurs that could give more information about the issue. I searched Stack Overflow and found this link, which describes a similar problem; it looks like I am facing the same issue, though I am not sure. I need a little more help in resolving it.

I tried setting the FTP connection timeout to 60 minutes, but that was of little help. Before this I was using ftplib's retrbinary(), but the same issue occurred there. I also tried passing different blocksize and windowsize values, but the issue was still reproducible.

I am trying to download a file of size ~3GB from an AWS EMR cluster. Sample code is below.

# assumes: import os, threading; from ftplib import FTP; a module-level logger
def download_ftp(self, ip, port, user_name, password, file_name, target_path):
    try:
        os.chdir(target_path)
        ftp = FTP(host=ip)
        ftp.connect(port=int(port), timeout=3000)
        ftp.login(user=user_name, passwd=password)

        if ftp.nlst(file_name) != []:
            dir = os.path.split(file_name)
            ftp.cwd(dir[0])
            for filename in ftp.nlst(file_name):
                sock = ftp.transfercmd('RETR ' + filename)

                def background():
                    # Read the data connection in a separate thread so the main
                    # thread can keep the control connection alive with NOOPs.
                    fhandle = open(filename, 'wb')
                    while True:
                        block = sock.recv(1024 * 1024)
                        if not block:
                            break
                        fhandle.write(block)
                    fhandle.close()
                    sock.close()

                t = threading.Thread(target=background)
                t.start()
                while t.is_alive():
                    t.join(60)
                    ftp.voidcmd('NOOP')
                logger.info("File " + filename + " fetched successfully")
            return True
        else:
            logger.error("File " + file_name + " is not present in FTP")

    except Exception as e:
        logger.error(e)
        raise

Another option suggested in the above-mentioned link is to close the connection after downloading a small chunk of the file and then restart the connection. Can someone suggest how this can be achieved? I am not sure how to resume the download from the point where it stopped before the connection was closed. Will this method be foolproof for downloading the entire file?

I don't know much about FTP server-level timeout settings, so I don't know what needs to be altered or how. I basically want to write a generic FTP downloader that can download files from FTP/FTPS.

When I use the retrbinary() method of ftplib and set the debug level to 2,

ftp.set_debuglevel(2)
ftp.retrbinary('RETR ' + filename, fhandle.write)

the following logs are printed:

*cmd* 'TYPE I'
*put* 'TYPE I\r\n'
*get* '200 Type set to I.\r\n'
*resp* '200 Type set to I.'
*cmd* 'PASV'
*put* 'PASV\r\n'
*get* '227 Entering Passive Mode (64,27,160,28,133,251).\r\n'
*resp* '227 Entering Passive Mode (64,27,160,28,133,251).'
*cmd* 'RETR FFFT_BRA_PM_R_201711.txt'
*put* 'RETR FFFT_BRA_PM_R_201711.txt\r\n'
*get* '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.\r\n'
*resp* '150 Opening BINARY mode data connection for FFFT_BRA_PM_R_201711.txt.'

Lamonicalamont answered 23/4, 2018 at 8:1 Comment(10)
How long did you try waiting for a file download to complete? Can you download the same file using any FTP client running on the same machine as your Python code?Beaujolais
Yes, I am able to download the file using an FTP client. The file download gets completed but it never does anything post that.Lamonicalamont
Sorry your comment is ambiguous. By your second sentence ("File download gets completed"), are you referring to FTP client or your Python code? Show us a log file of the FTP client.Beaujolais
My bad! I actually misinterpreted and gave a very cryptic response. When I tried downloading the file with an FTP client, even that hangs after the file download. Though if I use expect and set a timeout in a shell script, I am able to download the file.Lamonicalamont
OK, so your question has nothing to do with Python or ftplib. So it's off-topic on Stack Overflow. Consider moving it to Super User.Beaujolais
But it's not getting downloaded in Python using ftplib, as the client gets timed out, as mentioned in the link attached to this case. I want to know how I can bypass this issue. Any help in this regard would be much appreciated. ThanksLamonicalamont
If you want to HACK this by enforcing timeout on the download, say that explicitly in your question (and mention also that the problem is not Python specific, but general). And be prepared to receive downvotes. - If you want to SOLVE this by fixing your FTP connection to the server, move your question to Super User.Beaujolais
Hey @MartinPrikryl - As suggested, I have asked this question on Super User. My intent here was not to hack but to find a proper solution, if there is any setting available in ftplib which takes care of this. Based on my limited understanding and after doing a bit of searching, I found that the issue might be due to the FTP connection getting timed out, even though the socket is able to download the file successfully.Lamonicalamont
Posting the exact same question on Super User is just going to get it migrated here and closed. If you want to know how to fix your FTP connection in general, you should probably write a more general question there—ideally one that shows using a standard FTP client instead of your own code, then mentions what you tried in Python (with a link to this question) to demonstrate that it's not client-specific.Cooperate
Meanwhile, if you want a hacky solution built around resuming partial files, stay on SO, but edit this question to be specific to what you want. (This will only work if your server allows resume, of course, but we should be able to test for that and fail intelligently if it doesn't work.)Cooperate

Before doing anything, note that there is something very wrong with your connection, and diagnosing that and getting it fixed is far better than working around it. But sometimes, you just have to deal with a broken server, and even sending keepalives doesn't help. So, what can you do?

The trick is to download a chunk at a time, then abort the download—or, if the server can't handle aborting, close and reopen the connection.

Note that I'm testing everything below with ftp://speedtest.tele2.net/5MB.zip, which hopefully doesn't cause a million people to start hammering their server. Of course you'll want to test it with your actual server.

Testing for REST

The entire solution of course relies on the server being able to resume transfers, which not all servers can do—especially when you're dealing with something badly broken. So we'll need to test for that. Note that this test will be very slow, and very heavy on the server, so do not test with your 3GB file; find something much smaller. Also, if you can put something readable there, it will help with debugging, because you may be stuck comparing files in a hex editor.

from ftplib import FTP

def downit():
    with open('5MB.zip', 'wb') as f:
        while True:
            # Reconnect for every chunk and resume from wherever the file left off.
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='[email protected]')
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)

You will probably not get 1MB at a time, but instead something under 8KB. Let's assume you're seeing 1448, then 2896, 4344, etc.

  • If you get an exception from the REST, the server does not handle resuming—give up, you're hosed.
  • If the file goes on past the actual file size, hit ^C, and check it in a hex editor.
    • If you see the same 1448 bytes or whatever (the amount you saw it printing out) over and over again, again, you're hosed.
    • If you have the right data, but with extra bytes between each chunk of 1448 bytes, that's actually fixable. If you run into this and can't figure out how to fix it by using f.seek, I can explain—but you probably won't run into it.

Testing for ABOR

One thing we can do is try to abort the download and not reconnect.

from ftplib import FTP

def downit():
    with open('5MB.zip', 'wb') as f:
        # Connect once, then abort the transfer after each chunk instead of reconnecting.
        ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='[email protected]')
        while True:
            pos = f.tell()
            print(pos)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            sock.close()
            ftp.abort()

You're going to want to try multiple variations:

  • No sock.close.
  • No ftp.abort.
  • With sock.close after ftp.abort.
  • With ftp.abort after sock.close.
  • All four of the above repeated with TYPE I moved to before the loop instead of each time.

Some will raise exceptions. Others will just appear to hang forever. If that's true for all 8 of them, we need to give up on aborting. But if any of them works, great!
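
For concreteness, here is a rough sketch of just one of those variations (TYPE I issued once before the loop, ftp.abort after sock.close), using the same test host and file as above; adapt it to whichever combination turns out to work against your server:

from ftplib import FTP

def downit_abort_variant():
    with open('5MB.zip', 'wb') as f:
        ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='[email protected]')
        ftp.sendcmd('TYPE I')          # moved out of the loop in this variation
        while True:
            pos = f.tell()
            print(pos)
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            buf = sock.recv(1024 * 1024)
            if not buf:
                return
            f.write(buf)
            sock.close()               # close the data connection first...
            ftp.abort()                # ...then tell the server to abort the transfer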

Downloading a full chunk

The other way to speed things up is to download 1MB (or more) at a time before aborting or reconnecting. Just replace this code:

buf = sock.recv(1024 * 1024)
if buf:
    f.write(buf)

with this:

chunklen = 1024 * 1024
while chunklen:
    print('   ', f.tell())
    buf = sock.recv(chunklen)
    if not buf:
        break
    f.write(buf)
    chunklen -= len(buf)

Now, instead of reading 1448 or 8192 bytes for each transfer, you're reading up to 1MB for each transfer. Try pushing it farther.
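
Put together with the reconnect-per-chunk loop from the REST test above, the full-chunk version looks roughly like this (same test host and file; the gotdata flag is just one way of keeping the "server has nothing more to send" check from the earlier snippets):

from ftplib import FTP

def downit_chunked():
    with open('5MB.zip', 'wb') as f:
        while True:
            pos = f.tell()
            print(pos)
            ftp = FTP(host='speedtest.tele2.net', user='anonymous', passwd='[email protected]')
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR 5MB.zip', rest=pos)
            # Read up to a full 1MB on this connection before dropping it.
            chunklen = 1024 * 1024
            gotdata = False
            while chunklen:
                buf = sock.recv(chunklen)
                if not buf:
                    break
                gotdata = True
                f.write(buf)
                chunklen -= len(buf)
            sock.close()
            ftp.close()  # or ftp.abort(), if the abort test above worked for you
            if not gotdata:
                return   # the server had nothing more to send: download complete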

Combining with keepalives

If, say, your downloads were failing at 10MB, and the keepalive code in your question got things up to 512MB, but it just wasn't enough for 3GB—you can combine the two. Use keepalives to read 512MB at a time, then abort or reconnect and read the next 512MB, until you're done.
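
A rough sketch of that combination, building on the reconnect-with-REST approach from above; the helper name, the CHUNK_SIZE value, and the choice to just swallow NOOP failures are assumptions for illustration, and the NOOP keepalive thread is the same trick as in the question:

import threading
from ftplib import FTP

CHUNK_SIZE = 512 * 1024 * 1024  # hypothetical: how much to pull per connection

def downit_keepalive(host, user, passwd, filename):
    with open(filename, 'wb') as f:
        while True:
            pos = f.tell()
            ftp = FTP(host=host, user=user, passwd=passwd)
            ftp.sendcmd('TYPE I')
            sock = ftp.transfercmd('RETR ' + filename, rest=pos)
            done = []  # set by the reader thread when the server closes the data connection

            def background():
                # Read up to CHUNK_SIZE bytes on this connection, then stop.
                remaining = CHUNK_SIZE
                while remaining:
                    buf = sock.recv(min(remaining, 1024 * 1024))
                    if not buf:
                        done.append(True)  # server finished sending the file
                        break
                    f.write(buf)
                    remaining -= len(buf)
                sock.close()

            t = threading.Thread(target=background)
            t.start()
            while t.is_alive():
                t.join(60)
                try:
                    ftp.voidcmd('NOOP')  # keepalive on the control connection
                except Exception:
                    pass  # control connection may already be dead; the data thread carries on

            ftp.close()  # drop this connection; the next iteration resumes with REST
            if done:
                return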

Cooperate answered 23/4, 2018 at 19:18 Comment(0)
