Python-FTP download all files in directory
I'm putting together a script to download all the files from a directory via FTP. So far I have managed to connect and fetch one file, but I cannot seem to make it work in batch (get all the files from the directory). Here is what I have so far:

from ftplib import FTP
import os, sys, os.path

def handleDownload(block):
    file.write(block)
    print ".",

ddir='C:\\Data\\test\\'
os.chdir(ddir)
ftp = FTP('test1/server/')

print 'Logging in.'
ftp.login('user1\\anon', 'pswrd20')
directory = '\\data\\test\\'

print 'Changing to ' + directory
ftp.cwd(directory)
ftp.retrlines('LIST')

print 'Accessing files'

for subdir, dirs, files in os.walk(directory):
    for file in files: 
        full_fname = os.path.join(root, fname);  
        print 'Opening local file ' 
        ftp.retrbinary('RETR C:\\Data\\test\\' + fname,
                       handleDownload,
                       open(full_fname, 'wb'));
        print 'Closing file ' + filename
        file.close();
ftp.close()

I bet you can tell that it does not do much when I run it, so any suggestions for improvements would be greatly appreciated.

Illjudged answered 8/3, 2011 at 10:7 Comment(0)
79

I've managed to crack this, so now posting the relevant bit of code for future visitors:

filenames = ftp.nlst() # get filenames within the directory
print filenames

for filename in filenames:
    local_filename = os.path.join('C:\\test\\', filename)
    file = open(local_filename, 'wb')
    ftp.retrbinary('RETR '+ filename, file.write)

    file.close()

ftp.quit() # This is the “polite” way to close a connection

This worked for me on Python 2.5, Windows XP.
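
For readers on Python 3, a minimal sketch of the same loop follows. It assumes `ftp` is an already logged-in ftplib.FTP object; the function name download_all and its parameters are made up for illustration. It changes to the remote directory first, because nlst() with no argument lists the current directory, and it skips entries the server refuses with a permanent error, which is how many (but not necessarily all) servers respond to RETR on a directory.

```python
import ftplib
import os


def download_all(ftp, remote_dir, local_dir):
    """Download every plain file in remote_dir into local_dir.

    `ftp` is an already logged-in ftplib.FTP object (hypothetical helper,
    not part of ftplib). Returns the list of names that were downloaded.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)  # nlst() with no argument lists the current directory
    downloaded = []
    for name in ftp.nlst():
        base = os.path.basename(name.rstrip("/"))
        if base in ("", ".", ".."):
            continue  # some servers include these entries in the listing
        local_path = os.path.join(local_dir, base)
        try:
            with open(local_path, "wb") as fh:  # closed even if RETR fails
                ftp.retrbinary("RETR " + name, fh.write)
        except ftplib.error_perm:
            # Assumption: the server answers RETR on a directory with a
            # permanent (5xx) error, so such entries are skipped.
            os.remove(local_path)
            continue
        downloaded.append(name)
    return downloaded
```

It would be called between ftp.login() and ftp.quit(), for example as download_all(ftp, '/data/test', 'C:\\test').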

Illjudged answered 15/3, 2011 at 15:28 Comment(3)
The recommended way is to use ftp.quit() instead of ftp.close(). Please see this link. - Tael
How does ftp.nlst() know which link I want? This answer seems to be incomplete. - Metalware
Won't work if you have a directory name in the filenames list. - Gravid
11

If this is just a problem you'd like to solve, I might suggest the wget command:

cd c:\destination
wget --mirror --continue --no-host-directories --user=username --password=s3cr3t ftp://hostname/source/path/

The --continue option could be very dangerous if files change on the server. If files are only ever added, then it is very friendly.
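
For comparison, ftplib can resume a transfer in a similar spirit through the rest argument of retrbinary(). The sketch below is only an illustration: resume_download is a made-up helper, `ftp` is assumed to be a logged-in ftplib.FTP object, and the server is assumed to support the REST command. It carries the same risk as --continue, since it blindly appends to whatever is already on disk.

```python
import os


def resume_download(ftp, name, local_path):
    """Append the missing tail of remote file `name` to local_path.

    Returns the byte offset the transfer was resumed from (0 for a fresh
    download). Hypothetical helper; assumes the server supports REST.
    """
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    with open(local_path, "ab") as fh:  # append mode keeps the existing bytes
        # rest=None means "start from the beginning"; otherwise ftplib
        # sends REST <offset> before the RETR command.
        ftp.retrbinary("RETR " + name, fh.write, rest=offset or None)
    return offset
```

If the remote file was replaced rather than extended, the result is a corrupt mix of old and new bytes, which is exactly the danger described above.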

However, if this is a learning exercise for you and you'd like to make your program work, I think you should start by looking at this line:

for subdir, dirs, files in os.walk(directory):

directory has been the remote source directory in most of your program, but the os.walk() function cannot walk a remote directory. You need to iterate over the returned files yourself, using a callback supplied to the retrlines function.

Take a look at the MLSD or NLST options instead of LIST, they will probably be easier to parse. (Note that FTP doesn't actually specify how lists should look; it was always intended to be driven by a human at a console, or a specific filename transferred. So programs that do clever things with FTP listings like present them to the user in a GUI probably have to have huge piles of special case code, for odd or obscure servers. And they probably all do something stupid when faced with malicious file names.)
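
As a rough sketch of why MLSD is easier to work with: on Python 3.3 and later, ftplib exposes it as FTP.mlsd(), which yields (name, facts) pairs instead of free-form text. The helper names split_listing and list_remote below are invented for this example, and the server is assumed to support MLSD (not every server does).

```python
def split_listing(entries):
    """Split (name, facts) pairs, as yielded by FTP.mlsd(), into files and dirs."""
    files, dirs = [], []
    for name, facts in entries:
        kind = facts.get("type", "").lower()
        if kind == "file":
            files.append(name)
        elif kind == "dir":
            dirs.append(name)
        # "cdir" and "pdir" (the directory itself and its parent) are ignored
    return files, dirs


def list_remote(ftp, path=""):
    """List `path` via MLSD on a logged-in ftplib.FTP object (Python 3.3+)."""
    return split_listing(ftp.mlsd(path))
```

With this, files and directories can be told apart without guessing from the file name or parsing LIST output.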

Can you use sftp instead? sftp does have a specification for how file listings are supposed to be parsed, doesn't transmit username/password in the clear, and doesn't have the giant annoyance of passive vs active connections -- it simply uses the single connection, which means it works across more firewalls than FTP does.

Edit: You need to pass a 'callable' object to the retrlines function. A callable object is either an instance of a class that defined a __call__ method, or a function. While the function might be easier to describe, an instance of a class may be more useful. (You could use the instance to collect the filenames, but the function would have to write to a global variable. Bad.)

Here's one of the simplest callable objects:

>>> class c:
...  def __call__(self, *args):
...   print(args)
...
>>> f = c()
>>> f('hello')
('hello',)
>>> f('hello', 'world')
('hello', 'world')

This creates a new class, c, that defines an instance method __call__. This just prints its arguments in a fairly stupid manner, but it shows how minimal we're talking. :)

If you wanted something smarter, it could do something like this:

class handle_lines:
  def __init__(self):
    self.lines = []
  def __call__(self, *args):
    self.lines.append(args[0])  # store each line passed by retrlines

Call retrlines with an object of this class, then look in the object's lines member for details.
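
A minimal sketch of how such a collector might be wired up, assuming `ftp` is a logged-in ftplib.FTP object that has already changed into the remote directory. LineCollector and remote_names are illustrative names, not part of ftplib.

```python
class LineCollector:
    """Callable that stores every line it is called with."""

    def __init__(self):
        self.lines = []

    def __call__(self, line):
        self.lines.append(line)


def remote_names(ftp):
    """Return the names the server reports for the current directory."""
    collector = LineCollector()
    ftp.retrlines("NLST", collector)  # retrlines calls collector(line) per line
    return collector.lines
```

Since a bound method is also a callable, passing the append method of a plain list (names = []; ftp.retrlines('NLST', names.append)) would do the same job without a class.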

Commissariat answered 8/3, 2011 at 10:27 Comment(4)
@Sosti, the retrlines function mentioned in my post is a hyperlink off to the documentation :) - Commissariat
Thanks a lot for that, they all sound like solid suggestions! I forgot to mention I'm using Python 2.5 on Windows XP (if that's useful at all). If I use the MLSD option, 'ftp.retrlines('MLSD')', would the code work for the iteration or do I need to modify more? (Sure, it sounds a bit daft, but newb here, remember? :DD) - Illjudged
@Sosti, you would still need to modify your code: you can't use the os.walk() function. I'll edit my answer in a bit to show how to make a callback object for retrlines. - Commissariat
I have to admit I need to do some research on this, and attempt to write some lines of code. I was hoping the problem could be fixed by tweaking some lines but apparently the issue is more fundamental. Will do my best and then come back with any results. Thanks for all the input and suggestions! - Illjudged
3

This code is a little bit of overkill, I think.

Based on the example in the Python documentation (https://docs.python.org/2/library/ftplib.html): after ftp.login() and ftp.cwd(), you can just use:

os.chdir(ddir)
ls = ftp.nlst()
count = len(ls)
curr = 0
print "found {} files".format(count)
for fn in ls:
    curr += 1
    print 'Processing file {} ... {} of {} ...'.format(fn, curr, count)
    with open(fn, 'wb') as f:  # close each local file once its transfer ends
        ftp.retrbinary('RETR ' + fn, f.write)

ftp.quit()
print "download complete."

to download all the files.

Upu answered 24/1, 2017 at 20:36 Comment(1)
Going a bit further, is it possible to check the hash before writing the file out? The FTP server I download from has file1.gz and file1.gz.md5 and so on. The remote has more than 1200 files, so it is not possible to download them all and then check (memory problem). - Bihar
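
One possible approach to the hash question, sketched for Python 3: the digest can only be known once all bytes have arrived, but it can be computed block by block during the transfer, so nothing large is held in memory. The data goes to a temporary ".part" file that is kept only if the checksum matches. download_verified is a made-up name, `ftp` is assumed to be a logged-in ftplib.FTP object, and the .md5 file is assumed to be in md5sum style (hex digest first).

```python
import hashlib
import os


def download_verified(ftp, name, local_dir):
    """Fetch `name`; keep it only if its MD5 matches the one in name + '.md5'."""
    # The checksum file is tiny, so it can be held in memory.
    chunks = []
    ftp.retrbinary("RETR " + name + ".md5", chunks.append)
    # Assumption: md5sum-style content, "<hex digest>  <file name>".
    expected = b"".join(chunks).split()[0].decode("ascii").lower()

    digest = hashlib.md5()
    final_path = os.path.join(local_dir, name)
    part_path = final_path + ".part"

    with open(part_path, "wb") as fh:
        def handle_block(block):
            digest.update(block)  # hash incrementally, one block at a time
            fh.write(block)
        ftp.retrbinary("RETR " + name, handle_block)

    if digest.hexdigest() != expected:
        os.remove(part_path)  # discard the corrupt download
        return False
    os.replace(part_path, final_path)
    return True
```

Calling this once per data file processes the 1200 files one at a time, with only a single block in memory at any moment.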
1

A recursive solution (py 2.7):

import os, ftplib, shutil, operator, posixpath

def cloneFTP((addr, user, passw), remote, local):
    # Connect and enter the remote directory; give up on any error.
    try:
        ftp = ftplib.FTP(addr)
        ftp.login(user, passw)
        ftp.cwd(remote)
    except: 
        try: ftp.quit()
        except: pass
        print 'Invalid input ftp data!'
        return False
    # Start from an empty local directory (an existing copy is deleted!).
    try: shutil.rmtree(local)
    except: pass
    try: os.makedirs(local)
    except: pass
    dirs = []
    for filename in ftp.nlst():
        try:
            ftp.size(filename)  # expected to fail for directories
            with open(os.path.join(local, filename), 'wb') as f:
                ftp.retrbinary('RETR '+ filename, f.write)
        except:
            dirs.append(filename)
    ftp.quit()
    # Recurse into each subdirectory; remote paths always use '/', even on Windows.
    res = map(lambda d: cloneFTP((addr, user, passw), posixpath.join(remote, d), os.path.join(local, d)), dirs)
    return reduce(operator.iand, res, True)
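
For Python 3, where tuple parameters and the reduce builtin are gone, a comparable sketch could reuse a single connection instead of logging in again for every subdirectory. This is an alternative under stated assumptions, not a port of the code above: clone_tree is a made-up name, `ftp` is a logged-in ftplib.FTP object, remote_dir is an absolute remote path, and the server is assumed to reject CWD on a plain file with a permanent error.

```python
import ftplib
import os
import posixpath


def clone_tree(ftp, remote_dir, local_dir):
    """Recursively copy remote_dir into local_dir over one logged-in connection."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    # Some servers return full paths or trailing slashes; keep the last part.
    names = [posixpath.basename(n.rstrip("/")) for n in ftp.nlst()]
    for name in names:
        if name in ("", ".", ".."):
            continue
        remote_path = posixpath.join(remote_dir, name)  # remote paths use '/'
        local_path = os.path.join(local_dir, name)
        try:
            ftp.cwd(remote_path)  # assumed to succeed only for directories
        except ftplib.error_perm:
            with open(local_path, "wb") as fh:
                ftp.retrbinary("RETR " + remote_path, fh.write)
        else:
            clone_tree(ftp, remote_path, local_path)
```

Unlike the version above, this sketch does not delete an existing local directory first and lets transfer errors propagate to the caller.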
Falcone answered 12/3, 2019 at 17:44 Comment(0)
0

I am a beginner, so the code is not efficient, but I wrote it and tested that it works. This is what I did to download files and folders from an FTP site, though only to a limited depth in the file structure.

try:
   a = input("Enter hostname : ")
   b = input("Enter username : ")
   c = input("Enter password : ")
   from ftplib import FTP
   import os
   os.makedirs("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp")
   os.chdir("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp")
   ftp = FTP(host = a, user= b, passwd = c)
   D = ftp.nlst()
   for d in D:
      l = len(d)
      char = False
      for i in range(0,l):
          char = char or d[i]=="."
      if not char:
         ftp.cwd("..")
         ftp.cwd("..")
         E = ftp.nlst("%s"%(d))
         ftp.cwd("%s"%(d))
         try:
             os.makedirs("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp\\%s"%(d))
         except:
             print("you can debug if you try some more")
         finally:
             os.chdir("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp\\%s"%(d))
             for e in E:
                l1 = len(e)
                char1 = False
                for i in range(0,l1):
                   char1 = char1 or e[i]=="."
                if not char1:
                   ftp.cwd("..")
                   ftp.cwd("..")
                   F = ftp.nlst("%s/%s"%(d,e))
                   ftp.cwd("%s/%s"%(d,e))
                   try:
                       os.makedirs("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp\\%s\\%s"%(d,e))
                   except:
                       print("you can debug if you try some more")
                   finally:
                       os.chdir("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp\\%s\\%s"%(d,e))
                       for f in F:
                           if "." in f[2:]:
                               with open(f,'wb') as filef:
                                   ftp.retrbinary('RETR %s' %(f), filef.write)
                           elif not "." in f:
                               try:
                                  os.makedirs("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp\\%s\\%s\\%s"%(d,e,f))
                               except:
                                  print("you can debug if you try some more")
                elif "." in e[2:]:
                   os.chdir("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp\\%s"%(d))
                   ftp.cwd("..")
                   ftp.cwd("..")
                   ftp.cwd("..")
                   ftp.cwd("%s"%(d))
                   with open(e,'wb') as filee:
                      ftp.retrbinary('RETR %s' %(e), filee.write)
      elif "." in d[2:]:
          ftp.cwd("..")
          ftp.cwd("..")
          os.chdir("C:\\Users\\PREM\\Desktop\\pyftp download\\ftp")
          with open(d,'wb') as filed:
             ftp.retrbinary('RETR %s'%(d), filed.write)
   ftp.close()
   print("Your files have been successfully downloaded and saved. Bye")
except:
    print("try again you can do it")
finally:
    print("code ran")
Stranger answered 20/3, 2016 at 18:4 Comment(1)
Could you explain how/why your code works? That'll enable the OP and others to understand and apply your methods (where applicable) elsewhere. Code-only answers are discouraged and liable to be deleted. (From review) - Lahomalahore
-4

Instead of using a Python library to download a directory over FTP, we can call a DOS batch script from the Python program. The batch script runs the native Windows ftp client, which can download all files from the folder using mget *.*.

fetch.bat
ftp -s:fetch.txt

fetch.txt
open <ipaddress>
<userid>
<password>
bin (set the mode to binary)
cd </desired directory>
mget *.*
bye

fetch.py
import os
os.system("fetch.bat")
Switzer answered 27/9, 2012 at 13:11 Comment(2)
It is also specific to Windows (DOS). - Physostomous
Sometimes, it helps. - Inaugural
