Python FTP get the most recent file by date
I

5

19

I am using ftplib to connect to an FTP site. I want to find the most recently uploaded file and download it. I can connect to the FTP server and list the files; I have also put the listing lines in a list and converted the date field. Is there a function or module that can pick the most recent date and return the whole matching line from the list?

#!/usr/bin/env python

import ftplib
import os
import socket
import sys
import time


HOST = 'test'


def main():
    try:
        f = ftplib.FTP(HOST)
    except (socket.error, socket.gaierror) as e:
        print('cannot reach %s' % HOST)
        return
    print("Connected to ftp server")

    try:
        f.login('anonymous', '[email protected]')
    except ftplib.error_perm:
        print('cannot login anonymously')
        f.quit()
        return
    print("logged on to the ftp server")

    data = []
    f.dir(data.append)
    for line in data:
        # the first two columns of each listing line are the date and the time
        datestr = ' '.join(line.split()[0:2])
        # %I (12-hour clock) is required for %p (AM/PM) to affect the parsed hour
        orig_date = time.strptime(datestr, '%d-%m-%y %I:%M%p')

    f.quit()
    return


if __name__ == '__main__':
    main()

RESOLVED:

# (fragment from inside main(), after the login)
data = []
f.dir(data.append)
datelist = []
filelist = []
for line in data:
    col = line.split()
    datestr = ' '.join(col[0:2])
    # %I (12-hour clock) is required for %p (AM/PM) to affect the parsed hour
    date = time.strptime(datestr, '%m-%d-%y %I:%M%p')
    datelist.append(date)
    filelist.append(col[3])

combo = zip(datelist, filelist)
who = dict(combo)

# iterate over the timestamps, newest first
for key in sorted(who, reverse=True):
    print("%s: %s" % (key, who[key]))
    filename = who[key]
    print("file to download is %s" % filename)
    try:
        with open(filename, 'wb') as out:
            f.retrbinary('RETR %s' % filename, out.write)
    except ftplib.error_perm:
        print("Error: cannot read file %s" % filename)
        os.unlink(filename)
    else:
        print("***Downloaded*** %s " % filename)
    return

f.quit()
return

One remaining problem: is it possible to retrieve just the first element from the dictionary? As written, the for loop runs only once and then exits, which gives me the first sorted value. That works, but I don't think it is good practice.

Interjection answered 24/1, 2012 at 16:42 Comment(0)
J
6

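With NLST, as shown in Martin Prikryl's answer, you can use the built-in sorted() with the MDTM reply as the sort key:

from ftplib import FTP

ftp = FTP(host="127.0.0.1", user="u", passwd="p")
ftp.cwd("/data")
# each MDTM reply looks like "213 YYYYMMDDHHMMSS", so the replies sort chronologically
file_name = sorted(ftp.nlst(), key=lambda x: ftp.voidcmd(f"MDTM {x}"))[-1]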
Jasperjaspers answered 27/1, 2020 at 17:9 Comment(1)
Nice short code. It's also quite reliable. But as my answer says, this is pretty inefficient, particularly if there are a lot of files in the folder. - Sjoberg
S
33

For those looking for a full solution for finding the latest file in a folder:

MLSD

If your FTP server supports the MLSD command, the solution is easy:

entries = list(ftp.mlsd())
entries.sort(key = lambda entry: entry[1]['modify'], reverse = True)
latest_name = entries[0][0]
print(latest_name)
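
An MLSD listing can also contain directories and the special entries for the current and parent folder. The following is a minimal sketch, assuming the server reports the standard type and modify facts, that considers regular files only. The helper name latest_file_from_mlsd is hypothetical; it accepts any iterable of (name, facts) pairs, such as the one ftp.mlsd() yields:

```python
def latest_file_from_mlsd(entries):
    """Return the name of the most recently modified regular file, or None.

    `entries` is an iterable of (name, facts) pairs, as yielded by
    ftplib.FTP.mlsd(). Hypothetical helper, not part of ftplib.
    """
    candidates = [
        (facts["modify"], name)
        for name, facts in entries
        # skip directories ("dir", "cdir", "pdir") and entries without a timestamp
        if facts.get("type") == "file" and "modify" in facts
    ]
    if not candidates:
        return None
    # "modify" is YYYYMMDDHHMMSS[.sss], so string comparison is chronological
    return max(candidates)[1]
```

With a connected client this would be called as latest_name = latest_file_from_mlsd(ftp.mlsd()).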

LIST

If you need to rely on the obsolete LIST command, you have to parse the proprietary listing it returns.

A common *nix listing looks like:

-rw-r--r-- 1 user group           4467 Mar 27  2018 file1.zip
-rw-r--r-- 1 user group         124529 Jun 18 15:31 file2.zip

With a listing like this, this code will do:

from dateutil import parser

# ...

lines = []
ftp.dir("", lines.append)

latest_time = None
latest_name = None

for line in lines:
    # maxsplit=8 yields at most 9 tokens, so tokens[8] keeps a name with spaces whole
    tokens = line.split(maxsplit = 8)
    time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
    time = parser.parse(time_str)
    if (latest_time is None) or (time > latest_time):
        latest_name = tokens[8]
        latest_time = time

print(latest_name)

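This is a rather fragile approach: the listing format is not standardized, and entries for recent files show a time of day instead of a year.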
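
One source of that fragility is the missing year on recent entries. The sketch below shows one possible way to resolve it with the standard library only, assuming English month abbreviations and the column layout from the sample listing above. The function name parse_listing_time and the one-day tolerance are assumptions made for illustration:

```python
from datetime import datetime, timedelta


def parse_listing_time(month, day, year_or_time, now):
    """Parse the three date columns of a *nix-style LIST line.

    Hypothetical helper. `year_or_time` is either a year ("2018") or a
    time of day ("15:31"); in the second case the year is inferred.
    """
    if ":" not in year_or_time:
        # older entry: the year is given explicitly, the time of day is not
        return datetime.strptime(f"{year_or_time} {month} {day}", "%Y %b %d")

    # recent entry: try the current year first, then the previous one
    for year in (now.year, now.year - 1):
        try:
            stamp = datetime.strptime(
                f"{year} {month} {day} {year_or_time}", "%Y %b %d %H:%M"
            )
        except ValueError:
            continue  # e.g. Feb 29 in a non-leap year
        # allow one day of slack for clock and time zone differences
        if stamp <= now + timedelta(days=1):
            return stamp
    raise ValueError("cannot infer year for listing timestamp")
```

Inside the loop above, this would be called as parse_listing_time(tokens[5], tokens[6], tokens[7], datetime.now()).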


MDTM

A more reliable, but far less efficient, approach is to use the MDTM command to retrieve the timestamps of individual files/folders, one request per name:

names = ftp.nlst()

latest_time = None
latest_name = None

for name in names:
    time = ftp.voidcmd("MDTM " + name)
    if (latest_time is None) or (time > latest_time):
        latest_name = name
        latest_time = time

print(latest_name)

For an alternative version of the code, see the answer by @Paulo.
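
The loop above compares the raw reply strings, which works because a successful MDTM reply has the fixed form "213 YYYYMMDDHHMMSS". If a real timestamp is wanted, for example to compare against a local file, the reply can be converted. This is a minimal sketch; parse_mdtm_reply is a hypothetical helper, not part of ftplib:

```python
from datetime import datetime


def parse_mdtm_reply(reply):
    """Convert an MDTM reply such as '213 20180627063100' to a datetime.

    Hypothetical helper. Fractional seconds, if present, are discarded.
    """
    code, _, stamp = reply.partition(" ")
    if code != "213":
        raise ValueError("unexpected MDTM reply: %r" % reply)
    # keep only YYYYMMDDHHMMSS and drop an optional '.sss' suffix
    return datetime.strptime(stamp.strip()[:14], "%Y%m%d%H%M%S")
```

Usage would look like modified = parse_mdtm_reply(ftp.voidcmd("MDTM " + name)).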


Non-standard -t switch

Some FTP servers support a proprietary non-standard -t switch for the NLST (or LIST) command.

lines = ftp.nlst("-t")

# -t sorts by modification time, newest first
latest_name = lines[0]

See How to get files in FTP folder sorted by modification time.


Downloading found file

No matter what approach you use, once you have the latest_name, you download it as any other file:

with open(latest_name, 'wb') as f:
    ftp.retrbinary('RETR '+ latest_name, f.write)
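
If the transfer fails midway, the code above leaves a truncated file behind. One possible way around that, sketched here under the assumption that the target folder is writable, is to download to a temporary name and rename on success. The function name download_file and the '.part' suffix are hypothetical choices:

```python
import os


def download_file(ftp, remote_name, local_path):
    """Download remote_name via ftp.retrbinary without leaving partial files.

    Hypothetical helper. `ftp` is expected to be a connected ftplib.FTP
    instance (anything with a compatible retrbinary method works).
    """
    tmp_path = local_path + ".part"
    try:
        with open(tmp_path, "wb") as f:
            ftp.retrbinary("RETR " + remote_name, f.write)
    except Exception:
        # remove the incomplete download, then report the original error
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # replace the target only after the whole file has arrived
    os.replace(tmp_path, local_path)
```

It would be called as download_file(ftp, latest_name, latest_name).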


Sjoberg answered 27/6, 2018 at 6:31 Comment(0)
A
7

Why not use the following dir option?

ftp.dir('-t', data.append)

With this option, the file listing is ordered by time, from newest to oldest. Then just take the first file in the list and download it.

Apologete answered 20/1, 2016 at 11:2 Comment(1)
Worth noting that the -t switch is proprietary (and actually violates the FTP specification). While it is rather widely supported, it is by no means universal. See my answer for a link to details. - Sjoberg
C
1

If you have all the dates as time.struct_time values (which is what strptime returns) in a list, then all you have to do is sort the list.

Here's an example:

#!/usr/bin/python

import time

dates = [
    "Jan 16 18:35 2012",
    "Aug 16 21:14 2012",
    "Dec 05 22:27 2012",
    "Jan 22 19:42 2012",
    "Jan 24 00:49 2012",
    "Dec 15 22:41 2012",
    "Dec 13 01:41 2012",
    "Dec 24 01:23 2012",
    "Jan 21 00:35 2012",
    "Jan 16 18:35 2012",
]

def main():
    datelist = []
    for date in dates:
        date = time.strptime(date, '%b %d %H:%M %Y')
        datelist.append(date)

    print(datelist)
    # struct_time values compare field by field, so this sorts chronologically
    datelist.sort()
    print(datelist)

if __name__ == '__main__':
    main()
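
Sorting bare dates loses the link to the file names, and a full sort is not needed when only the newest entry is wanted. The following is a minimal sketch that keeps each date paired with its name and picks the newest pair with max(). It assumes listing lines in the form used in the question's update (date, time, size, name), and the function name latest_filename is hypothetical:

```python
import time


def latest_filename(listing_lines):
    """Return the file name from the listing line with the newest timestamp.

    Hypothetical helper; expects lines like '01-24-12  04:42PM  1234 name'.
    """
    def parse(line):
        # maxsplit=3 keeps file names containing spaces in one piece
        date_str, time_str, _size, name = line.split(None, 3)
        stamp = time.strptime(date_str + " " + time_str, "%m-%d-%y %I:%M%p")
        return stamp, name

    # tuples compare by timestamp first, so max() picks the newest entry
    return max(parse(line) for line in listing_lines)[1]
```

No dictionary is involved, so two files that share a timestamp do not overwrite each other, and no loop has to be cut short after its first iteration.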
Coverage answered 24/1, 2012 at 17:9 Comment(2)
I created another list, see the update. There seems to be some problem with sorting the dictionary. - Interjection
I know that you are actually answering the OP's question. But in general, if the OP is looking for the one latest file only, it is overkill and a waste of memory to build lists of dates in the first place. Just find the latest file in the very first loop. Moreover, there are other, more efficient solutions. See my answer. - Sjoberg
L
1

I don't know how your FTP server is set up, but your example did not work for me. I changed some lines in the date-sorting part:

    import sys
    import ftplib
    import os
    import socket
    import time

    # Connect to the FTP server
    # (ftpHost, yourUserName and yourPassword are placeholders)
    ftp = ftplib.FTP(ftpHost)
    ftp.login(yourUserName, yourPassword)
    data = []
    datelist = []
    filelist = []
    ftp.dir(data.append)
    for line in data:
      # maxsplit=8 keeps file names containing spaces in one piece
      col = line.split(None, 8)
      # columns 5-7 of a *nix-style listing: month, day, time
      datestr = ' '.join(col[5:8])
      date = time.strptime(datestr, '%b %d %H:%M')
      datelist.append(date)
      filelist.append(col[8])
    combo = zip(datelist, filelist)
    who = dict(combo)
    # iterate over the timestamps, newest first
    for key in sorted(who, reverse=True):
      print("%s: %s" % (key, who[key]))
      filename = who[key]
      print("file to download is %s" % filename)
      try:
        with open(filename, 'wb') as out:
          ftp.retrbinary('RETR %s' % filename, out.write)
      except ftplib.error_perm:
        print("Error: cannot read file %s" % filename)
        os.unlink(filename)
      else:
        print("***Downloaded*** %s " % filename)
    ftp.quit()
Lindemann answered 25/3, 2015 at 16:13 Comment(0)
