Getting the latest files from an FTP folder (filenames containing spaces) in Python

I need to pull the latest files from an FTP folder. The problem is that the filenames contain spaces and follow a specific pattern. Below is the code I have implemented:

from ftplib import FTP

ftp = FTP('')          # FTP host
ftp.login('', '')      # user name and password
ftp.cwd('')            # remote folder
ftp.retrlines('LIST')  # print the server's human-readable listing

filematch = '*Elig.xlsx'
downloaded = []

for filename in ftp.nlst(filematch):
    print('Getting ' + filename)
    # 'with' closes the local file even if the transfer fails
    with open(filename, 'wb') as fhandle:
        ftp.retrbinary('RETR ' + filename, fhandle.write)
    downloaded.append(filename)

ftp.quit()

I understand that I can collect the listing into a list by passing its append method to ftp.dir(), but since the filenames contain spaces, I am unable to split each line correctly and pick the latest file of the type mentioned above.

Any help would be great.

Wrist asked 20/9, 2017 at 15:33
Crist: What is the behavior of the posted program? Does it work correctly for you? Does it print an error message? Does it do something else entirely?
Wrist: It works fine for pulling the files I want, and I used it for a one-time process. Going forward, though, I need to automate it and pick only the latest files, based on date.
Pinnatifid: For future reference, giving us an example filename would be neat, just so we know what it actually looks like.
Wrist: ABC File 1 of 3_XXX_MV2_PElig.xlsx, here you go. But I guess the filename should not really matter, since the code above already uses the file pattern I mentioned.
Tamarin: If you are communicating with only one specific FTP server, it should be possible to parse the LIST output for timestamps despite the spaces in filenames. Unless MDTM is available (R.Neumann's answer), I see no other way.
Wrist: The LIST output has the timestamp, but I want to iterate over it and bring out the latest file. I thought ftp.retrlines('LIST -t *Elig.xlsx') would return the files in the right order, but it isn't helping.
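Following Tamarin's suggestion, a Unix-style LIST line can be parsed despite spaces in the name: the filename is the ninth whitespace-separated column, so splitting with maxsplit=8 keeps it intact. This is a minimal sketch that assumes the server emits the common 9-column "ls -l" layout; other formats (for example DOS-style listings from Windows servers) are not handled, and parse_list_line and newest_matching are hypothetical helper names.

```python
from datetime import datetime
from fnmatch import fnmatch

# Month abbreviations as they appear in Unix-style "ls -l" listings.
MONTHS = {name: number for number, name in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1)}


def parse_list_line(line, now=None):
    """Parse one Unix-style LIST line into (timestamp, filename).

    Assumes the 9-column "ls -l" layout; other server formats raise or misparse.
    """
    if now is None:
        now = datetime.now()
    # maxsplit=8: the 9th field is the rest of the line, spaces included.
    fields = line.split(None, 8)
    if len(fields) < 9:
        raise ValueError('Unrecognised LIST line: %r' % line)
    month, day, time_or_year, name = fields[5], int(fields[6]), fields[7], fields[8]
    if ':' in time_or_year:
        # Recent files show "HH:MM" without a year: assume the current year,
        # or the previous one if that would put the file in the future.
        hour, minute = map(int, time_or_year.split(':'))
        stamp = datetime(now.year, MONTHS[month], day, hour, minute)
        if stamp > now:
            stamp = stamp.replace(year=now.year - 1)
    else:
        # Older files show the year instead of the time of day.
        stamp = datetime(int(time_or_year), MONTHS[month], day)
    return stamp, name


def newest_matching(lines, pattern, now=None):
    """Return the newest regular file whose name matches pattern, or None."""
    entries = [parse_list_line(line, now) for line in lines
               if line.startswith('-')]  # '-' in the mode column marks a regular file
    matches = [(stamp, name) for stamp, name in entries if fnmatch(name, pattern)]
    return max(matches)[1] if matches else None
```

With an open connection, collect the lines with lines = [] followed by ftp.dir(lines.append), then call newest_matching(lines, '*Elig.xlsx'). Such listings typically show only minute precision for recent files and only the date for older ones, so the inferred timestamps are approximate.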
Answer

You can get each file's modification time (mtime) by sending the MDTM command, if the FTP server supports it, and then sort the files by that time on the client.

def get_newest_files(ftp, limit=None):
    """Retrieves newest files from the FTP connection.

    :ftp: The FTP connection to use.
    :limit: Abort after yielding this amount of files.
    """

    files = []

    # Decorate files with mtime. MDTM replies look like "213 YYYYMMDDHHMMSS",
    # so the timestamps sort chronologically as plain strings.
    for filename in ftp.nlst():
        response = ftp.sendcmd('MDTM {}'.format(filename))
        _, mtime = response.split()
        files.append((mtime, filename))

    # Sort files by mtime, newest first, and stop once the limit is reached.
    for index, decorated_filename in enumerate(sorted(files, reverse=True)):
        if limit is not None and index >= limit:
            break

        _, filename = decorated_filename  # Undecorate
        yield filename


downloaded = []

# Retrieve the newest file from the FTP server.
for filename in get_newest_files(ftp, limit=1):
    print('Getting ' + filename)

    with open(filename, 'wb') as local_file:
        ftp.retrbinary('RETR ' + filename, local_file.write)

    downloaded.append(filename)
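Because ftp.nlst() is called without a pattern here, every file in the folder is considered. A minimal variation, assuming MDTM is supported, filters names client-side with fnmatch against the question's *Elig.xlsx pattern before sending MDTM and skips entries for which MDTM is refused (on many servers, directories). get_newest_matching is a hypothetical name, not part of ftplib.

```python
from fnmatch import fnmatch
from ftplib import error_perm


def get_newest_matching(ftp, pattern, limit=1):
    """Return up to `limit` file names matching `pattern`, newest first.

    Assumes the server supports MDTM; `ftp` is an ftplib.FTP connection
    (or any object offering the same nlst()/sendcmd() methods).
    """
    decorated = []
    for filename in ftp.nlst():
        # Filter on the client so only the requested pattern is considered.
        if not fnmatch(filename, pattern):
            continue
        try:
            reply = ftp.sendcmd('MDTM ' + filename)
        except error_perm:
            # MDTM refused (e.g. the entry is a directory): skip it.
            continue
        # Reply looks like "213 YYYYMMDDHHMMSS[.sss]"; keep the 14 fixed-width
        # digits, which sort chronologically as strings.
        _, mtime = reply.split(None, 1)
        decorated.append((mtime[:14], filename))
    decorated.sort(reverse=True)
    return [filename for _, filename in decorated[:limit]]
```

Filtering with fnmatch on the client avoids depending on the server to expand wildcards in NLST, which not every server does.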
Arid answered 25/9, 2017 at 12:1
Wrist: I tried running this code, but it still pulls all the files of the corresponding type from the FTP server, not just the latest one.
Wrist: Thank you so much! This worked. I just had to add the argument to reverse sorted(files) so that the latest file comes first, and change the limit to 1 to pick up only the latest file. Once again, thank you for the help!
Answer

The issue is that the FTP "LIST" command returns text meant for humans, whose format depends on the FTP server implementation.

Using PyFilesystem instead of the standard ftplib gives you directory-listing APIs (see "walk" in its documentation) that return Pythonic structures describing the files and directories hosted on the FTP server.

http://pyfilesystem2.readthedocs.io/en/latest/index.html
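Separately from PyFilesystem, ftplib itself can request a machine-readable listing with FTP.mlsd() (Python 3.3+), which sends the MLSD command from RFC 3659 and yields (name, facts) pairs with lowercased fact names. This sketch assumes the server supports MLSD and reports the standard "type" and "modify" facts; newest_from_mlsd and download_newest are hypothetical helpers.

```python
from datetime import datetime
from fnmatch import fnmatch
from ftplib import FTP


def parse_mlsd_time(value):
    """Parse an MLSD 'modify' fact (YYYYMMDDHHMMSS[.sss], UTC) into a datetime."""
    return datetime.strptime(value[:14], '%Y%m%d%H%M%S')


def newest_from_mlsd(entries, pattern):
    """Return the newest regular file in `entries` matching `pattern`, or None.

    `entries` is any iterable of (name, facts) pairs, the shape that
    ftplib.FTP.mlsd() yields.
    """
    candidates = [
        (parse_mlsd_time(facts['modify']), name)
        for name, facts in entries
        if facts.get('type') == 'file'  # skip 'dir', 'cdir', 'pdir' entries
        and 'modify' in facts
        and fnmatch(name, pattern)
    ]
    return max(candidates)[1] if candidates else None


def download_newest(host, user, password, directory, pattern='*Elig.xlsx'):
    """Download the newest matching file; assumes the server supports MLSD."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(directory)
        # Each name arrives as its own field, so spaces need no special handling.
        newest = newest_from_mlsd(ftp.mlsd(), pattern)
        if newest is not None:
            with open(newest, 'wb') as local_file:
                ftp.retrbinary('RETR ' + newest, local_file.write)
        return newest
```

The modify fact is reported in UTC, which does not affect which file is newest. If the server answers MLSD with an error, the LIST-parsing or MDTM approaches are the fallback.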

Ab answered 22/9, 2017 at 19:34
