Read a CSV file stored on an FTP server in Python

I have connected to an FTP server and the connection is successful.

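import ftplib
ftp = ftplib.FTP('***', '****', '****')  # host, user, password
ftp.dir()                 # prints the directory listing; it returns None
listoffiles = ftp.nlst()  # returns the entry names as a list
print(listoffiles)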

I have a few CSV files on this FTP server and a few folders that contain more CSVs.

I need to list the folders in this location (home) and navigate into them. I think the cwd command should work.
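
`ftp.dir()` accepts a callback as its last argument, which lets you collect the listing lines instead of printing them. Below is a minimal sketch for separating folders from files, assuming the server sends a Unix `ls -l` style listing (common, but the LIST format is not standardized). `split_listing` and `list_folders_and_files` are hypothetical helper names.

```python
from ftplib import FTP


def split_listing(lines):
    """Split Unix 'ls -l' style LIST lines into (folders, files).

    Assumes lines like 'drwxr-xr-x 2 user group 4096 Feb 15 19:45 reports'.
    Splitting at most 8 times keeps names that contain spaces intact.
    Symbolic links (mode starting with 'l') end up in files.
    """
    folders, files = [], []
    for line in lines:
        parts = line.split(None, 8)
        if len(parts) < 9:
            continue  # skip 'total N' lines and formats this sketch doesn't parse
        mode, name = parts[0], parts[8]
        if name in (".", ".."):
            continue
        if mode.startswith("d"):
            folders.append(name)
        else:
            files.append(name)
    return folders, files


def list_folders_and_files(ftp: FTP):
    lines = []
    ftp.dir(lines.append)  # callback receives each listing line instead of printing it
    return split_listing(lines)
```

Each name in `folders` can then be entered with `ftp.cwd(name)`.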

I also need to read the CSVs stored on this FTP server. How can I do that? Is there a way to load them directly into pandas?

Marlenamarlene asked 15/2, 2018 at 19:45
Comment (Attach): Try the answer here.

Based on the answer to "Python write create file directly in FTP" and my own knowledge of ftplib, you can do the following:

from ftplib import FTP
import io
import pandas

session = FTP('***', '****', '****')

# get filenames on ftp home/root
remoteFilenames = session.nlst()
if ".." in remoteFilenames:
    remoteFilenames.remove("..")
if "." in remoteFilenames:
    remoteFilenames.remove(".")
# iterate over filenames and check which ones are folders
for filename in remoteFilenames:
    dirTest = session.nlst(filename)
    # This dir test does not work on certain servers
    if dirTest and len(dirTest) > 1:
        # it's a directory => go into it
        session.cwd(filename)
        # get filenames on ftp one level deeper
        remoteFilenames2 = session.nlst()
        if ".." in remoteFilenames2:
            remoteFilenames2.remove("..")
        if "." in remoteFilenames2:
            remoteFilenames2.remove(".")
        for filename2 in remoteFilenames2:
            # check again whether the name is a directory; skip it if so
            dirTest = session.nlst(filename2)
            if dirTest and len(dirTest) > 1:
                continue

            # download the file into an in-memory (virtual) file object
            download_file = io.BytesIO()
            session.retrbinary("RETR {}".format(filename2), download_file.write)
            download_file.seek(0)  # after writing, go back to the start of the virtual file
            df = pandas.read_csv(download_file)  # read the virtual file into pandas
            ##########
            # do your thing with pandas here
            ##########
            download_file.close()  # close the virtual file
        session.cwd("..")  # go back up so the next relative name resolves from home

session.quit()  # close the ftp session
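
The download-into-memory step in the middle of that loop can be pulled out into a small helper, which also makes it easy to read CSV files sitting directly in the home directory. This is a sketch: `read_csv_from_ftp` is a hypothetical name, and any extra keyword arguments are passed straight through to `pandas.read_csv`.

```python
import io
from ftplib import FTP

import pandas as pd


def read_csv_from_ftp(ftp: FTP, remote_path: str, **read_csv_kwargs) -> pd.DataFrame:
    """Download remote_path over an open FTP session and parse it with pandas.

    The file is buffered in memory (io.BytesIO), so nothing is written to disk.
    """
    with io.BytesIO() as buffer:
        ftp.retrbinary("RETR " + remote_path, buffer.write)
        buffer.seek(0)  # rewind so pandas reads from the beginning
        return pd.read_csv(buffer, **read_csv_kwargs)


# Example (placeholder credentials, as in the answer):
# session = FTP('***', '****', '****')
# df = read_csv_from_ftp(session, "folder1/file1.csv")
```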

Alternatively, if you know the structure of the FTP server, you can loop over a dictionary describing the folder/file structure and download the files via ftplib or urllib, as in this example:

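structure = {"folder1": ["file1", "file2"], "folder2": ["file1"]}
# .items() gives (folder, files) pairs; iterating the dict alone yields only folder names
for folder, files in structure.items():
    for file in files:
        path = "/{}/{}".format(folder, file)
        ##########
        # specific ftp file download stuff here
        ##########
        ##########
        # do your thing with pandas here
        ##########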
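
For the urllib route, one option is to turn each dictionary entry into an `ftp://` URL, which `pandas.read_csv` accepts directly. This is a sketch under assumptions: `ftp_urls` and `read_all` are hypothetical helpers, paths are assumed to be absolute from the FTP root (matching the `/{}/{}` format above), and without credentials the connection is anonymous.

```python
from urllib.parse import quote

import pandas as pd


def ftp_urls(host, structure, user=None, password=None):
    """Yield (folder, file, url) for every entry in {folder: [file, ...]}.

    Credentials, if given, are percent-encoded into the URL as
    ftp://user:password@host/folder/file.
    """
    auth = ""
    if user is not None:
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    for folder, files in structure.items():
        for file in files:
            path = "/{}/{}".format(quote(folder), quote(file))
            yield folder, file, "ftp://{}{}{}".format(auth, host, path)


def read_all(host, structure, user=None, password=None):
    # pandas.read_csv accepts URLs, including ftp://, as its first argument
    return {
        (folder, file): pd.read_csv(url)
        for folder, file, url in ftp_urls(host, structure, user, password)
    }
```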

Both solutions can be improved by making them recursive, or more generally by supporting more than one level of folders.
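
A recursive version can be sketched with `FTP.mlsd()`, which reports each entry's type explicitly instead of guessing from `nlst()` output. This assumes the server supports the MLSD command (RFC 3659); older servers may not, in which case `mlsd()` raises `ftplib.error_perm`. `find_csv_files` is a hypothetical helper name.

```python
from ftplib import FTP


def find_csv_files(ftp: FTP, path: str = "") -> list:
    """Recursively collect paths of *.csv files below `path` using MLSD.

    The "type" fact is "file" or "dir" for normal entries, and
    "cdir"/"pdir" for the current/parent directory, which are skipped.
    """
    found = []
    for name, facts in ftp.mlsd(path):
        entry_type = facts.get("type", "").lower()
        full = "{}/{}".format(path, name) if path else name
        if entry_type == "dir":
            found.extend(find_csv_files(ftp, full))  # descend into the subfolder
        elif entry_type == "file" and name.lower().endswith(".csv"):
            found.append(full)
    return found
```

Each returned path can then be downloaded with `retrbinary` into an `io.BytesIO` and passed to `pandas.read_csv`, as in the first loop.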

Sub answered 15/2, 2018 at 22:08
Comment (Iciness): When I use the nlst() method, it returns only a list of files, no directories. Therefore, it doesn't drill down through the subdirs, and since my root directory is empty, remoteFilenames = session.nlst() returns an empty list.

Better late than never... I was able to read the file directly into pandas. Not sure whether this works for everyone.

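import io
import pandas as pd
from ftplib import FTP
ftp = FTP('ftp.[domain].com')  # you need to put in your correct ftp domain
ftp.login()  # anonymous login; I don't need login info for my ftp
ftp.cwd('[Directory]')  # change the remote directory to where the file is
# pd.read_csv can't see the FTP server's directory, so fetch the file into memory first
buffer = io.BytesIO()
ftp.retrbinary('RETR [file.csv]', buffer.write)
buffer.seek(0)  # rewind before reading
df = pd.read_csv(buffer, delimiter='|', encoding='latin1')  # I needed to specify delimiter and encoding
df.head()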
Rufusrug answered 13/6, 2020 at 16:43
