Read CSV/Excel files from an SFTP server, make some changes in those files using Pandas, and save back
I want to read some CSV/Excel files from a secure SFTP folder, make some fixed changes to each file (like removing column 2), upload them to a Postgres DB, and also upload them to a different SFTP path, all in Python.

What's the best way to do this?

I have made a connection to the SFTP server using the pysftp library and am reading the files:

import pysftp
import pandas as pd

myHostname = "*****"
myUsername = "****"
myPassword = "***8"
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None

sftp = pysftp.Connection(host=myHostname, username=myUsername,
                         password=myPassword, cnopts=cnopts)
print("Connection successfully established ...")
sftp.chdir('test/test')

for i in sftp.listdir_attr():
    with sftp.open(i.filename) as f:
        df = pd.read_csv(f)

How should I proceed with the upload to DB and making those changes to the CSV permanent?

Transpontine answered 15/5, 2020 at 10:38 Comment(0)

You have the download part done.

For the upload part, see How to Transfer Pandas DataFrame to .csv on SFTP using Paramiko Library in Python? While that question is about Paramiko, the pysftp Connection.open method behaves identically to Paramiko's SFTPClient.open, so the code is the same (though you should not use pysftp).

The full code can look like this:

with sftp.open("/remote/path/data.csv", "r+", bufsize=32768) as f:
    # Download CSV contents from SFTP to memory
    df = pd.read_csv(f)

    # Modify as you need (just an example)
    df.at[0, 'Name'] = 'changed'

    # Upload the in-memory data back to SFTP
    f.seek(0)
    df.to_csv(f, index=False)
    # Truncate the remote file in case the new version of the contents is smaller
    f.truncate(f.tell())

The above updates the same file. If you want to upload to a different file, use this:

# Download CSV contents from SFTP to memory
with sftp.open("/remote/path/source.csv", "r") as f:
    df = pd.read_csv(f)

# Modify as you need (just an example)
df.at[0, 'Name'] = 'changed'

# Upload the in-memory data back to SFTP
with sftp.open("/remote/path/target.csv", "w", bufsize=32768) as f:
    df.to_csv(f, index=False)
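The in-place update pattern from the first snippet (read, modify, seek(0), rewrite, truncate) can be tried out locally with an in-memory buffer. This is just a sketch; io.StringIO stands in for the file object that sftp.open returns, but the seek/write/truncate sequence is identical:

```python
import io

import pandas as pd

# io.StringIO stands in for the handle returned by sftp.open(path, "r+");
# the seek/write/truncate sequence is the same with a real SFTP handle.
f = io.StringIO("Name,Score\noriginal_value,1\nbob,2\n")

df = pd.read_csv(f)           # "download" the CSV into memory
df.at[0, "Name"] = "x"        # modify (just an example)

f.seek(0)                     # rewind and overwrite in place
df.to_csv(f, index=False)
f.truncate(f.tell())          # cut off the leftover tail of the old contents

result = f.getvalue()
```

Because the new value ("x") is shorter than the old one ("original_value"), the final truncate is what prevents stale bytes from surviving at the end of the file.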

For the purpose of bufsize, see:
Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow


Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.
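A minimal sketch of the safe alternative, assuming the server's key has already been saved to a local known_hosts file (the path, host, and credentials below are placeholders):

```python
import pysftp

# Load host keys from a known_hosts file instead of disabling verification.
# "known_hosts" is an assumed path; point it at a file containing the
# server's public key (e.g. collected with `ssh-keyscan`).
cnopts = pysftp.CnOpts(knownhosts="known_hosts")

with pysftp.Connection(host="example.com", username="user",
                       password="secret", cnopts=cnopts) as sftp:
    print(sftp.pwd)
```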

Abstriction answered 15/5, 2020 at 19:13 Comment(0)

That's several questions in one question :)

I would suggest going with this approach:

  1. Make a local copy of the file (not sure how big it is; no point shuffling it around between your local machine and the SFTP server). You can use the get method.
  2. Perform the operations on your data with pandas, then dump it back to CSV with the to_csv method.
  3. Load the data into Postgres using either pandas.io or pure SQLAlchemy. Check the docs here.
  4. Upload the file to the destination you want with the put method.
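The steps above can be sketched as follows. To keep the sketch self-contained and runnable, an in-memory CSV stands in for the file fetched with get, and an in-memory SQLite database stands in for Postgres (pandas.DataFrame.to_sql accepts a raw sqlite3 connection; for Postgres you would pass an SQLAlchemy engine built from a URL such as postgresql://user:pass@host:5432/dbname). The column names and table name are made up for illustration:

```python
import io
import sqlite3

import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """The fixed change from the question: drop the second column."""
    return df.drop(columns=df.columns[1])

# Step 1 stand-in: this CSV text would come from a file downloaded via get
csv_text = "id,junk,name\n1,x,alice\n2,y,bob\n"

# Step 2: apply the changes with pandas
df = transform(pd.read_csv(io.StringIO(csv_text)))

# Step 3: load into the database (SQLite in-memory here; swap the
# connection for an SQLAlchemy Postgres engine in production)
conn = sqlite3.connect(":memory:")
df.to_sql("mytable", conn, if_exists="replace", index=False)

# Step 4 stand-in: dump back to CSV text that put would upload
out_csv = df.to_csv(index=False)
```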
Rileyrilievo answered 15/5, 2020 at 10:57 Comment(2)
Hi Kraxi, I can't store the CSV locally (not allowed as part of the process). Is there a way to write the edited df to the original CSV on the SFTP itself? – Transpontine
@Transpontine Well, if you download the file into memory on your local computer, it's basically already there, just not on your drive. But that's semantics :) Martin posted the answer on how to do that in memory. You can then send it to the other SFTP location and (also in memory) dump it to the database. Everything in the context of sftp.open(). – Rileyrilievo

© 2022 - 2024 — McMap. All rights reserved.