Writing to a file on SFTP server opened using Paramiko/pysftp "open" method is slow
Asked Answered
P

1

3

I have a piece of Python code that works, but is very slow to write a Dataframe directly to an SFTP location. I am using pysftp and pandas.to_csv() to achieve the task of reading an Excel file from a remote location, run a few simple transformations and write it over to an SFTP location.

The code snippet is shared below which, takes 4 minutes 30 seconds precisely, to write 100 records to the SFTP location. An average Dataframe that I process has a maximum of 20 columns.

def dataframe_sftp_transfer(df,destination_path):
    cnopts = CnOpts()
    cnopts.hostkeys = None
    sftp = Connection('sftp3.server.com'
                    ,username= 'user'
                    ,password = 'pwd123'
                    ,cnopts=cnopts)
    with sftp.open(destination_path,'w+') as f:
        chunksize = 100
        with tqdm(total=len(df)) as progbar:
            df.to_csv(f,sep='~',index=False,chunksize=chunksize)
            progbar.update(chunksize)

Is there a better/faster way to achieve the aforesaid? Shouldn't writing files of the stated magnitude take only a couple of minutes?

Using a tool like FileZilla to put files in the remote SFTP location works much faster but, that sadly takes away any form of automation.

Parsaye answered 26/9, 2019 at 7:39 Comment(0)
B
4

You open the remote file without buffering. That way, every time the df.to_csv writes to the file, Paramiko/pysftp sends a request to the SFTP server and waits for a response. I do not know internals of df.to_csv, but it's likely it does one write per line (if not more). That would explain, why the upload is so slow. Particularly, if your connection to the server has high latency.

To enable buffered writes, use bufsize parameter of Connection.open:

with sftp.open(destination_path, 'w+', 32768) as f:

Similarly for reads/downloads:
Reading file opened with Python Paramiko SFTPClient.open method is slow


Obligatory warning: Do not set cnopts.hostkeys = None, unless you do not care about security. For the correct solution see Verify host key with pysftp.

Bayard answered 26/9, 2019 at 8:3 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.