I have a piece of Python code that works, but is very slow at writing a DataFrame directly to an SFTP location. I am using pysftp and pandas.to_csv() to read an Excel file from a remote location, run a few simple transformations, and write the result out to an SFTP location.
The snippet below takes precisely 4 minutes and 30 seconds to write 100 records to the SFTP location. An average DataFrame that I process has at most 20 columns.
import pandas as pd
from pysftp import CnOpts, Connection
from tqdm import tqdm

def dataframe_sftp_transfer(df, destination_path):
    cnopts = CnOpts()
    cnopts.hostkeys = None  # skip host key verification
    sftp = Connection('sftp3.server.com',
                      username='user',
                      password='pwd123',
                      cnopts=cnopts)
    # Stream the CSV straight into the remote file handle
    with sftp.open(destination_path, 'w+') as f:
        chunksize = 100
        with tqdm(total=len(df)) as progbar:
            df.to_csv(f, sep='~', index=False, chunksize=chunksize)
            # progress bar only updates once, after the full write completes
            progbar.update(len(df))
Is there a better or faster way to achieve this? Surely writing a file of this size shouldn't take more than a couple of minutes?
Uploading the same files with a tool like FileZilla to the remote SFTP location is much faster, but that sadly takes away any form of automation.
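My working assumption is that serialising the whole CSV locally and pushing it in a single transfer (which is roughly what FileZilla does) would be faster than streaming rows through the remote file handle, but I haven't been able to confirm that yet. A minimal sketch of what I have in mind, reusing the same host, credentials, and '~' separator, with a hypothetical function name and pysftp's putfo taking a file-like object:

import io
from pysftp import CnOpts, Connection

def dataframe_sftp_put(df, destination_path):
    cnopts = CnOpts()
    cnopts.hostkeys = None  # same host-key handling as above
    # Serialise the whole DataFrame into an in-memory buffer first ...
    buffer = io.BytesIO(df.to_csv(sep='~', index=False).encode('utf-8'))
    # ... then upload it to the server in a single transfer.
    with Connection('sftp3.server.com',
                    username='user',
                    password='pwd123',
                    cnopts=cnopts) as sftp:
        sftp.putfo(buffer, destination_path)

Is this the right direction, or is there a more idiomatic way to speed up the upload?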