Why does transferring a file through SFTP take longer than FTP?

I manually copy a file to an FTP server, and the same file to an SFTP server. The file is 140 MB.

FTP: I get a rate of around 11 MB/s

SFTP: I get a rate of around 4.5 MB/s

I understand the file has to be encrypted before being sent. Is that the only thing affecting the transfer? (In that case it is not really extra transfer time, but extra encryption time.)

I am surprised by these results.

Thymol answered 13/1, 2012 at 10:41 Comment(4)
P.S. For the transfer speed unit, do you mean Mb/s?Bandurria
Mo is French for MB; “un octet” is a byte, “octo” being eight in Latin.Kauslick
SFTP will almost always be significantly slower than FTP or FTPS (usually by several orders of magnitude). The reason for the difference is that there is a lot of additional packet, encryption and handshaking overhead inherent in the SSH2 protocol that FTP doesn't have to worry about. FTP is a very lean and comparatively simple protocol with almost no data transfer overhead, and the protocol was specifically designed for transferring files quickly. Encryption will slow FTP down, but not nearly to the level of SFTP.Hundredweight
For related questions, see Speed up SFTP uploads on high latency network? and Why is FileZilla SFTP file transfer max capped at 1.3MiB/sec instead of saturating available bandwidth?Radioelement
Score: 192

I'm the author of HPN-SSH and I was asked by a commenter here to weigh in. I'd like to start with a couple of background items. First off, it's important to keep in mind that SSHv2 is a multiplexed protocol - multiple channels over a single TCP connection. As such, the SSH channels are essentially unaware of the underlying flow control algorithm used by TCP. This means that SSHv2 has to implement its own flow control algorithm. The most common implementation basically reimplements sliding windows. This means that you have the SSH sliding window riding on top of the TCP sliding window. The end result is that the effective size of the receive buffer is the minimum of the receive buffers of the two sliding windows. Stock OpenSSH has a maximum receive buffer size of 2MB but this really ends up being closer to ~1.2MB. Most modern OSes have a buffer that can grow (using auto-tuning receive buffers) up to an effective size of 4MB. Why does this matter? If the receive buffer size is less than the bandwidth delay product (BDP) then you will never be able to fully fill the pipe regardless of how fast your system is.
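
To make the bandwidth-delay product point concrete, here is a minimal back-of-the-envelope sketch in Python; the link speed and RTT are made-up illustrative numbers, and the ~1.2MB figure is the effective stock OpenSSH window mentioned above:

    # Bandwidth-delay product vs. effective receive window (illustrative numbers).
    # If the window is smaller than the BDP, the pipe can never be kept full.

    def bdp_bytes(bandwidth_mbit_per_s, rtt_ms):
        """Bytes that can be in flight on the path at any one time."""
        return (bandwidth_mbit_per_s * 1_000_000 / 8) * (rtt_ms / 1000.0)

    bdp = bdp_bytes(bandwidth_mbit_per_s=1000, rtt_ms=50)  # ~6.25 MB in flight
    ssh_window = 1.2 * 1024 * 1024                         # ~1.2 MB effective (stock OpenSSH)

    print(f"BDP ~{bdp / 1e6:.2f} MB, SSH window ~{ssh_window / 1e6:.2f} MB")
    print("window-limited" if ssh_window < bdp else "not window-limited")

On that example path the transfer is window-limited no matter how fast the CPU or disks are.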

This is complicated by the fact that SFTP adds another layer of flow control on top of the TCP and SSH flow controls. SFTP uses a concept of outstanding messages. Each message may be a command, a result of a command, or bulk data flow. The outstanding messages may be up to a specific datagram size. So you end up with what you might as well think of as yet another receive buffer. The size of this receive buffer is datagram size * maximum outstanding messages (both of which may be set on the command line). The default is 32k * 64 (2MB). So when using SFTP you have to make sure that the TCP receive buffer, the SSH receive buffer, and the SFTP receive buffer are all of sufficient size (without being too large, or you can have overbuffering problems in interactive sessions).
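
To put rough numbers on the SFTP layer as well, a small Python sketch (the defaults below are the 32k * 64 figures quoted above, not measurements):

    # Effective SFTP-layer "window" is roughly request_size * outstanding_requests.
    def sftp_window_bytes(request_size=32 * 1024, outstanding_requests=64):
        return request_size * outstanding_requests

    print(sftp_window_bytes())                           # 2097152 bytes (2 MB default)
    print(sftp_window_bytes(outstanding_requests=256))   # 8 MB with more requests in flight

With the stock OpenSSH sftp client these two knobs are exposed on the command line as -B (request size) and -R (number of outstanding requests); whatever you set there is still capped by the SSH and TCP windows underneath.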

HPN-SSH directly addresses the SSH buffer problem by having a maximum buffer size of around 16MB. More importantly, the buffer dynamically grows to the proper size by polling the proc entry for the TCP connection's buffer size (basically poking a hole between layers 3 and 4). This avoids overbuffering in almost all situations. In SFTP we raise the maximum number of outstanding requests to 256. At least we should be doing that - it looks like that change didn't propagate as expected to the 6.3 patch set (though it is in 6.2. I'll fix that soon). There isn't a 6.4 version because 6.3 patches cleanly against 6.4 (which is a 1 line security fix from 6.3). You can get the patch set from sourceforge.

I know this sounds odd but right-sizing the buffers was the single most important change in terms of performance. In spite of what many people think, encryption is not the real source of poor performance in most cases. You can prove this to yourself by transferring data to sources that are increasingly far away (in terms of RTT). You'll notice that the longer the RTT, the lower the throughput. That clearly indicates that this is an RTT-dependent performance problem.
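
A rough way to see that RTT dependence, reusing the ~1.2MB effective window from above with made-up RTTs:

    # Window-limited throughput ceiling is roughly window / RTT,
    # which is why throughput falls as the path gets longer.
    window = 1.2 * 1024 * 1024  # ~1.2 MB effective stock OpenSSH window
    for rtt_ms in (10, 50, 100, 200):
        print(f"RTT {rtt_ms:3d} ms -> ceiling ~{window / (rtt_ms / 1000.0) / 1e6:5.1f} MB/s")

The cipher does not appear anywhere in that ceiling, which is the point being made here.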

Anyway, with this change I started seeing improvements of up to 2 orders of magnitude. If you understand TCP you'll understand why this made such a difference. It's not about the size of the datagram or the number of packets or anything like that. It's entirely because, in order to make efficient use of the network path, you must have a receive buffer equal to the amount of data that can be in transit between the two hosts. This also means that you may not see any improvement whatsoever if the path isn't sufficiently fast and long. If the BDP is less than 1.2MB HPN-SSH may be of no value to you.

The parallelized AES-CTR cipher is a performance boost on systems with multiple cores if you need full encryption end to end. Usually I suggest that people who have control over both the server and client use the NONE cipher switch (encrypted authentication, bulk data passed in the clear), as most data isn't all that sensitive. However, this only works in non-interactive sessions like SCP. It doesn't work in SFTP.

There are some other performance improvements as well but nothing as important as the right sizing of the buffers and the encryption work. When I get some free time I'll probably pipeline the HMAC process (currently the biggest drag on performance) and do some more minor optimization work.

So if HPN-SSH is so awesome why hasn't OpenSSH adopted it? That's a long story and people who know the OpenBSD team probably already know the answer. I understand many of their reasons - it's a big patch which would require additional work on their end (and they are a small team), they don't care as much about performance as security (though there are no security implications to HPN-SSH), etc etc etc. However, even though OpenSSH doesn't use HPN-SSH, Facebook does. So do Google, Yahoo, Apple, most every large research data center, NASA, NOAA, the government, the military, and most financial institutions. It's pretty well vetted at this point.

If anyone has any questions feel free to ask but I may not be keeping up to date on this forum. You can always send me mail via the HPN-SSH email address (google it).

Lyudmila answered 13/1, 2014 at 17:3 Comment(5)
Ugh, SSH is still really f*ing slow. This bug is seriously impacting the use of encryption for bulk data transfer. Have you pinged the OpenSSH team recently to see if there is any interest?Aculeate
Any way to apply this patch to Win32-OpenSSH or cygwin?Shoemaker
Honestly, I don't know about Win32-OpenSSH. I do know that if you have a cygwin environment running you can patch the base openssh code with my patch set. Alternatively, if you are running Windows 10 you can use the bash shell (Bash on Ubuntu on Windows) and patch and compile OpenSSH with the HPN extensions there.Lyudmila
@ChrisRapier: Have a question if you don't mind :) could you elaborate on the compatibility issues, if any? Would an HPN-SSH server work correctly with an SSH client? What about vice-versa? And what would the performance be in each case (original, or improved)?Cornice
It is fully compatible with OpenSSH in all scenarios that I've tested (which are a lot :) The performance advantage generally comes when HPN-SSH is the data receiver as the heart of the changes deal with receiver side flow control. Please keep in mind that this performance boost only happens when the BDP of the path is larger than the 2MB limit in stock OpenSSH.Lyudmila
Score: 18

UPDATE: As a commenter pointed out, the problem I outline below was fixed some time before this post. However, I knew of the HPN-SSH project and I asked the author to weigh in. As they explain in the (rightfully) most upvoted answer, encryption is not the source of the problem. Yay for email and people smarter than myself!

Wow, a year-old question with nothing but incorrect answers. However, I must admit that I assumed the slowdown was due to encryption when I asked myself the same question. But ask yourself the next logical question: how quickly can your computer encrypt and decrypt data? If you think that rate is anywhere near the 4.5 Mb/s reported by the OP (0.5625 MB/s, or roughly half the capacity of a 5.25" floppy disk per second!) smack yourself a few times, drink some coffee, and ask yourself the same question again.

It apparently has to do with what amounts to an oversight in the packet size selection, or at least that's what the author of libssh2 says:

The nature of SFTP and its ACK for every small data chunk it sends, makes an initial naive SFTP implementation suffer badly when sending data over high latency networks. If you have to wait a few hundred milliseconds for each 32KB of data then there will never be fast SFTP transfers. This sort of naive implementation is what libssh2 has offered up until and including libssh2 1.2.7.

So the speed hit is due to tiny packet sizes times a mandatory ACK for each packet, which is clearly insane.
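
To see how bad one-request-at-a-time gets, a quick Python sketch (illustrative RTTs; real clients, including stock OpenSSH sftp and libssh2 after 1.2.7, keep many requests in flight):

    # One 32 KB request acknowledged per round trip: throughput ~ chunk / RTT.
    chunk = 32 * 1024
    for rtt_ms in (10, 100, 300):
        print(f"RTT {rtt_ms:3d} ms -> ~{chunk / (rtt_ms / 1000.0) / 1e6:.2f} MB/s")
    # prints roughly 3.3, 0.33 and 0.11 MB/s

Pipelining many outstanding requests is what removes this per-chunk round-trip penalty.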

The High Performance SSH/SCP (HPN-SSH) project provides an OpenSSH patch set which apparently improves the internal buffers as well as parallelizing encryption. Note, however, that even the non-parallelized versions ran at speeds above the 40 Mb/s unencrypted speeds obtained by some commenters. The fix involves changing the way in which OpenSSH calls the encryption libraries, NOT the cipher, and there is zero difference in speed between AES128 and AES256. Encryption takes some time, but it is marginal. It might have mattered back in the 90's but (like the speed of Java vs C) it just doesn't matter anymore.

Aculeate answered 25/11, 2013 at 22:46 Comment(5)
Your statements are nothing but incompetent, sorry. Encryption does have limits that impact transfers on the fast systems. What you quoted was known for years and implemented in all libraries (including OpenSSH) since around '2007. HP-SSH is a different story. Now if you compare FTP and SFTP on the same computer and network with the optimal code (the one that doesn't introduce slowness due to bad design or implementation errors), SFTP will always be slower than FTP.Clougher
Oh, well, the dates on the papers are quite old! The HP-SSH benchmarks appear to line up with the numbers everyone else is reporting, (although the OP appears to be sloppy with his/her mb/MB notation). If these are solved problems then why does HP-SSH still exist then? Could you maybe suggest an edit?Aculeate
Also, that LIBSSH post dates from 2010 so ... FWIW I've emailed the HP-SSH people and asked them to weigh in.Aculeate
What HP-SSH does is making encryption faster by performing it in parallel. This is a tricky thing that does increase speed significantly.Clougher
HP-SSH parallelizes encryption in addition to tuning the network connection. If you look at their parallelization benchmarks, it shows that they were reaching 400 (AES 256) to 500 mb/s (AES 128) using an 8-core CPU from 2008. Even if we correct for the OP's mb/MB error, that's an order of magnitude slower than the raw output s/he should be getting. Although, if s/he is connected to a shared host....Aculeate
Score: 11

Several factors affect the speed of an SFTP transfer:

  1. Encryption. Though symmetric encryption is fast, it is not so fast as to go unnoticed. If you compare speeds on a fast network (100 Mbit or faster), encryption becomes a brake on the transfer (see the rough cipher-throughput check after this list).
  2. Hash (MAC) calculation and checking.
  3. Buffer copying. SFTP running on top of SSH causes each data block to be copied at least 6 times (3 times on each side), compared to plain FTP where, in the best case, data can be passed to the network interface without being copied at all. Each copy takes a bit of time as well.
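
As a rough way to check whether the cipher itself is the bottleneck on a given machine, here is a sketch assuming a recent version of the third-party 'cryptography' Python package (openssl speed gives a similar figure from the command line):

    # Rough local AES-CTR throughput check.
    # If this number is far above your observed SFTP rate, the cipher is
    # probably not what is holding the transfer back.
    import os
    import time

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, nonce = os.urandom(16), os.urandom(16)
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()

    block = os.urandom(1024 * 1024)   # 1 MB of pseudo-random data
    start = time.perf_counter()
    for _ in range(256):              # encrypt 256 MB in total
        encryptor.update(block)
    elapsed = time.perf_counter() - start
    print(f"~{256 / elapsed:.0f} MB/s AES-128-CTR on this machine")
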
Thorianite answered 13/1, 2012 at 14:3 Comment(0)
Score: 3

For those still finding this question and looking for an answer that does not require patching OpenSSH: I am the author of an open-source GPL project called Push SFTP, available on GitHub. It's a command-line client, similar to the standard sftp command, that adds a push command which uploads files in parallel.

It does not require a patched version of OpenSSH, and testing has shown an average 2.5x increase in throughput when using the push mechanism. This method works with standard OpenSSH servers on all major distributions. It relies on random-access support, so there are circumstances where it does not work, for example where the SFTP server uses a custom file system backed by cloud storage such as S3.
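
For readers curious what the approach looks like in general terms, below is a minimal sketch of parallel, ranged uploads over several SFTP sessions. It is not the Push SFTP code; it assumes the third-party 'paramiko' Python package, key/agent authentication, and placeholder host and file names, and it skips error handling:

    # Parallel ranged upload sketch: each worker opens its own SFTP session
    # and writes one byte range of the local file into the remote file.
    import os
    from concurrent.futures import ThreadPoolExecutor

    import paramiko

    HOST, USER = "example.com", "user"   # placeholders
    LOCAL = REMOTE = "big.bin"           # placeholders
    WORKERS = 4

    def connect():
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(HOST, username=USER)  # assumes key/agent auth
        return client

    def upload_range(offset, length):
        client = connect()
        sftp = client.open_sftp()
        with open(LOCAL, "rb") as src, sftp.open(REMOTE, "r+b") as dst:
            src.seek(offset)
            dst.seek(offset)
            dst.write(src.read(length))
        client.close()

    size = os.path.getsize(LOCAL)
    chunk = -(-size // WORKERS)          # ceiling division

    # Pre-create the remote file so every worker can open it for random access.
    first = connect()
    first.open_sftp().open(REMOTE, "wb").close()
    first.close()

    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        for offset in range(0, size, chunk):
            pool.submit(upload_range, offset, min(chunk, size - offset))

This only helps against servers that allow random-access writes to the remote file, which is the limitation with cloud-backed SFTP servers mentioned above.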

In addition, we have built support for this into the API of Maverick Synergy, the open-source Java SSH library that backs the client, so other developers can use it in their own projects.

Galibi answered 8/7, 2023 at 11:8 Comment(0)
Score: 2

Encryption has not only CPU overhead, but also some network overhead.

Temikatemp answered 13/1, 2012 at 10:45 Comment(9)
Additional information: If you enable compression over SSH, it may be faster than FTP, if SCP is used behind the scenes.Bandurria
@ShivanRaptor, isn't it usually enabled by default? And is sftp so much different than scp?Temikatemp
Yes. Compression is usually enabled by default. SFTP usually refers to FTP over SSH in many SFTP clients, which relies on setting FTP in SSH tunnel & its speed is slower than using scp commands.Bandurria
It is. see the wikipedia page. @ShivanRaptor SFTP isn't FTP over SSH.Kauslick
@greut, I didn't exactly mean the difference in capabilities, I meant in terms of overhead during the file transfer itself.Temikatemp
Compression won't always be faster though. If you have a high-speed network (eg: LAN), the CPU overhead will actually make transfers slower (and even considerably slower).Aeroembolism
@Hugo, especially if you have some low-end nas on compressing end ;-)Temikatemp
@MichaelKrelin-hacker Although that is quite true (and most certainly my case), generally, on a gigabit network, you won't achieve any gains by compressing, and may even have lower throughput (as I've tested with two AESNI-capable laptops).Aeroembolism
@Hugo, I don't deny that. Most of the time for sure, though may depend on compressibility of data and storage speed.Temikatemp
Score: 1

SFTP is not FTP over SSH; it's a different protocol, and while it is similar to SCP, it offers more capabilities.

Kauslick answered 13/1, 2012 at 10:49 Comment(0)
Score: 1

Your results make sense. Since FTP operates over a non-encrypted channel it is faster than SFTP (which is a subsystem on top of the SSH version 2 protocol). Also remember that SFTP is a packet-based protocol, unlike FTP, which is command-based.

Each packet in SFTP is encrypted before being written to the outgoing socket from the client and subsequently decrypted when received by the server. This of course leads to slower transfer rates but a very secure transfer. Using compression such as zlib with SFTP improves the transfer time, but it still won't be anywhere near plain-text FTP. Perhaps a better comparison is SFTP versus FTPS, since both use encryption.

SFTP speed depends on the cipher used for encryption/decryption, the compression used (e.g. zlib), the packet sizes, and the buffer sizes used for the socket connection.

Churn answered 19/4, 2013 at 23:46 Comment(1)
Why was this downvoted? Parts of it are accurate, correct, and more relevant than other answers with a non-negative score.Olivann
Score: 0

There are all sorts of things which can cause this. One possibility is traffic shaping. This is commonly done in office environments to reserve bandwidth for business-critical activities. It may also be done by the web hosting company, or by your ISP, for very similar reasons.

You can also set it up at home very simply.

For example there may be a rule reserving minimum bandwidth for FTP, while SFTP might be falling under an "everything else" rule. Or there might be a rule capping bandwidth for SFTP, but someone else is also using SFTP at the same time as you.

So: where are you transferring the file from and to?

Natty answered 13/1, 2012 at 10:48 Comment(0)
Score: 0

For comparison, I tried transferring a 299 GB NTFS disk image from an i5 laptop running a Raring Ringtail Ubuntu alpha 2 live CD to an i7 desktop running Ubuntu 12.04.1. Reported speeds:

over wifi + powerline: scp: 5MB/sec (40 Mbit/sec)

over gigabit ethernet + netgear G5608 v3:

scp: 44MB/sec

sftp: 47MB/sec

sftp -C: 13MB/sec

So, over a good gigabit link, sftp is slightly faster than scp; fast 2010-era CPUs seem fast enough to encrypt, but compression isn't a win in all cases.

Over a bad gigabit ethernet link, though, I've had sftp far outperform scp. Something about scp being very chatty, see "scp UNBELIEVABLY slow" on comp.security.ssh from 2008: https://groups.google.com/forum/?fromgroups=#!topic/comp.security.ssh/ldPV3msFFQw http://fixunix.com/ssh/368694-scp-unbelievably-slow.html

Tull answered 27/2, 2013 at 17:7 Comment(0)
Score: -3

Yes, encryption adds some load to your CPU, but if your CPU is not ancient it should not have as much impact as you describe.

If you enable compression for SSH, SCP is actually faster than FTP despite the SSH encryption (if I remember correctly, twice as fast as FTP for the files I tried). I haven't actually used SFTP, but I believe it uses SCP for the actual file transfer. So please try this and let us know :-)

Fungus answered 13/1, 2012 at 10:48 Comment(3)
SCP is a downlevel predecessor to SFTP.Natty
Networks nowadays are gigabit (and 10Gb as well). If your disks are fast enough, CPU is the limiting factor, regardless of how fast it is.Aeroembolism
Disks are never fast enough for the CPU and networks are an order of magnitude slower.Aculeate
