How to maximize http.sys file upload performance

I'm building a tool that transfers very large streaming data sets (possibly on the order of terabytes in a single stream; routinely in the tens of gigabytes) from one server to another. The client portion of the tool will read blocks from the source disk, and send them over the network. The server side will read these blocks off the network and write them to a file on the server disk.

Right now I'm trying to decide which transport to use. Options are raw TCP, and HTTP.

I really, REALLY want to be able to use HTTP. HttpListener (or WCF, if I want to go that route) makes it easy to plug into the HTTP Server API (http.sys), and I get things like authentication and SSL for free. The problem right now is performance.

I wrote a simple test harness that sends 128K blocks of NULL bytes using the BeginWrite/EndWrite async I/O idiom, with async BeginRead/EndRead on the server side. I've modified this test harness so I can do this with either HTTP PUT operations via HttpWebRequest/HttpListener, or plain old socket writes using TcpClient/TcpListener. To rule out issues with network cards or network pathways, both the client and server are on one machine and communicate over localhost.
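For readers who want to reproduce the raw-TCP half of this comparison, here is a minimal sketch in Python (not the original C# harness). It streams 128 KiB blocks of zero bytes over a loopback connection and reports end-to-end throughput. It uses blocking sockets plus a receiver thread in place of the .NET BeginWrite/EndWrite and BeginRead/EndRead pattern, and the function names are hypothetical. Python's interpreter overhead means its numbers are not directly comparable to the figures quoted in this question.

```python
import socket
import threading
import time

BLOCK_SIZE = 128 * 1024  # 128 KiB blocks of zero bytes, as in the question


def _drain(server_sock, result):
    """Accept one connection and count bytes until the client closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(BLOCK_SIZE)
            if not chunk:  # empty read: client closed its end
                break
            total += len(chunk)
    result["received"] = total


def tcp_loopback_throughput(total_bytes=256 * 1024 * 1024):
    """Send total_bytes over loopback TCP; return (bytes received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=_drain, args=(server, result))
    receiver.start()

    block = memoryview(bytes(BLOCK_SIZE))  # slicing a memoryview avoids copies
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(BLOCK_SIZE, total_bytes - sent)
            client.sendall(block[:n])
            sent += n
    receiver.join()  # stop the clock only after the receiver has seen EOF
    elapsed = time.perf_counter() - start
    server.close()
    received = result["received"]
    return received, received / elapsed / 1e6
```

Calling `tcp_loopback_throughput()` returns the byte count and the MB/s rate (MB taken as 10^6 bytes).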

On my 12-core Windows 2008 R2 test server, the TCP version of this test harness can push bytes at 450MB/s, with minimal CPU usage. On the same box, the HTTP version of the test harness runs between 130MB/s and 200MB/s depending upon how I tweak it.
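To put this gap in perspective for the terabyte-scale streams described above, here is a quick calculation. It assumes MB means 10^6 bytes and TB means 10^12 bytes. The question doesn't say which convention its figures use.

```python
def transfer_time_seconds(size_bytes, rate_mb_per_s):
    """Time to move size_bytes at rate_mb_per_s, with MB = 10**6 bytes."""
    return size_bytes / (rate_mb_per_s * 1e6)


TB = 10**12
for rate in (450, 200, 130):  # raw TCP, best HTTP, worst HTTP from the question
    minutes = transfer_time_seconds(TB, rate) / 60
    print(f"{rate} MB/s -> {minutes:.1f} min per TB")
# 450 MB/s -> 37.0 min per TB
# 200 MB/s -> 83.3 min per TB
# 130 MB/s -> 128.2 min per TB
```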

In both cases CPU usage is low, and the vast majority of what CPU usage there is is kernel time, so I'm pretty sure my usage of C# and the .NET runtime is not the bottleneck. The box has two 6-core Xeon X5650 processors, 24GB of single-ranked DDR3 RAM, and is used exclusively by me for my own performance testing.

I already know about HTTP client tweaks like ServicePointManager.MaxServicePointIdleTime, ServicePointManager.DefaultConnectionLimit, ServicePointManager.Expect100Continue, and HttpWebRequest.AllowWriteStreamBuffering.
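The settings above are .NET-specific. As a rough analogue for experimenting outside .NET, the sketch below uses Python's `http.client` to stream a PUT body with a known Content-Length in 128 KiB blocks. Nothing is buffered client-side, and no `Expect: 100-continue` handshake takes place. The receiving end is Python's `http.server`, not http.sys or HttpListener, so this exercises only the client pattern and says nothing about http.sys itself. `SinkHandler` and `http_put_throughput` are hypothetical names.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BLOCK_SIZE = 128 * 1024  # 128 KiB blocks, mirroring the question's harness


class SinkHandler(BaseHTTPRequestHandler):
    """Hypothetical test sink: reads a PUT body in blocks and discards it."""

    received = 0  # bytes read from the most recent PUT body

    def do_PUT(self):
        remaining = int(self.headers["Content-Length"])
        total = 0
        while remaining > 0:
            chunk = self.rfile.read(min(BLOCK_SIZE, remaining))
            if not chunk:  # client went away early
                break
            total += len(chunk)
            remaining -= len(chunk)
        SinkHandler.received = total
        self.send_response(204)  # tiny response, as in the question's scenario
        self.end_headers()

    def log_message(self, *args):
        pass  # keep benchmark output quiet


def http_put_throughput(total_bytes=64 * 1024 * 1024):
    """Stream one PUT over loopback; return (status, bytes received, MB/s)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SinkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]

    block = memoryview(bytes(BLOCK_SIZE))
    start = time.perf_counter()
    conn = http.client.HTTPConnection("127.0.0.1", port)
    # Known Content-Length and a body written block by block: no client-side
    # buffering of the whole payload, and no "Expect: 100-continue" header.
    conn.putrequest("PUT", "/upload")
    conn.putheader("Content-Length", str(total_bytes))
    conn.endheaders()
    sent = 0
    while sent < total_bytes:
        n = min(BLOCK_SIZE, total_bytes - sent)
        conn.send(block[:n])
        sent += n
    status = conn.getresponse().status
    elapsed = time.perf_counter() - start
    conn.close()
    server.shutdown()
    server.server_close()
    return status, SinkHandler.received, total_bytes / elapsed / 1e6
```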

Does anyone have any ideas for how I can get HTTP.sys performance beyond 200MB/s? Has anyone seen it perform this well on any environment?

UPDATE:

Here's a bit more detail on the performance I'm seeing with TcpListener vs HttpListener:

First, I wrote a TcpClient/TcpListener test. On my test box that was able to push 450MB/s.

Then, using Reflector, I figured out how to get the raw Socket object underlying HttpWebRequest and modified my HTTP client test to use that. Still no joy; barely 200MB/s.

My current theory is that http.sys is optimized for the typical IIS use case, which is lots of concurrent small requests, and lots of concurrent and possibly large responses. I hypothesize that in order to achieve this optimization, MSFT had to do so at the expense of what I'm trying to accomplish, which is very high throughput on a single very large request, with a very small response.

For what it's worth, I also tried up to 32 concurrent HTTP PUT operations to see if it could scale out, but there was still no joy; about 200MB/s.

Interestingly, on my development workstation, which is a quad-core Xeon Precision T7400 running 64-bit Windows 7, my TcpClient implementation is about 200MB/s, and the HTTP version is also about 200MB/s. Once I take it to a higher-end server-class machine running Server 2008 R2, the TcpClient code gets up to 450MB/s, while the HTTP.sys code stays around 200.

At this point I've sadly concluded that HTTP.sys is not the right tool for the job I need done, and will have to continue to use the hand-rolled socket protocol we've been using all along.
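A hand-rolled protocol for this job can be very small. The sketch below is a hypothetical example, not the protocol the author actually uses. It sends an 8-byte big-endian length header, then the file body in 128 KiB blocks, and gets back a one-byte acknowledgement as the "very small response". It is written in Python with blocking sockets. `HEADER`, `send_file`, `receive_file` and `demo_copy` are illustrative names.

```python
import os
import socket
import struct
import tempfile
import threading

BLOCK_SIZE = 128 * 1024
HEADER = struct.Struct("!Q")  # hypothetical framing: 8-byte big-endian length


def recv_exactly(sock, n):
    """Read exactly n bytes or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return bytes(buf)


def receive_file(server_sock, dest_path):
    """Accept one sender, write its body to dest_path, then send a 1-byte ack."""
    conn, _ = server_sock.accept()
    with conn, open(dest_path, "wb") as out:
        (remaining,) = HEADER.unpack(recv_exactly(conn, HEADER.size))
        while remaining > 0:
            chunk = conn.recv(min(BLOCK_SIZE, remaining))
            if not chunk:
                raise ConnectionError("peer closed early")
            out.write(chunk)
            remaining -= len(chunk)
        conn.sendall(b"\x01")  # acknowledge once the whole body has arrived


def send_file(src_path, host, port):
    """Send the length header, then the file in blocks; True if acked."""
    size = os.path.getsize(src_path)
    with socket.create_connection((host, port)) as sock, open(src_path, "rb") as f:
        sock.sendall(HEADER.pack(size))
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sock.sendall(block)
        return recv_exactly(sock, 1) == b"\x01"


def demo_copy(data):
    """Round-trip data through a temp file over loopback; return (acked, copy)."""
    tmp = tempfile.mkdtemp()
    src, dst = os.path.join(tmp, "src.bin"), os.path.join(tmp, "dst.bin")
    with open(src, "wb") as f:
        f.write(data)
    server = socket.create_server(("127.0.0.1", 0))
    t = threading.Thread(target=receive_file, args=(server, dst))
    t.start()
    acked = send_file(src, "127.0.0.1", server.getsockname()[1])
    t.join()
    server.close()
    with open(dst, "rb") as f:
        return acked, f.read()
```

A real tool would presumably also need the authentication and encryption the author hoped to get from HTTP for free, plus some form of integrity check.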

Wynellwynn, 5/5/2010 at 23:10. Comments (4):
Operator: Nice, well-researched question. As a user of HttpListener, I will be watching this one. +1 (after putting aside my sentiments on hearing about a developer with a mightier dev platform than my own)
Wynellwynn: Thanks :) Don't turn green with envy just yet; I time-share this box with two other devs (it happens to be my turn on it right now, but later I have to turn it over to someone else). As it is, that box ate up the budget that was supposed to be used to replace my aging development laptop. Oh well; Accounting giveth, Accounting taketh away.
Ignatz: Not exactly an answer, but I wonder how a utility like Unison would handle this: alliance.seas.upenn.edu/~bcpierce/wiki/…
Wynellwynn: In my experience, sync tools like Unison and rsync simply don't perform the way I need, at least not under Windows. That's not to say sync technology cannot be made to perform; just that I've not yet seen an implementation that can saturate a decent RAID 0 SAS array.

I can't see much of interest except for this Tech Note. It might be worth fiddling with MaxBytesPerSend.

Operator answered 5/5/2010 at 23:29. Comments (1):
Wynellwynn: Yeah, I'd actually seen that. The reason I didn't meddle with MaxBytesPerSend was that in my case HTTP.sys isn't doing the sending; I'm doing an HTTP PUT with a large payload, which HTTP.sys receives and streams to my HttpListener-based server. Also, a larger TCP window is supposed to benefit links with high bandwidth combined with high latency; since my test is being done over the loopback interface, latency is effectively zero. Thanks for the suggestion, though. Keep 'em coming!
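The window-size argument here is the bandwidth-delay product. To keep a link busy, the sender needs roughly bandwidth × round-trip time worth of unacknowledged bytes in flight. The sketch below uses hypothetical round-trip times: 50 ms for a WAN path at gigabit speed, and 20 µs for loopback at the 450MB/s (3.6 Gbit/s) raw-TCP rate from the question.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_us):
    """Bandwidth-delay product in bytes (integer arithmetic, RTT in µs)."""
    return bandwidth_bits_per_s * rtt_us // (8 * 1_000_000)


MAX_UNSCALED_WINDOW = 65_535  # largest TCP window without window scaling

wan = bdp_bytes(1_000_000_000, 50_000)  # GigE, hypothetical 50 ms round trip
loopback = bdp_bytes(3_600_000_000, 20)  # 450 MB/s, hypothetical 20 µs round trip
print(wan, wan > MAX_UNSCALED_WINDOW)            # 6250000 True
print(loopback, loopback > MAX_UNSCALED_WINDOW)  # 9000 False
```

Under these assumptions, only the WAN case needs more than an unscaled window. That is consistent with the point that window tuning shouldn't matter over loopback.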

If you're going to send files over the LAN, then UDP is the way to go, because TCP's overhead is a waste in that case. TCP provides rate limiting to avoid too many lost packets, whereas with UDP the application has to sort that out by itself. NFS would do the job, were it not that you're stuck with Windows; but I'm sure there must be ready-made UDP tools. Also use the tool iperf (available on Linux, probably also Windows) to benchmark the network link irrespective of the protocol. Some network cards are plain crap and rely on the CPU too much, which will limit your speed to 200mbit. You want a proper network card with its own processors (I don't know the exact term for this).

Corbitt answered 12/5/2010 at 10:13. Comments (2):
Wynellwynn: Thanks for the suggestion, but for various reasons I'm stuck with HTTP. I'm pretty familiar with the overhead introduced by TCP, but that alone does not account for the performance penalty I'm paying here. Using raw TCP I was able to move 450MB/s, compared to barely over 200MB/s with HTTP.sys, so clearly there's additional overhead specific to the HTTP implementation. I'm using a PCI-Express x4 Intel PRO/1000 dual-port GigE NIC with TCP offload, but that's not even relevant, since the test I'm running is over the loopback interface.
Corbitt: I find it strange that you would be stuck with HTTP; something like TFTP would be much better. How about experimenting with some command-line HTTP implementations, e.g. curl plus some trivial HTTP server in C? From what you're saying it seems obvious that it's a software limitation, so try some other implementations. You probably want a server that mmaps the files?
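Following the suggestion to try a different HTTP server implementation, here is a minimal sketch of a PUT endpoint that streams the request body to disk in 128 KiB blocks. It is written in Python rather than C, and it uses ordinary buffered file writes rather than mmap. The names (`UPLOAD_DIR`, `PutToFileHandler`, `demo_roundtrip`) are hypothetical. It only handles bodies sent with a Content-Length header, not chunked transfer encoding.

```python
import os
import tempfile
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BLOCK_SIZE = 128 * 1024
UPLOAD_DIR = tempfile.mkdtemp()  # hypothetical destination directory


class PutToFileHandler(BaseHTTPRequestHandler):
    """Streams a PUT body to UPLOAD_DIR/<basename of request path>."""

    def do_PUT(self):
        # basename() keeps writes inside UPLOAD_DIR
        name = os.path.basename(self.path) or "upload.bin"
        remaining = int(self.headers.get("Content-Length", 0))
        with open(os.path.join(UPLOAD_DIR, name), "wb") as out:
            while remaining > 0:
                chunk = self.rfile.read(min(BLOCK_SIZE, remaining))
                if not chunk:
                    break  # client disconnected before sending the full body
                out.write(chunk)
                remaining -= len(chunk)
        self.send_response(201 if remaining == 0 else 400)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep output quiet


def demo_roundtrip(payload, name="test.bin"):
    """Start the server on a free loopback port, PUT payload, read file back."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PutToFileHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/{name}"
    req = urllib.request.Request(url, data=payload, method="PUT")
    with urllib.request.urlopen(req) as resp:
        status = resp.status
    server.shutdown()
    server.server_close()
    with open(os.path.join(UPLOAD_DIR, name), "rb") as f:
        return status, f.read()
```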
