Why Is the TUX Web Server Dead? Do Nginx/Lighttpd/Epoll/Kqueue Replace It?
I recall a very fast kernel module for Linux called "TUX" for serving static files, created to answer IIS's then-superior-to-Linux static-file web-serving performance and to solve the "C10K problem." Now I keep seeing:

  1. Nginx
  2. Lighttpd
  3. CDNs

... for "fast static file-serving." Serving static files quickly isn't difficult if your OS has the right features; Windows has had them since the introduction of I/O completion ports, overlapped I/O, etc.

Did Tux die because of the security implications? Was it an experiment that kqueue/epoll combined with features like sendfile() made obsolete? What is the best solution to serve 100% static content -- say, packshots of 50 or so images to simulate a "flipbook" movie?

I understand this is a "server-related" question, but it's also theoretical. If it's purely static, is a CDN really going to be better anyway?

Jungle answered 15/11, 2013 at 0:25 Comment(2)
I am no expert here, but your last question seems weird to me... isn't a CDN simply multiple static-file servers with some load balancing? You just tell the client which server to connect to (by lowest latency, least busy server, nearest, etc.) and that's it. A single server will sooner or later fail to meet the demand, but if you have several of them, as in a CDN, you simply distribute the load between them.Inoculate
I see no problem with the question, though I did conflate the issue slightly. You don't need a userspace thread to send a file. The kernel can do it. The problem is that you do need a thread for application logic, so many sites just create a 2nd domain called images.example.com and slap a CDN in front of it. It caches the file, so the server doesn't see the request again for a while. Presumably a CDN MUST use something uber-optimized like Tux to serve static files, because that's all they do (I doubt they use Nginx, among other reasons because Nginx doesn't predate the proliferation of CDNs).Jungle
Mostly because Ingo Molnár stopped working on it. Why? I believe it was because kernel version 2.2 implemented the sendfile(2) call which matched (approximately) the massive performance benefits previously achieved by Tux. Note the Tux 2.0 Reference Manual is dated 2001.

Chive answered 8/12, 2013 at 11:48 Comment(7)
I mentioned sendfile(). The problem is that you still have to have a thread to invoke sendfile(), so it can't be the entire story. I'm assuming that you really mean sendfile()+epoll/kqueue. Even then, it is heavier if all you're doing is serving static content without any application logic. This is exactly what Facebook and Myspace do to the ire of privacy advocates. If you know the filename, you can see anyone's photos. From their perspective, the filename IS the password, but you can see why a CDN might not want to introduce application logic anywhere. Sendfile() isn't the whole story.Jungle
epoll wasn't introduced until 2.5.44, Tux was no longer being developed by then. kqueue was added to FreeBSD in 4.1 - and wouldn't affect a linux kernel server anyway. And we're talking about a linux kernel server that was never accepted into the mainline kernel.Chive
I mention kqueue because it looks similar to epoll. You're correct that it's only FreeBSD. I should have left it out. sendfile() could never approach the speed of Tux because you'd need 1:1 thread per file, or a select/poll(), no? I don't see how that could even come remotely close to Tux.Jungle
No. It's not 1:1 kernel:user thread. Did you notice this in the sendfile() man page - sendfile() copies data between one file descriptor and another. Because this copying is done within the kernel... - it's all in kernel space anyway.Chive
Whether it copies memory or not has little to do with whether it blocks and how you are notified of completion. I don't know if it can or will return on a static file before completion, but it certainly could for certain devices or under load. Not understanding your answer.Jungle
Because the file descriptor we're talking about there is the client socket and the file we're sending. E.g. it's in the kernel now.Chive
Unknown. Hypothetically, the caller of sendfile() does not block - the kernel thread completes the request. Or, it does block (why bother then), and then it's 1:1.Chive
Serving a static file has three steps: decide which file to send, decide whether to send the file, and send the file. Tux did a really good job of sending the file, a so-so job of deciding which file to send, and a lousy job of deciding whether to send a file. Those decisions are a matter of policy and should be made in user space. Add sendfile() and I can write a server that will be almost as good as Tux in a short time, and add features without recompiling my kernel. Maybe SQL logging. Just thinking about userspace SQL calls being made from a kernel makes my eye twitch.

Sitra answered 8/12, 2013 at 16:6 Comment(14)
But what else will you use in addition to sendfile()? Obviously it would need to be non-blocking but also know when it's 'done.' I/O completion ports in Windows do this. In *nix it seems less clear-cut. I like the idea of I/O completion ports because it's a pool of threads "just in case" it blocks, as well as employing zero-copy techniques and TransmitFile(). The other problem I see is what about SSL? You'd need SSL in the kernel to transmit efficiently without copies and without a userspace thread.Jungle
Tux didn't have SSL either, but kernel-level encryption would allow for easier hardware acceleration (I've been dreaming about that since I heard about PGP phone). And as to blocking, that depends on your process model.Sitra
Well since we're trying to emulate Tux here with something like Nginx, assume the process model means exactly as many threads as needed, which is probably around 1 -- since most of the work could be done by the OS. But it could be more to keep it optimal since there's a tiny bit of overhead before you hand it off to the kernel. Sendfile doesn't do everything here. A naive use of sendfile() would just spawn 1 thread per connection and call sendfile() and block. That would save memory (being zero-copy) but still create an enormous number of threads for the C10k experiment.Jungle
And here's a very interesting counterpoint on why 1:1 might be OK. For the record, he has a point. If thread libraries get better and all the O(N) stuff goes away, who cares? usenix.org/legacy/events/hotos03/tech/full_papers/vonbehren/…Jungle
sendfile and splice are documented as having differences between implementations, and in my brief reading may not have a non-blocking mode on linux.Sitra
If they don't have a non-blocking mode then the deal's off for them. 1:1 is bad news for threading in every OS I know. That paper is 1 in a sea of other papers that all say the opposite -- that the programmer must maintain the state of an application in some arbitrary data structure instead of the more natural sequential process("process" used liberally -- I mean a series of steps/thread)+stack, and constantly look at that structure to know what to do next. Basically that really sucks, but it's what I think you must do in userspace for sendfile to work enough that it solves the C10k problem.Jungle
Basically, event-based programming looks like en.wikipedia.org/wiki/Terminate_and_Stay_Resident to me with a little more management. It's devolving. Stack frames are really convenient. So are threads. But threads don't scale. Sendfile() applied to 1:1 just avoids copies. Tux avoids the threads.Jungle
I never used Tux because every time that we were looking at a redesign, someone upstairs added a requirement, where not using userspace processes was not an option for some other reason, and having the process already there for our policy (often using legacy nonthreading libs) we used what we already had to have. Event loops are nothing like TSRs but they each have their beauty.Sitra
I see event loops as similar in that the state is maintained by the programmer explicitly instead of via a stack and sequential program flow. In both a TSR and event-based programming, you have no idea what happened last except to the extent that you saved it somewhere. Am I wrong?Jungle
Events are intentionally low state, TSRs are working around not having a full operating system.Sitra
Agreed. I'm thinking of it more from a what-I-have-to-do-to-maintain-state POV. In that sense, they're similar, and more directly tied to events, and, in turn, interrupts/metal. I still don't see how constantly switching between even 1 userspace thread and the kernel with zero copies can come close to zero userspace threads serving static files. More specifically, if you were a CDN, would Nginx be good enough? Most CDNs predate Nginx, so what would they/have they designed? I'd guess it would look a lot like TUX.Jungle
Curious, @Sitra Do you think OpenSSL will eventually end up in the kernel despite recent ... events ... to enable sendfile() && SSL to play nicely? SPDY would seem to require this and forms the basis of HTTP2.Jungle
@JaimieSirovich I don't think OpenSSL would make it into the kernel, but I think an SSL implementation will, although not due to any well-known bugs but due to the differences between user space and kernel process models.Sitra
Thanks for confirming my suspicion. I brain farted and typed OpenSSL instead of just 'SSL.' I'm no expert on kernels but yes it would look very different.Jungle
Tux isn't required anymore due to sendfile(). Nginx takes full advantage of this and IMO is one of the best web servers for static or non-static content. I've found memory problems with lighttpd, ymmv.

The whole purpose of a CDN is that it moves the 'web server' closer to the end user's browser. This means fewer network hops and less round-trip delay, without the large cost of hosting multiple web servers around the world yourself and using geo-DNS to send the user to the closest one. Be aware, though, that since these web servers aren't under your control, they could be overloaded, and the benefit of fewer hops could be diminished if their network is congested. CDNs are frequent targets of DDoS attacks, and you might get caught up in something that has nothing to do with your content.

Naamana answered 8/12, 2013 at 22:54 Comment(2)
Sendfile() isn't the whole story. And that's not the (only) point of CDNs -- respectfully disagreeing. CDNs are also optimized for serving lots of static content, and not necessarily entirely a network thing. At least that's my experience.Jungle
While open()+loop(sendfile())+close() reduces syscall overhead by roughly 50% on big files (and roughly 25% on small files) compared to open()+loop(read()+write())+close(), Tux reduces syscall overhead by 100% on both small AND big files... so Tux still has an advantage over nginx. (Both approaches reduce userland<->kernel memcpy by roughly 100%, though.)Shawnee
