Is there an asynchronous version of sendfile in Linux?

The io_getevents notification mechanism looks quite capable at first glance, so I would like something I could use with it. I just couldn't find anything yet. On Windows, it's easy: There is only TransmitFile, which can work asynchronously (overlapped) and with some notification mechanism (IOCP, event) if you want that. There must be some equivalent on Linux, right? Or, to put my question in some context, how would I create an efficient file server on Linux?

Hayne asked 30/12, 2017 at 11:5
If the socket is non-blocking, then sendfile won't block either (it will report how much data was scheduled to be sent in the socket's buffer). You will need to poll the socket to see when you can continue the sendfile operation (see epoll)... or better yet, use a library that does this for you. – Simonnesimonpure
@Simonnesimonpure Hmm, when I think of asynchronous I/O, I think of operations that I can start at arbitrary points in time and get notified about when they finish. With epoll+sendfile, I first have to wait until send buffers are available, then call sendfile, which will copy some amount of data into said buffers (synchronously!), rinse and repeat. – Hayne
Also, I read that sendfile might block even when used with non-blocking sockets, and that one can work around that using readahead (brad.livejournal.com/2228488.html). This makes the application design even more complex and adds latency because of the number of context switches needed before any actual work gets done. I don't really find the whole "non-blocking" approach satisfying. – Hayne
It's true that sendfile isn't AIO, but it does not copy the data synchronously into the socket's buffer (that's why it's important to set the socket to non-blocking)... Actually, it doesn't even copy the data (which is part of its optimization). From what I remember, the data is packaged directly from the file buffer. – Simonnesimonpure
@Simonnesimonpure OK, but then the sendfile documentation is quite misleading. It clearly says "If the transfer was successful, the number of bytes written to out_fd is returned." Also, non-blocking send/recv has to copy synchronously; there's just no other way. But that is not the point. For example, I would like to have multiple send operations in flight at once. I don't think this is possible using non-blocking sockets + epoll, right? With actual asynchronous I/O I could queue up some headers followed by the actual file data. The OS could start sending my headers while prefetching the file data just in time. – Hayne
I'm sorry, this is a bit of a long discussion for comments, and it might be too opinion-based. At the end of the day the data needs to be packaged and handed off to the network layer. Whether your code, a system call, or the OS scheduler does it is mostly a question of minimizing operations and achieving maximum throughput. Also, as long as you have only one network card, concurrency mostly goes out the window once the data reaches the wire... I find epoll more convenient and very performant, while giving me more control. – Simonnesimonpure
Yeah, you might be right. I find epoll rather limiting, in terms of both application design and performance. I find the kernel AIO API more appealing, but there does not seem to be support for sendfile yet. – Hayne
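The epoll + non-blocking sendfile pattern described in the comments above, with the readahead hint from the linked blog post, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production file server: `set_nonblocking` and `send_whole_file` are hypothetical helper names, and for readability the function waits in its own private epoll instance, whereas a real server would go back to its shared event loop on `EAGAIN`. `readahead()` is Linux-specific, purely advisory, and can itself block.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: send `count` bytes of file_fd, starting at *offset,
 * to the non-blocking socket `sock`. Returns 0 on success, -1 on error or if
 * the file ends early. Blocks in a private epoll instance (sketch only). */
static int send_whole_file(int sock, int file_fd, off_t *offset, size_t count)
{
    assert(offset != NULL);  /* sendfile advances *offset for us */

    /* Advisory: start pulling the range into the page cache so sendfile is
     * less likely to stall on disk reads. Errors are ignored on purpose. */
    (void)readahead(file_fd, *offset, count);

    int ep = epoll_create1(0);
    if (ep == -1)
        return -1;
    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = sock };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sock, &ev) == -1) {
        close(ep);
        return -1;
    }

    int rc = 0;
    while (count > 0) {
        ssize_t n = sendfile(sock, file_fd, offset, count);
        if (n > 0) {
            count -= (size_t)n;        /* partial transfers are normal */
        } else if (n == 0) {
            rc = -1;                   /* EOF before `count` bytes were sent */
            break;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct epoll_event ready;  /* send buffer full: wait for EPOLLOUT */
            if (epoll_wait(ep, &ready, 1, -1) == -1 && errno != EINTR) {
                rc = -1;
                break;
            }
        } else if (errno != EINTR) {
            rc = -1;
            break;
        }
    }
    close(ep);
    return rc;
}
```

Note that every iteration still makes one synchronous system call per chunk, which is exactly the "rinse and repeat" cost the question complains about; only the waiting is event-driven.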

Alas, there is nothing easy for you on Linux, and nearly anything can block in the wrong circumstances (even io_submit). In answer to your questions (in the title and within the main text):

Them's the breaks...
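To make the io_submit caveat concrete, here is a minimal sketch of the raw kernel AIO interface that the question's io_getevents belongs to: one read is submitted with io_submit and reaped with io_getevents. glibc provides no wrappers for these system calls (libaio does), so the sketch calls them through syscall(2); the `k_io_*` wrappers and `kaio_pread` are hypothetical names. On a buffered (non-O_DIRECT) descriptor the kernel typically performs the read inside io_submit itself, so if the data is not in the page cache that call can block on disk I/O. The sketch shows the submit/reap shape only; it does not measure blocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical thin wrappers: glibc does not expose the kernel AIO syscalls. */
static long k_io_setup(unsigned nr, aio_context_t *ctx) { return syscall(SYS_io_setup, nr, ctx); }
static long k_io_destroy(aio_context_t ctx) { return syscall(SYS_io_destroy, ctx); }
static long k_io_submit(aio_context_t ctx, long n, struct iocb **iocbs)
{
    return syscall(SYS_io_submit, ctx, n, iocbs);
}
static long k_io_getevents(aio_context_t ctx, long min_nr, long nr,
                           struct io_event *events, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

/* Hypothetical helper: pread via kernel AIO. Returns bytes read, -errno from
 * the completion event, or -1 if setup/submission failed. */
static long kaio_pread(int fd, void *buf, size_t len, long long offset)
{
    assert(buf != NULL);
    aio_context_t ctx = 0;  /* must be zero before io_setup */
    if (k_io_setup(1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    struct iocb *list[1] = { &cb };
    long res = -1;
    /* On a buffered fd this call may do the actual read (and block). */
    if (k_io_submit(ctx, 1, list) == 1) {
        struct io_event ev;
        if (k_io_getevents(ctx, 1, 1, &ev, NULL) == 1)
            res = (long)ev.res;
    }
    k_io_destroy(ctx);
    return res;
}
```

Opening the file with O_DIRECT (and using suitably aligned buffers) is what makes kernel AIO genuinely asynchronous for reads; there is no socket counterpart, which is why sendfile has no place in this API.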

Future (2020+) solutions

There's a suggestion that some future Linux kernel (later than 5.5, as we're already up to 5.5-rc7 at the time of writing) could essentially perform an asynchronous sendfile via io_uring if io_uring gains support for splice()...
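For reference, io_uring did later gain an IORING_OP_SPLICE opcode (Linux 5.7) whose arguments mirror splice(2). The sketch below shows the two-step data path such an "asynchronous sendfile" would queue: file into a pipe, then pipe out to the destination. It deliberately uses plain blocking splice(2) instead of io_uring so that it needs no liburing; with io_uring, each of the two splice calls would become a submission queue entry. `splice_file` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move `count` bytes from file_fd (starting at *offset)
 * to out_fd through an intermediate pipe, without copying into user space.
 * Returns 0 on success, -1 on error or early EOF. */
static int splice_file(int file_fd, int out_fd, loff_t *offset, size_t count)
{
    assert(offset != NULL);  /* splice advances *offset for the file side */
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int rc = 0;
    while (count > 0) {
        /* Step 1: file -> pipe (at most one pipe's worth of pages). */
        ssize_t in = splice(file_fd, offset, p[1], NULL, count, SPLICE_F_MOVE);
        if (in < 0 && errno == EINTR)
            continue;
        if (in <= 0) {
            rc = -1;  /* error, or EOF before `count` bytes */
            break;
        }
        /* Step 2: pipe -> destination; drain everything step 1 queued. */
        size_t left = (size_t)in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, left, SPLICE_F_MOVE);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                rc = -1;
                goto done;
            }
            left -= (size_t)out;
        }
        count -= (size_t)in;
    }
done:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

Because the pipe is drained completely on every pass, step 1 never blocks on a full pipe; the cost is one extra pipe (two descriptors) per transfer in flight.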

Evenhanded answered 7/2, 2018 at 11:48
The blog post you linked seems to suggest offloading blocking operations to userspace thread pools, which is the same thing arvid did with libtorrent in 2012 for lack of better alternatives. This is exactly what I would like to avoid at all costs in favor of more efficient APIs. I had just hoped Linux would have improved things over the course of more than 6 years. I mean, if FreeBSD can get it right, why wouldn't Linux be able to? Both are POSIX, so I feel like they share similar handicaps. – Hayne
Also, I would expect io_submit to block only under extreme circumstances, like the I/O request queue or the completion queue filling up. In that case, you really do have to wait for things to complete. – Hayne
@Hayne The lack of a great, pervasive async I/O framework is just a flaw in Linux; being popular doesn't mean you're the best in every category. POSIX never mandated an async framework, so that's orthogonal to this. io_submit can block when you do a buffered read of data that isn't cached, which is not that exotic or extreme. – Evenhanded
Thanks for the reference on io_submit! I think POSIX never mandating a proper async framework is exactly one of the reasons Linux was not designed with asynchronous I/O in mind either. That's why I called it a handicap. – Hayne
@Hayne You're welcome. OK, correction time: I should have spoken more carefully. POSIX does specify a set of AIO operations (see pubs.opengroup.org/onlinepubs/9699919799/basedefs/aio.h.html), but a) it's optional, b) on Linux it is glibc that implements those functions using a thread pool, and c) there's no sendfile in POSIX. What I should have said is that there's no POSIX-defined AIO framework that works for most existing operations, as opposed to one that defines a few new operations that are async. – Evenhanded
Unfortunately, I'm still not convinced by io_uring's "solution" of having a pipe in the middle and splicing data into and out of it. If I understand it correctly, I would need to submit a splice into the pipe, wait for it to finish, submit another splice from the pipe to the receiving end, and start over if the file was too big to be spliced completely in the first call. And I'd need to allocate and deallocate a pipe, or maintain a pool of pipes. That seems strictly worse than Winsock's TransmitFile, which can send up to 2 GiB in a single system call. – Hayne
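To illustrate point (b) in the comments above: with POSIX AIO on glibc, the calling code looks asynchronous, but the request is serviced by helper threads in user space. The following minimal sketch starts one aio_read and waits for it with aio_suspend; `posix_aio_pread` is a hypothetical helper name. On glibc older than 2.34 the aio_* functions live in librt, so the program must be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `len` bytes at `offset` from `path` with POSIX AIO.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t posix_aio_pread(const char *path, void *buf, size_t len, off_t offset)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal or callback; we wait explicitly */

    ssize_t res = -1;
    if (aio_read(&cb) == 0) {
        /* The request is now queued to a glibc helper thread;
         * this thread could do other work here. */
        const struct aiocb *list[1] = { &cb };
        int err;
        while ((err = aio_error(&cb)) == EINPROGRESS)
            (void)aio_suspend(list, 1, NULL);  /* sleep until done or interrupted */
        ssize_t n = aio_return(&cb);           /* call exactly once per request */
        if (err == 0)
            res = n;
    }
    close(fd);
    return res;
}
```

This is the thread-pool model Hayne wanted to avoid: it hides the blocking read rather than eliminating it, and POSIX AIO offers no sendfile equivalent.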
