Resources associated with an aio_context

The semantics of Linux's asynchronous file IO (AIO) are well described in the man pages of io_setup(2), io_submit(2) and io_getevents(2).

However, without diving into the block IO subsystem, the operational side of the implementation is a little less clear. An aio_context allocates a queue for sending io_events back to a specific client in user space. But is there more to it?

  • Consider a file read sequentially, chunk by chunk. Can requests, especially in Direct IO (DIO), be coalesced? What if requests for two files are interleaved into one aio_context? What if requests for one file are sent to two different aio_contexts?
  • How are requests prioritized and scheduled in the above cases, with one or multiple aio_contexts?
  • Is it possible that requests from two aio_contexts get interleaved at some point? (Incurring more seek latency than intended.)
  • Does the thread or the CPU calling io_submit influence how the request is scheduled? Is the NUMA node containing the target buffer taken into consideration?

More broadly, to which hardware resources (NUMA nodes, CPU cores, physical drives, file systems and files) should aio_contexts be assigned, and at what level of granularity?

Maybe it doesn't really matter and aio_contexts are no more than an abstraction for user-space programs. I'm asking because I have observed a performance decrease when concurrently reading multiple files, each with its own aio_context, compared to manually serializing chunk requests round-robin into a single aio_context.
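
To make the comparison concrete, below is a minimal sketch of the single-context, round-robin variant, using the raw syscalls rather than libaio. It is only illustrative, not the code I benchmarked: the file names, chunk size and queue depth are placeholders, and error handling is kept to a bare minimum.

    #define _GNU_SOURCE           /* for O_DIRECT */
    #include <fcntl.h>
    #include <linux/aio_abi.h>    /* aio_context_t, struct iocb, struct io_event */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK   (1 << 16)     /* 64 KiB chunks, multiple of the block size */
    #define NCHUNKS 8             /* chunks per file in one batch */

    /* Thin wrappers: these syscalls have no glibc wrappers. */
    static int io_setup(unsigned nr, aio_context_t *ctx) {
        return syscall(SYS_io_setup, nr, ctx);
    }
    static int io_submit(aio_context_t ctx, long n, struct iocb **iocbs) {
        return syscall(SYS_io_submit, ctx, n, iocbs);
    }
    static int io_getevents(aio_context_t ctx, long min_nr, long nr,
                            struct io_event *ev, struct timespec *ts) {
        return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, ts);
    }

    int main(void) {
        const char *paths[2] = { "fileA", "fileB" };   /* placeholder file names */
        int fds[2];
        for (int f = 0; f < 2; f++) {
            fds[f] = open(paths[f], O_RDONLY | O_DIRECT);
            if (fds[f] < 0) { perror("open"); return 1; }
        }

        aio_context_t ctx = 0;
        if (io_setup(2 * NCHUNKS, &ctx) < 0) { perror("io_setup"); return 1; }

        /* Interleave chunk reads for the two files: A0, B0, A1, B1, ... */
        struct iocb cbs[2 * NCHUNKS];
        struct iocb *cbp[2 * NCHUNKS];
        memset(cbs, 0, sizeof cbs);
        for (int c = 0; c < NCHUNKS; c++) {
            for (int f = 0; f < 2; f++) {
                struct iocb *cb = &cbs[2 * c + f];
                void *buf;
                if (posix_memalign(&buf, 4096, CHUNK)) return 1;
                cb->aio_fildes     = fds[f];
                cb->aio_lio_opcode = IOCB_CMD_PREAD;
                cb->aio_buf        = (unsigned long)buf;
                cb->aio_nbytes     = CHUNK;
                cb->aio_offset     = (long long)c * CHUNK;
                cb->aio_data       = 2 * c + f;        /* tag to identify the request */
                cbp[2 * c + f]     = cb;
            }
        }
        if (io_submit(ctx, 2 * NCHUNKS, cbp) < 0) { perror("io_submit"); return 1; }

        /* Reap every completion from the single shared context. */
        struct io_event events[2 * NCHUNKS];
        int done = 0;
        while (done < 2 * NCHUNKS) {
            int n = io_getevents(ctx, 1, 2 * NCHUNKS - done, events, NULL);
            if (n < 0) { perror("io_getevents"); return 1; }
            done += n;
        }
        printf("reaped %d completions\n", done);
        return 0;
    }

The per-file variant is the same except that io_setup() is called once per file and each context has to be drained with its own io_getevents() loop.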

Wrier asked 20/7, 2018 at 7:02 (2 comments)
Are you using the actual AIO syscalls, or the userspace library that glibc provides? If the latter, read the notes here. – Leucoma
No, I use the syscalls, with custom wrapper code inspired by seastar. – Wrier
  • You can mix requests freely in a single context, and I would do so. Otherwise you have to poll two separate contexts, doubling the number of syscalls (see the sketch after this list).

  • Requests to a context are passed to the kernel's async IO VFS layer. Multiple files, multiple contexts, multiple processes or users making requests: it all ends up in the same layer. The VFS layer then sends the requests to the relevant filesystems or block devices, and all the usual collation and merging happens naturally.

  • Requests to the same file through one or more contexts at the same time are, I think, undefined behavior if they overlap. They could be ordered one way or the other; the later request could be processed first, for example. So you need to write your own synchronization if strict ordering is required, the same as with one or more threads doing read/write calls in parallel.

  • Prioritization and scheduling depend on the lower layers. AFAIK block devices reorder requests so they happen in increasing block order (the elevator code) to minimize seek times on rotating disks.

  • Yes, requests from different contexts and normal read/write calls will get interleaved.

  • I think the requesting process, the calling CPU, NUMA placement and the like are completely ignored.
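
To illustrate the first point: with a single shared context, one io_getevents() call drains completions for every file, and the aio_data field you set on each iocb tells you which request finished. The sketch below assumes the context and iocbs were set up elsewhere; handle_completion() is just a placeholder.

    #include <linux/aio_abi.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Placeholder: route one completion back to whatever owns the request.
     * 'tag' is the aio_data value set on the iocb; 'res' is the byte count
     * or a negative errno. */
    static void handle_completion(unsigned long long tag, long long res) {
        printf("request %llu finished with %lld\n", tag, res);
    }

    /* Drain 'expected' completions from one shared context; a single
     * io_getevents() call returns events for every file submitted to it. */
    static void reap_all(aio_context_t ctx, int expected) {
        struct io_event events[64];
        int done = 0;
        while (done < expected) {
            int n = syscall(SYS_io_getevents, ctx, 1, 64, events, NULL);
            if (n < 0)
                return;
            for (int i = 0; i < n; i++)
                handle_completion(events[i].data, events[i].res);
            done += n;
        }
    }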

Note: When dealing with files, make sure the filesystem supports the Linux async IO hooks, and you might need to use O_DIRECT on open() with all its consequences. A simple way I found to test this is to make lots of requests to a file in one io_submit() call and then check whether they all finish simultaneously. If the filesystem falls back to sync IO, everything submitted will finish at the same time.
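
A rough sketch of that test: time the io_submit() call and immediately poll with a zero timeout. If the submit call itself was slow and every request is already complete before you have waited at all, the filesystem most likely processed them synchronously. The helper below is illustrative only and assumes the context and iocb array were prepared as in the earlier sketch.

    #include <linux/aio_abi.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    /* Submit a batch of prepared iocbs, report how long io_submit() took and
     * how many requests were already complete before any wait. Returns 1 if
     * everything finished "instantly" (likely synchronous fallback). */
    static int looks_synchronous(aio_context_t ctx, struct iocb **cbp, int nreq) {
        struct timespec t0, t1, zero = { 0, 0 };
        struct io_event events[nreq];

        clock_gettime(CLOCK_MONOTONIC, &t0);
        long submitted = syscall(SYS_io_submit, ctx, nreq, cbp);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (submitted < 0)
            return -1;

        double submit_ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                           (t1.tv_nsec - t0.tv_nsec) / 1e6;

        /* Zero timeout: only events that are already complete are returned. */
        long ready = syscall(SYS_io_getevents, ctx, 0, (long)nreq, events, &zero);

        printf("io_submit took %.2f ms, %ld of %d already complete\n",
               submit_ms, ready, nreq);
        return ready == submitted;
    }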

Dismount answered 5/11, 2018 at 15:51 (1 comment)
So this means that contexts are nothing more than a user-space abstraction for dispatching completion events, right? – Wrier
