Do Linux system calls provide acquire/release semantics?

For example, functions such as futex_wake/futex_wait, epoll_ctl/epoll_wait, and pthread_create provide acquire/release semantics. That is to say, if I make some changes before calling futex_wake, the woken thread always sees those changes.

My questions are:

  1. Do reads from and writes to an eventfd offer acquire/release semantics?
  2. Is there any documentation about this? I have checked the man page and did not find an answer to question 1.

See the code below as an example:

initial:
    int g_atomic_val = 0;
    int evfd = eventfd(0, 0);
    uint64_t w_cnt = 1, r_cnt;

thread1:
    /* set some data, then write to the eventfd */
    g_atomic_val = 1;
    write(evfd, &w_cnt, sizeof(w_cnt));

thread2:
    struct pollfd pfd = { .fd = evfd, .events = POLLIN };
    for (;;) {
        /* wait until the eventfd becomes readable */
        poll(&pfd, 1, -1);
        read(evfd, &r_cnt, sizeof(r_cnt));
        /* Is g_atomic_val always equal to 1 here? */
        assert(g_atomic_val == 1);
    }
Lynnett answered 25/7, 2022 at 12:42 Comment(19)
Good question. In my opinion, it is likely that the system calls you mentioned contain memory barriers in their implementation; I can't imagine how they could work otherwise. Historically, Unix processes communicating through shared memory and arbitrating with semaphores have worked for decades without any additional service such as explicit barriers. That said, such multiprocessing issues seem to be widely (and inexplicably) ignored by the Unix/Linux/POSIX manuals. So I'll join in your request (and +1 for you). - Backswept
I'd like to find somewhere a phrase like "all system calls are full synchronization points". But I haven't found anything similar so far :-( - Backswept
@GiuseppeGuerrini: That's made tricky by the fact that some "system calls" don't actually enter the kernel per se, thanks to the vDSO. Some of the time system calls may simply read an unprivileged hardware timer register and do some arithmetic; that may or may not be synchronizing, depending on architectural details. - Ready
@GiuseppeGuerrini: Other calls that are often thought of as "system calls" may actually be implemented within the library. For instance the glibc implementation of getpid(2) used to only invoke an actual system call the first time; it would cache the result and return it directly on subsequent calls (invalidating on fork() etc). So getpid() might only execute one load instruction, with no particular memory ordering, and then return. - Ready
Modifying a memory location and then notifying other threads of changes to that same location doesn't strictly depend on acquire/release semantics. The kernel presumably does need acq/rel synchronization internally to communicate between cores via different memory locations. And presumably a thread woken from FUTEX_WAIT can see other changes to other memory locations as if it did an acquire load on the value. But see "C++20: How is the returning from atomic::wait() guaranteed by the standard?": the modification order of a single variable exists even with relaxed ordering. - Peg
Even the futex man pages say nothing about acquire, other than "acquiring a lock". I didn't see anything about memory order. So good question; I'd guess the documentation assumes that it synchronizes-with the notifier, and/or is heavy enough to basically be a full barrier, especially if you actually sleep. - Peg
@Nate Eldredge: good point. I think we should concentrate on "potentially blocking" system calls, i.e. the ones that, in Unix tradition, may put the thread in the "W" state, as in "sleep(1)". There are plenty of them. I don't know if there is an official list of services that may block threads by contract; it would be nice if there were one. In my opinion, thread suspension would be enough for full synchronization. The problem comes when the system call does not block the thread (e.g. a "read(fd,...)" when there is pending data): optimization paths could easily break MP synchronization... - Backswept
...if the kernel didn't contain explicit barriers in that case. - Backswept
"The loading of the futex word's value, the comparison of that value with the expected value, and the actual blocking will happen atomically and will be totally ordered with respect to concurrent operations performed by other threads on the same futex word" (from the futex man page). - Lynnett
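To make the futex discussion concrete, here is a minimal sketch of the usual userspace pattern, in which the ordering comes from an explicit release store and acquire load on the futex word itself rather than from anything the man page promises about the system call. The names `futex_call`, `f_word`, `f_payload`, and `futex_handoff` are hypothetical, and the raw `syscall(SYS_futex, ...)` wrapper is used because glibc provides no `futex()` function.

```c
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static _Atomic uint32_t f_word;   /* futex word: 0 = not ready, 1 = ready */
static int f_payload;             /* plain data published through f_word */

/* Hypothetical thin wrapper around the raw system call. */
static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, (uint32_t *)addr, op, val, NULL, NULL, 0);
}

static void *futex_waker(void *arg)
{
    (void)arg;
    f_payload = 42;
    /* Release store: makes the payload visible to an acquire load of 1. */
    atomic_store_explicit(&f_word, 1, memory_order_release);
    futex_call(&f_word, FUTEX_WAKE, 1);
    return NULL;
}

/* Returns the payload seen by the waiting thread, or -1 on error. */
int futex_handoff(void)
{
    f_payload = 0;
    atomic_store(&f_word, 0);

    pthread_t t;
    if (pthread_create(&t, NULL, futex_waker, NULL) != 0)
        return -1;

    /* FUTEX_WAIT sleeps only while the word is still 0; re-check after
       every return (wake-up, EAGAIN, or a spurious wake). */
    while (atomic_load_explicit(&f_word, memory_order_acquire) == 0)
        futex_call(&f_word, FUTEX_WAIT, 0);

    int seen = f_payload;   /* ordered after the acquire load above */
    pthread_join(t, NULL);
    return seen;
}
```

Calling `futex_handoff()` from `main` should return the published payload. The point of the sketch is that the guarantee rests on the C11 atomics, so it holds whatever the kernel does internally.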
@GiuseppeGuerrini: Well, sem_post can't block but we really, really hope that it contains a release barrier. - Ready
@Nate Eldredge: you are right! So we should at least take into account "potentially blocking system calls and their signalling counterparts". Almost the whole set of services... - Backswept
I don't think blocking is really of the essence at all. I'd say the rule is probably something more like "any system call whose effects can be observed by other processes will be release; any system call that observes the actions of other processes will be acquire". Another example is that, when acting on a regular file, neither read nor write can block, but a write/read pair is expected to be usable for synchronization. Or sem_post versus sem_getvalue. - Ready
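For the semaphore case, POSIX does say something explicit: the "Memory Synchronization" section of its Base Definitions lists sem_post and sem_wait among the functions that synchronize memory with respect to other threads. Below is a minimal sketch that relies on that, assuming Linux unnamed semaphores; the names `s_payload`, `s_sem`, `sem_poster`, and `sem_handoff` are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static int s_payload;   /* plain, non-atomic data */
static sem_t s_sem;

static void *sem_poster(void *arg)
{
    (void)arg;
    s_payload = 42;      /* written before the post */
    sem_post(&s_sem);    /* acts as the "release" side */
    return NULL;
}

/* Returns the payload seen after sem_wait, or -1 on error. */
int sem_handoff(void)
{
    s_payload = 0;
    if (sem_init(&s_sem, 0, 0) != 0)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, sem_poster, NULL) != 0) {
        sem_destroy(&s_sem);
        return -1;
    }

    int rc;
    do {
        rc = sem_wait(&s_sem);            /* acts as the "acquire" side */
    } while (rc != 0 && errno == EINTR);  /* retry if interrupted */

    int seen = (rc == 0) ? s_payload : -1;
    pthread_join(t, NULL);
    sem_destroy(&s_sem);
    return seen;
}
```

Here no atomics are needed, because the ordering is part of the documented contract of the two semaphore functions.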
@GiuseppeGuerrini That made me realize that an elegant read should be based on a futex and a shared buffer... - Verbalize
Thinking a little more, I believe the implied semantics must actually be stronger: all externally visible operations are sequentially consistent. The POSIX definition of write says things like "Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes." [...] - Ready
AFAIK this dates from before the time when formal memory ordering specifications were common, but it carries the implicit assumption that "occurs after" makes sense and is a total ordering. And "any means" is meant to include things like semaphores, writes/reads of other files, sending and receiving a signal, etc, so these must all be included in the sequentially consistent total order. Programs written in earlier times most likely expect sequential consistency, and so if Linux were to weaken it, it would violate its implicit API / ABI. - Ready
The one exception would be reads and writes of shared memory, which follow the memory ordering rules of the architecture. - Ready
@curiousguy: In fact it must be done that way, because read is also required to immediately observe writes made via mmap, and vice versa. So an mmap is a mapping of that same shared buffer. - Ready
@NateEldredge "In fact it must be done that way [for normal file objects that support all file operations like a file on disk], because read is also required to immediately observe writes made via mmap" I see, but that wasn't what I was getting at. I was suggesting "that way" for character files that aren't normal files on a filesystem. Any byte stream like pipes, sockets, terminals... (usually non-mmap-able stuff) could use the shared buffer + futex, IMHO. - Verbalize
As of Linux kernel 6.6, eventfd_write() and eventfd_read() contain implicit memory barriers and provide both acquire and release semantics. In fs/eventfd.c, both the read and write operations have critical sections guarded by the same spinlock. Among other things, this ensures that all memory accesses made by a thread before it writes to the eventfd are visible to the thread that reads the corresponding value from that eventfd, once that read has returned. But this is an implementation detail rather than a guarantee. - Waynant
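Since that ordering is described as an implementation detail, one way to avoid depending on it is to make the publication explicit with C11 atomics and use the eventfd only as the wake-up mechanism. The following is a minimal sketch of that idea, not a statement of what the kernel guarantees; the names `e_payload`, `e_ready`, `e_fd`, `eventfd_writer`, and `eventfd_handoff` are hypothetical.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int e_payload;        /* plain data to hand over */
static atomic_int e_ready;   /* explicit publication flag */
static int e_fd = -1;

static void *eventfd_writer(void *arg)
{
    (void)arg;
    uint64_t one = 1;
    e_payload = 42;
    /* Release store: publishes e_payload independently of the eventfd. */
    atomic_store_explicit(&e_ready, 1, memory_order_release);
    if (write(e_fd, &one, sizeof one) != (ssize_t)sizeof one)
        return (void *)1;
    return NULL;
}

/* Returns the payload seen by the reader, or -1 on error. */
int eventfd_handoff(void)
{
    e_payload = 0;
    atomic_store(&e_ready, 0);
    e_fd = eventfd(0, 0);
    if (e_fd < 0)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, eventfd_writer, NULL) != 0) {
        close(e_fd);
        return -1;
    }

    struct pollfd pfd = { .fd = e_fd, .events = POLLIN };
    uint64_t cnt = 0;
    int seen = -1;
    if (poll(&pfd, 1, -1) == 1 &&
        read(e_fd, &cnt, sizeof cnt) == (ssize_t)sizeof cnt) {
        /* The eventfd only woke us up; the acquire load that observes 1
           is what orders the read of e_payload after the writer's store. */
        while (atomic_load_explicit(&e_ready, memory_order_acquire) == 0)
            ;
        seen = e_payload;
    }
    pthread_join(t, NULL);
    close(e_fd);
    return seen;
}
```

In practice the loop on `e_ready` is expected to exit on its first iteration; it is there so that the correctness argument does not rest on undocumented behaviour of `read` and `write` on the eventfd.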