Understanding concurrent file writes from multiple processes

From here: Is file append atomic in UNIX

Consider a case where multiple processes open the same file and append to it. O_APPEND guarantees that seeking to the end of file and then beginning the write operation is atomic. So multiple processes can append to the same file and no process will overwrite any other processes' write as far as each write size is <= PIPE_BUF.
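
In other words, O_APPEND folds the "seek to end" into the write() itself; doing the two steps by hand leaves a race window. A minimal sketch of the difference (my illustration, not from the linked question):

#include <fcntl.h>
#include <unistd.h>

/* Racy by hand: another process can append between the lseek() and the
 * write(), so this write lands at a stale offset and clobbers its data. */
void append_racy(int fd, const void* buf, size_t len) {
    lseek(fd, 0, SEEK_END);
    (void) write(fd, buf, len);
}

/* With O_APPEND the kernel repositions to the current end of file as
 * part of each write(), so there is no window for another writer. */
int open_for_append(const char* path) {
    return open(path, O_WRONLY | O_APPEND);
}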

I wrote a test program where multiple processes open and write to the same file (write(2)). I make sure each write size is > PIPE_BUF (4k). I was expecting to see instances where a process overwrites someone else's data, but that doesn't happen. I tested with different write sizes. Is that just luck, or is there a reason why it doesn't happen? My ultimate goal is to understand whether multiple processes appending to the same file need to coordinate their writes.

Here is the complete program. Every process creates an int buffer, fills all values with its rank, opens a file and writes to it.

Specs: Open MPI 1.4.3 on openSUSE 11.3, 64-bit

Compiled as: mpicc -O3 test.c, run as: mpirun -np 8 ./a.out

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int
main(int argc, char** argv) {
    int rank, size, i, bufsize = 134217728, fd, status = 0;
    size_t nbytes;
    ssize_t bytes_written;
    int* buf;
    char* filename = "/tmp/testfile.out";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Fill the buffer with this process's rank so any interleaving
     * shows up as foreign rank ids inside a block. */
    buf = malloc(bufsize * sizeof(int));
    if (buf == NULL) {
        status = -1;
        perror("Could not malloc");
        goto finalize;
    }
    for (i = 0; i < bufsize; i++)
        buf[i] = rank;

    /* O_CREAT so the test does not require a pre-existing file. */
    if (-1 == (fd = open(filename, O_CREAT|O_APPEND|O_WRONLY, S_IRUSR|S_IWUSR))) {
        perror(filename);
        status = -1;
        goto end;
    }

    /* One single write of the whole buffer (in bytes), far larger than
     * PIPE_BUF. A short write is treated as a failure for this test. */
    nbytes = bufsize * sizeof(int);
    if ((bytes_written = write(fd, buf, nbytes)) != (ssize_t) nbytes) {
        perror("Error during write");
        printf("ret value: %zd\n", bytes_written);
        status = -1;
    }

    if (-1 == close(fd)) {
        perror("Error during close");
        status = -1;
    }
end:
    free(buf);
finalize:
    MPI_Finalize();
    return status;
}
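
One way to check the output for interleaving (a sketch, assuming each rank issued exactly one write() of the whole buffer, as in the program above; block must match bufsize) is to verify that every block-sized window contains a single rank id:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char** argv) {
    /* Must match bufsize in the writer: ints per write() call. */
    long block = 134217728, i = 0;
    int v, owner = -1;
    FILE* f = fopen(argc > 1 ? argv[1] : "/tmp/testfile.out", "rb");

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    while (fread(&v, sizeof(v), 1, f) == 1) {
        if (i % block == 0)
            owner = v;            /* start of a new block: note its owner */
        else if (v != owner) {    /* foreign rank id inside a block */
            printf("interleaving at int %ld: rank %d inside rank %d's block\n",
                   i, v, owner);
            fclose(f);
            return 2;
        }
        i++;
    }
    fclose(f);
    printf("no interleaving found\n");
    return 0;
}

Compile with cc checker.c -o checker and run it after the MPI job completes.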
Fr answered 17/10, 2012 at 20:33 Comment(2)
perror(filename); is far more useful than perror("Cant open file"); - Dorton
I did the same test on a Linux box (CentOS 7, 3.10.0-327.13.1.el7.x86_64) and I saw the behavior you want. Refer to #38220012. - Cheryl

Atomicity of writes less than PIPE_BUF applies only to pipes and FIFOs. For file writes, POSIX says:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.

...which means that you're on your own - different UNIX-likes will give different guarantees.
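
For example, on Linux and the BSDs the usual "form of concurrency control" is an advisory lock around each append. A sketch, assuming the fd was opened with O_APPEND (locked_append is a made-up helper name):

#include <errno.h>
#include <sys/file.h>
#include <unistd.h>

/* Serialise appends across processes with an advisory whole-file lock. */
ssize_t
locked_append(int fd, const void* buf, size_t len) {
    ssize_t n;
    int saved;

    if (flock(fd, LOCK_EX) == -1)   /* block until we own the file */
        return -1;
    n = write(fd, buf, len);
    saved = (n == -1) ? errno : 0;
    flock(fd, LOCK_UN);             /* release even on write failure */
    if (n == -1)
        errno = saved;
    return n;
}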

Macpherson answered 17/10, 2012 at 21:10 Comment(8)
How do I know what my Linux guarantees? Is this documented somewhere for a particular Linux? - Fr
@KVM: This rather amusing posting from Linus Torvalds implies that a single write() on an O_APPEND file should be atomic on Linux, at least on "UNIX-like filesystems"; the example given is log files. Note though that NFS for one certainly doesn't behave this way. - Macpherson
So Linus's post does mean that a single write call is atomic. Quote: "The only, and the portable way to do this under UNIX is to create one single buffer, and write it out with one single write() call. Anything else is likely to cause the file to be interspersed by random log-fragments, instead of being a nice consecutive list of full log entries" - Fr
@KVM: Yes, but Linus is talking about the traditional behaviour of UNIX and Linux rather than what's mandated by any written standard. - Macpherson
@caf's link is broken. Looking for the citation of Korizon, I found this mailing list post from 2002, which has more context but seems focused on writes within a single process in the presence of signals. - Bald
@Bald web.archive.org to the rescue - Fanaticism
An alternate archive link for the email I referred to earlier. - Macpherson
Note that POSIX 7 write documentation states: "If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation." That sounds like a guarantee of atomicity. Note, however, that it's not a guarantee of completeness: write() is not guaranteed to write all the bytes requested, although IME partial writes to an actual file don't happen. - Wadesworth
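
To illustrate Wadesworth's completeness caveat: a defensive caller retries until everything is written. Note the trade-off, though: only each individual write() is atomic under O_APPEND, so another appender can slip in between two retries. A sketch (write_all is a hypothetical helper):

#include <errno.h>
#include <unistd.h>

/* Keep calling write() until all len bytes are out or a real error occurs.
 * Each individual write() is still atomic under O_APPEND, but the whole
 * sequence is not: another process may append between two retries. */
ssize_t
write_all(int fd, const char* buf, size_t len) {
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before any byte: retry */
            return -1;
        }
        done += (size_t) n;
    }
    return (ssize_t) done;
}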

Firstly, O_APPEND or the equivalent FILE_APPEND_DATA on Windows means that increments of the maximum file extent (file "length") are atomic under concurrent writers, and that is by any amount, not just PIPE_BUF. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly; NFS before v5 does not, as it lacks the wire-format capability to append atomically. So if you open your file append-only, concurrent writes will not tear with respect to one another on any major OS unless NFS is involved.

This says nothing about whether reads will ever see a torn write though, and on that POSIX says the following about atomicity of read() and write() to regular files:

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links ... [many functions] ... read() ... write() ... If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [Source]

and

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. [Source]

but conversely:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control. [Source]

A safe interpretation of all three of these requirements would suggest that all writes overlapping an extent in the same file must be serialised with respect to one another and to reads such that torn writes never appear to readers.

A less safe, but still allowed interpretation could be that reads and writes only serialise with each other between threads inside the same process, and between processes writes are serialised with respect to reads only (i.e. there is sequentially consistent i/o ordering between threads in a process, but between processes i/o is only acquire-release).

Of course, just because the standard requires these semantics doesn't mean implementations comply, though in fact FreeBSD with ZFS behaves perfectly, very recent Windows (10.0.14393) with NTFS behaves perfectly, and recent Linuxes with ext4 behave correctly if O_DIRECT is on. If you would like more detail on how well major OSes and filing systems comply with the standard, see this answer.
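
A small experiment along these lines (my sketch, not from the answer: the file name, sizes and writer/reader split are arbitrary) rewrites one region with alternating patterns while another process snapshots it; any snapshot mixing the two patterns is a torn read:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define SZ (1 << 16)    /* 64 KiB, comfortably above any page size */

int
main(int argc, char** argv) {
    char buf[SZ];
    int fd = open("/tmp/tear_probe.out", O_RDWR | O_CREAT, 0600);

    if (fd == -1) {
        perror("open");
        return 1;
    }
    if (argc > 1 && strcmp(argv[1], "writer") == 0) {
        for (int i = 0; ; i ^= 1) {        /* overwrite with A, then B, ... */
            memset(buf, "AB"[i], SZ);
            if (pwrite(fd, buf, SZ, 0) != SZ) {
                perror("pwrite");
                return 1;
            }
        }
    }
    for (;;) {                              /* reader: snapshot and compare */
        ssize_t n = pread(fd, buf, SZ, 0);
        for (ssize_t j = 1; j < n; j++)
            if (buf[j] != buf[0]) {
                printf("torn read: '%c' then '%c' at byte %zd\n",
                       buf[0], buf[j], j);
                return 2;
            }
    }
}

Run one copy as ./probe writer and another as ./probe; if read() and write() are fully serialised on the filesystem in question, the reader spins forever without finding a mix (stop both with Ctrl-C).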

Hanforrd answered 7/2, 2016 at 22:45 Comment(2)
I've never heard of NFS v5? - Suffragan
@Suffragan It's been in development for years. They have a mailing list somewhere where I read the above. - Hanforrd

It's not luck, in the sense that if you dig into the kernel you can probably prove that in your particular circumstances it will never happen that one process's write is interleaved with another's. I am assuming that:

  • You are not hitting any file size limits
  • You are not filling the filesystem in which you create the test file
  • The file is a regular file (not a socket, pipe, or something else)
  • The filesystem is local
  • The buffer does not span multiple virtual memory mappings (this one is known to be true, because it's malloc()ed, which puts it on the heap, which is contiguous).
  • The processes aren't interrupted, signaled, or traced while write() is busy.
  • There are no disk I/O errors, RAM failures, or any other abnormal conditions.
  • (Maybe others)

You will probably indeed find that if all those assumptions hold true, the kernel of the operating system you happen to be using always accomplishes a single write() system call with a single atomic contiguous write to the file.

That doesn't mean you can count on this always being true. You never know when it might stop being true, for example when:

  • the program is run on a different operating system
  • the file moves to an NFS filesystem
  • the process gets a signal while the write() is in progress and the write() returns a partial result (fewer bytes than requested). Not sure if POSIX really allows this to happen but I program defensively!
  • etc...

So your experiment can't prove that you can count on non-interleaved writes.
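
Given that conclusion, the defensive option is explicit coordination. A sketch using POSIX record locks (append_locked is a made-up name; unlike flock(), fcntl() locks are specified by POSIX and, with lock daemons running, also work over NFS):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Take a write lock on the whole file, append, release. Assumes fd was
 * opened with O_APPEND so the kernel still positions the write at EOF. */
ssize_t
append_locked(int fd, const void* buf, size_t len) {
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };  /* l_len 0: whole file */
    ssize_t n;
    int saved;

    if (fcntl(fd, F_SETLKW, &lk) == -1)  /* F_SETLKW: wait for the lock */
        return -1;
    n = write(fd, buf, len);
    saved = (n == -1) ? errno : 0;
    lk.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &lk);             /* release */
    if (n == -1)
        errno = saved;
    return n;
}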

Clericals answered 17/10, 2012 at 21:11 Comment(0)
