In general, what can we take for granted when we append to a file in UNIX from multiple processes? Is it possible to lose data (one process overwriting the other's changes)? Is it possible for data to get mangled? (For example, if each process appends one line per write to a log file, can two lines get mangled together?) If the append is not atomic in the above sense, then what's the best way of ensuring mutual exclusion?
A write that's smaller than PIPE_BUF is supposed to be atomic. PIPE_BUF should be at least 512 bytes, though it could easily be larger (Linux seems to have it set to 4096).
This assumes that all the components involved are fully POSIX-compliant. For instance, this isn't true on NFS.
But assuming you write to a log file you opened in O_APPEND mode and keep your lines (including newline) under PIPE_BUF bytes long, you should be able to have multiple writers to a log file without any corruption issues. Any interrupts will arrive before or after the write, not in the middle. If you want file integrity to survive a reboot you'll also need to call fsync(2) after every write, but that's terrible for performance.
Clarification: read the comments and Oz Solomon's answer. I'm not sure that O_APPEND is supposed to have that PIPE_BUF size atomicity. It's entirely possible that it's just how Linux implemented write(), or it may be due to the underlying filesystem's block sizes.
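The pattern this answer describes (open with O_APPEND, emit each line with a single write, keep the line under PIPE_BUF) could be sketched in Python roughly as follows. The function name `append_line` and the `durable` flag are illustrative assumptions, and the PIPE_BUF check is only the answer's heuristic; as the clarification notes, it may not be a guarantee for regular files.

```python
import os
import select


def append_line(path: str, line: str, durable: bool = False) -> int:
    """Append one newline-terminated record using a single write() call.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    data = line.rstrip("\n").encode("utf-8") + b"\n"
    # Heuristic size limit from the answer, not a POSIX guarantee for files.
    if len(data) > select.PIPE_BUF:
        raise ValueError(f"record is {len(data)} bytes; limit is {select.PIPE_BUF}")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)  # exactly one system call per record
        if written != len(data):
            raise OSError("short write; the record may be incomplete")
        if durable:
            os.fsync(fd)  # slow: forces the data to stable storage
        return written
    finally:
        os.close(fd)
```

Building the whole record in memory first and handing it to one `os.write` call is the point of the sketch; a buffered file object might split or merge records behind the caller's back.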
fsync(2) gives as much of a guarantee as sync(2) does, and does not have as much of a big-hammer impact on performance. – Justify
PIPE_BUF on that page only applies to pipes and FIFOs, not regular files. – Cathee
PIPE_BUF values on common Unix systems: ar.to/notes/posix#pipe-buf – Sticky
... write(2) return partially complete with errno=EINTR? Or do you mean data from other O_APPEND writers interrupting us? (Obviously a write to a specific position without O_APPEND could step on your data if the non-append write goes 2nd.) CPU hardware interrupts can definitely happen, but processes running on other CPUs can make a concurrent write system call! It's up to the kernel's logic to ensure atomicity of writes any time it needs to be guaranteed. – Logo
... "If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation." It's possible that a write() operation may not be complete, but per POSIX it will be atomic. – Arabella
... write() ..." Regarding bugzilla.kernel.org/show_bug.cgi?id=55651? So Linux doesn't meet POSIX standards? That's not exactly an isolated case. – Arabella
Edit: Updated August 2017 with latest Windows results.
I'm going to give you an answer with links to test code and results as the author of proposed Boost.AFIO which implements an asynchronous filesystem and file i/o C++ library.
Firstly, O_APPEND or the equivalent FILE_APPEND_DATA on Windows means that increments of the maximum file extent (file "length") are atomic under concurrent writers. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly, NFS before v5 does not as it lacks the wire format capability to append atomically. So if you open your file with append-only, concurrent writes will not tear with respect to one another on any major OS unless NFS is involved.
However, reads that run concurrently with atomic appends may see torn writes depending on OS, filing system, and what flags you opened the file with: the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system:
No O_DIRECT/FILE_FLAG_NO_BUFFERING:
  Microsoft Windows 10 with NTFS: update atomicity = 1 byte up to and including 10.0.10240; from 10.0.14393, at least 1Mb, probably infinite (*).
  Linux 4.2.6 with ext4: update atomicity = 1 byte.
  FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*).
O_DIRECT/FILE_FLAG_NO_BUFFERING:
  Microsoft Windows 10 with NTFS: up to and including 10.0.10240, update atomicity = up to 4096 bytes only if page aligned, otherwise 512 bytes if FILE_FLAG_WRITE_THROUGH is off, else 64 bytes. Note that this atomicity is probably a feature of PCIe DMA rather than designed in. Since 10.0.14393, at least 1Mb, probably infinite (*).
  Linux 4.2.6 with ext4: update atomicity = at least 1Mb, probably infinite (*). Note that earlier Linuxes with ext4 definitely did not exceed 4096 bytes. XFS certainly used to have custom locking, but it looks like recent Linux has finally fixed this.
  FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*).
You can see the raw empirical test results at https://github.com/ned14/afio/tree/master/programs/fs-probe. Note we test for torn offsets only on 512 byte multiples, so I cannot say if a partial update of a 512 byte sector would tear during the read-modify-write cycle.
So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes on Linux with ext4 unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple.
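If a reader has to follow a file while appenders are still writing to it, one defensive option is to make each record self-validating so that a partially visible record can be detected and retried later. This is not something the answer proposes; it is a minimal sketch under the assumption that records are newline-free text, and the names `encode_record` and `decode_records` are hypothetical.

```python
import zlib


def encode_record(payload: str) -> bytes:
    """Frame a newline-free payload as '<crc32 hex> <payload>\\n'."""
    body = payload.encode("utf-8")
    return b"%08x " % zlib.crc32(body) + body + b"\n"


def decode_records(raw: bytes):
    """Return (valid payloads, number of bytes safely consumed).

    Parsing stops at the first record that is incomplete or fails its
    checksum, so the caller can re-read from that offset later.
    """
    records, consumed = [], 0
    while True:
        end = raw.find(b"\n", consumed)
        if end == -1:
            break  # no complete line left
        crc_hex, _, body = raw[consumed:end].partition(b" ")
        try:
            ok = len(crc_hex) == 8 and int(crc_hex.decode("ascii"), 16) == zlib.crc32(body)
        except ValueError:
            ok = False
        if not ok:
            break  # possibly torn: do not trust this record yet
        records.append(body.decode("utf-8"))
        consumed = end + 1
    return records, consumed
```

A reader would keep the returned offset, sleep briefly, and re-read from that position; a record that fails its checksum once may simply not have been fully visible yet.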
(*) "Probably infinite" stems from these clauses in the POSIX spec:
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links ... [many functions] ... read() ... write() ... If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [Source]
and
Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. [Source]
but conversely:
This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control. [Source]
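The last clause quoted above recommends "some form of concurrency control" without naming one. A common choice on UNIX, offered here as an assumption rather than as something this answer tested, is an advisory lock taken with flock(2) around the append. The helper name `locked_append` is hypothetical, and the lock only protects against other writers that also take it.

```python
import fcntl
import os


def locked_append(path: str, data: bytes) -> None:
    """Append data while holding an exclusive advisory lock on the file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            offset = 0
            while offset < len(data):
                # Short writes are retried while the lock is still held.
                offset += os.write(fd, data[offset:])
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Because the lock is held across the retry loop, a record larger than any atomic-append limit still lands in the file contiguously, provided every writer uses the same helper.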
I wrote a script to empirically test the maximum atomic append size. The script, written in bash, spawns multiple worker processes which all write worker-specific signatures to the same file. It then reads the file, looking for overlapping or corrupted signatures. You can see the source for the script at this blog post.
The actual maximum atomic append size varies not only by OS, but by filesystem.
On Linux+ext3 the size is 4096, and on Windows+NTFS the size is 1024. See the comments below for more sizes.
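The original script is written in bash and is only linked, not shown. A rough Python re-imagining of the same idea (several workers append worker-specific signatures, then the file is scanned for damaged records) might look like the sketch below. The function names, the one-letter-per-worker signature scheme, and the default parameters are all assumptions for illustration; it relies on os.fork, so it is UNIX-only, and it expects `size` of at least 2 and at most 26 workers.

```python
import os


def run_workers(path, workers=4, lines_per_worker=50, size=4096):
    """Fork `workers` processes; each appends records of exactly `size` bytes."""
    open(path, "wb").close()  # start from an empty file
    pids = []
    for w in range(workers):
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                # Signature: one letter per worker, repeated, plus a newline.
                record = bytes([ord("a") + w]) * (size - 1) + b"\n"
                fd = os.open(path, os.O_WRONLY | os.O_APPEND)
                for _ in range(lines_per_worker):
                    os.write(fd, record)  # one write() call per record
                os.close(fd)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)


def find_corrupt(path, size):
    """Return records with the wrong length or bytes from more than one worker."""
    with open(path, "rb") as f:
        records = f.read().split(b"\n")
    if records and records[-1] == b"":
        records.pop()  # the final newline leaves one empty tail element
    return [r for r in records if len(r) != size - 1 or len(set(r)) != 1]
```

Calling `run_workers` with increasing `size` values and checking when `find_corrupt` first returns a non-empty list gives an empirical estimate for one particular OS and filesystem; it does not prove a guarantee.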
... echo $line >> $OUTPUT_FILE will result in a single call to write regardless of the size of $line. – Outrageous
Here is what the standard says: http://www.opengroup.org/onlinepubs/009695399/functions/pwrite.html.
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
... write() ..." So there is a guarantee that the write() is atomic. And note there is no distinction made between different threads and different processes. – Arabella
... write() operations to files are atomic: "All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2017 when they operate on regular files or symbolic links: ...pwrite() ...write() ...writev()" NB that if the write is interrupted by a signal, you can still get a short write. Linux is arguably broken here – Arabella
© 2022 - 2024 — McMap. All rights reserved.