Is file append atomic in UNIX?

In general, what can we take for granted when we append to a file in UNIX from multiple processes? Is it possible to lose data (one process overwriting the other's changes)? Is it possible for data to get mangled? (For example, each process is appending one line per append to a log file, is it possible that two lines get mangled?) If the append is not atomic in the above sense, then what's the best way of ensuring mutual exclusion?

Wilford answered 20/7, 2009 at 16:7 Comment(1)
TLDR: Yes. POSIX 7 guarantees that not just appends but all write() operations on files are atomic: "All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2017 when they operate on regular files or symbolic links: ... pwrite() ... write() ... writev()" NB: if the write is interrupted by a signal, you can still get a short write. Linux is arguably broken hereArabella
W
72

A write that's under the size of 'PIPE_BUF' is supposed to be atomic. That should be at least 512 bytes, though it could easily be larger (Linux seems to have it set to 4096).

This assumes that you're talking about fully POSIX-compliant components. For instance, this isn't true on NFS.

But assuming you write to a log file you opened in 'O_APPEND' mode and keep your lines (including newline) under 'PIPE_BUF' bytes long, you should be able to have multiple writers to a log file without any corruption issues. Any interrupts will arrive before or after the write, not in the middle. If you want file integrity to survive a reboot you'll also need to call fsync(2) after every write, but that's terrible for performance.
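A minimal sketch of the pattern described above, assuming Python on a POSIX system. `append_line` is a hypothetical helper name, not something from the answer. It opens the log with O_APPEND and emits each record with exactly one write() call, treating a short write as an error instead of silently splitting the record. It does not prove atomicity; it only avoids the application-level mistakes (buffered I/O, multiple writes per line) that would break it regardless of what the kernel guarantees.

```python
import os


def append_line(path: str, line: str) -> int:
    """Append one newline-terminated record using a single write() call.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    data = line.rstrip("\n").encode("utf-8") + b"\n"
    # O_APPEND asks the kernel to move the offset to end-of-file before each write.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)  # one system call per record
        if written != len(data):
            # Retrying would append the remainder as a separate write,
            # so another writer's record could land in between.
            raise OSError(f"short write: {written} of {len(data)} bytes")
        return written
    finally:
        os.close(fd)
```

If durability across a reboot matters, `os.fsync(fd)` could be called before closing, at the performance cost the answer mentions.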

Clarification: read the comments and Oz Solomon's answer. I'm not sure that O_APPEND is supposed to have that PIPE_BUF size atomicity. It's entirely possible that it's just how Linux implemented write(), or it may be due to the underlying filesystem's block sizes.

Wrestle answered 20/7, 2009 at 16:39 Comment(13)
On sane filesystems, fsync(2) gives as much of a guarantee as sync(2) does, and does not have as much of a big-hammer impact on performance.Justify
Are you sure about that? Could you provide some link about that behaviour? I found it confirmed if the descriptor is a pipe, but I couldn't find evidence that it works for any file, including normal, non-NFS file objects.Stich
Where exactly in .../write.html? For O_APPEND, I see no mention of PIPE_BUF, and I see promise that "no intervening file modification operation shall occur between changing the file offset and the write operation", but I'm not that sure if this means that the write operation itself is uninterrupted...Villar
To answer that, is it over-simplistic to think that a hard disk only has one write head, so a write can't be interrupted by another write?Thuja
As this answer points out, the statement about PIPE_BUF on that page only applies to pipes and FIFOs, not regular files.Cathee
Here's a table of observed PIPE_BUF values on common Unix systems: ar.to/notes/posix#pipe-bufSticky
With signals arriving this can get even worse: bugzilla.kernel.org/show_bug.cgi?id=55651. Why is this even marked as an answer? PIPE_BUF has nothing to do with files.Offense
@Offense following up on your comment, there's no guarantee in POSIX that append is atomic. This answer, sadly, is completely wrong. The fact that experiments in a multi-threaded setting don't show corruption tells us nothing about the guarantees (anything can change between kernel versions). (Not to mention, the experiments themselves are too limited; e.g., they don't deal with signals.)Dominicadominical
Interrupt? Do you mean a POSIX signal making write(2) return partially complete with errno=EINTR? Or do you mean data from other O_APPEND writers interrupting us? (Obviously a write to a specific position without O_APPEND could step on your data if the non-append write goes 2nd.) CPU hardware interrupts can definitely happen, but processes running on other CPUs can make a concurrent write system call! It's up to the kernel's logic to ensure atomicity of writes any time it needs to be guaranteed.Logo
@Dominicadominical Huh? A bit late here but I have no idea where this idea that POSIX does not guarantee atomic append operations comes from. Per POSIX itself: "If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation." It's possible that a write() operation may not be complete, but per POSIX it will be atomic.Arabella
@AndrewHenle I don't see the sentence saying anything about the write operation itself, only that nothing happens between moving the file offset to the end and the write. Where's the guarantee that no other write can happen before the first write finishes? Also, see the very beginning of bugzilla.kernel.org/show_bug.cgi?id=55651.Dominicadominical
@Dominicadominical 2.9.7 Thread Interactions with Regular File Operations "All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2017 when they operate on regular files or symbolic links: ... write() ..." Regarding bugzilla.kernel.org/show_bug.cgi?id=55651? So Linux doesn't meet POSIX standards? That's not exactly an isolated case.Arabella
@AndrewHenle It appears I was wrong. Thanks for the clarification. I was relying (without verification) on the many posts on SO and elsewhere, such as https://mcmap.net/q/168787/-are-posix-39-read-and-write-system-calls-atomic.Dominicadominical
A
40

Edit: Updated August 2017 with latest Windows results.

I'm going to give you an answer with links to test code and results as the author of proposed Boost.AFIO which implements an asynchronous filesystem and file i/o C++ library.

Firstly, O_APPEND or the equivalent FILE_APPEND_DATA on Windows means that increments of the maximum file extent (file "length") are atomic under concurrent writers. This is guaranteed by POSIX, and Linux, FreeBSD, OS X and Windows all implement it correctly. Samba also implements it correctly; NFS before v5 does not, as it lacks the wire format capability to append atomically. So if you open your file append-only, concurrent writes will not tear with respect to one another on any major OS unless NFS is involved.

However, concurrent reads of atomic appends may see torn writes depending on the OS, filing system, and the flags you opened the file with: the increment of the maximum file extent is atomic, but the visibility of the writes with respect to reads may or may not be atomic. Here is a quick summary by flags, OS and filing system:


No O_DIRECT/FILE_FLAG_NO_BUFFERING:

Microsoft Windows 10 with NTFS: update atomicity = 1 byte until and including 10.0.10240, from 10.0.14393 at least 1Mb, probably infinite (*).

Linux 4.2.6 with ext4: update atomicity = 1 byte

FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*)

O_DIRECT/FILE_FLAG_NO_BUFFERING:

Microsoft Windows 10 with NTFS: update atomicity = until and including 10.0.10240 up to 4096 bytes only if page aligned, otherwise 512 bytes if FILE_FLAG_WRITE_THROUGH off, else 64 bytes. Note that this atomicity is probably a feature of PCIe DMA rather than designed in. Since 10.0.14393, at least 1Mb, probably infinite (*).

Linux 4.2.6 with ext4: update atomicity = at least 1Mb, probably infinite (*). Note that earlier Linuxes with ext4 definitely did not exceed 4096 bytes; XFS certainly used to have custom locking, but it looks like recent Linux has finally fixed this.

FreeBSD 10.2 with ZFS: update atomicity = at least 1Mb, probably infinite (*)


You can see the raw empirical test results at https://github.com/ned14/afio/tree/master/programs/fs-probe. Note we test for torn offsets only on 512 byte multiples, so I cannot say if a partial update of a 512 byte sector would tear during the read-modify-write cycle.

So, to answer the OP's question, O_APPEND writes will not interfere with one another, but reads concurrent to O_APPEND writes will probably see torn writes on Linux with ext4 unless O_DIRECT is on, whereupon your O_APPEND writes would need to be a sector size multiple.
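One defensive pattern for a reader, sketched here under assumptions that are not part of the answer: records are newline-terminated, and a torn read shows up as an incomplete record at the tail of what was read. `read_complete_lines` is a hypothetical helper. It hands back only complete lines and carries the unfinished tail over to the next call. It does not help if a torn write is visible in some other form.

```python
import os


def read_complete_lines(fd: int, pending: bytes = b"") -> tuple[list[bytes], bytes]:
    """Read the next chunk and split it into complete lines plus a leftover tail.

    Hypothetical helper: `pending` is the leftover returned by the previous call.
    """
    chunk = os.read(fd, 65536)  # returns b"" at end-of-file
    data = pending + chunk
    # Everything before the last newline is complete; the remainder is held back.
    *lines, rest = data.split(b"\n")
    return lines, rest
```

A caller would loop, passing the returned tail back in as `pending` until a later read completes the record.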


(*) "Probably infinite" stems from these clauses in the POSIX spec:

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links ... [many functions] ... read() ... write() ... If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. [Source]

and

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. [Source]

but conversely:

This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control. [Source]

You can read more about the meaning of these in this answer
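For the "some form of concurrency control" that the quoted POSIX text recommends, here is a minimal sketch using an advisory lock, assuming Python on a POSIX system with a local filesystem. `locked_append` is a hypothetical name. The lock only protects against other writers that take the same lock, and flock behaviour over network filesystems varies, so treat this as an illustration, not a guarantee.

```python
import fcntl
import os


def locked_append(path: str, data: bytes) -> None:
    """Append `data` while holding an exclusive advisory lock on the file.

    Hypothetical helper; only cooperating writers that also call flock() are excluded.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is acquired
        try:
            view = memoryview(data)
            # Under the lock it is safe to finish a short write with further writes.
            while view:
                written = os.write(fd, view)
                view = view[written:]
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

With the lock held, record size no longer matters for writer-versus-writer interleaving, at the cost of serializing all writers.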

Amygdaline answered 7/2, 2016 at 17:19 Comment(1)
Note that POSIX.1-2017 now states: "This volume of POSIX.1-2017 does not specify the behavior of concurrent writes to a regular file from multiple threads, except that each write is atomic (see Thread Interactions with Regular File Operations) ..."Arabella
A
32

I wrote a script to empirically test the maximum atomic append size. The script, written in bash, spawns multiple worker processes which all write worker-specific signatures to the same file. It then reads the file, looking for overlapping or corrupted signatures. You can see the source for the script at this blog post.

The actual maximum atomic append size varies not only by OS, but by filesystem.

On Linux+ext3 the size is 4096, and on Windows+NTFS the size is 1024. See the comments below for more sizes.
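The same kind of experiment can be sketched in Python. This is not the bash script from the blog post; `probe_append` and its parameters are hypothetical. Each forked worker appends fixed-size records made of a single repeated letter, one `os.write` call per record, and the parent then counts records that are the wrong length or contain more than one letter. A result of zero on one machine is only an observation, not a guarantee.

```python
import os


def probe_append(path: str, workers: int = 4, records: int = 200, size: int = 512) -> int:
    """Return the number of damaged records after concurrent O_APPEND writes.

    Hypothetical probe; `size` is the record length in bytes including the
    trailing newline and must be at least 2.
    """
    open(path, "wb").close()  # start from an empty file
    pids = []
    for w in range(workers):
        pid = os.fork()
        if pid == 0:  # child process: write this worker's signature records
            status = 1
            try:
                fd = os.open(path, os.O_WRONLY | os.O_APPEND)
                record = bytes([ord("A") + w % 26]) * (size - 1) + b"\n"
                for _ in range(records):
                    os.write(fd, record)  # exactly one write() per record
                os.close(fd)
                status = 0
            finally:
                os._exit(status)  # never fall through into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)

    with open(path, "rb") as f:
        lines = f.read().split(b"\n")
    bad = 0
    for line in lines[:-1]:
        # An intact record is size-1 copies of one worker's letter.
        if len(line) != size - 1 or len(set(line)) != 1:
            bad += 1
    if lines[-1]:
        bad += 1  # trailing bytes without a newline
    return bad
```

Calling `probe_append("/tmp/probe.log", workers=8, size=8192)` with increasing `size` values would approximate the "maximum atomic append size" search described above; the path and numbers here are placeholders.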

Aryan answered 17/6, 2014 at 18:22 Comment(4)
What filesystem did you test with on Linux? I'm wondering if maybe it's based on filesystem block sizes.Wrestle
@Wrestle I believe at the time I tested it on ext3. If you run it on another FS and get a different result, please post a comment.Aryan
@OzSolomon, I used your script on Debian 7.8, and I was only able to get atomic writes up to and including 1008 bytes (1024 - 16 bytes of overhead?) on both my ext4 partition and a tmpfs mount. Anything beyond that resulted in corruption every time.Tommie
Your test seems to assume that echo $line >> $OUTPUT_FILE will result in a single call to write regardless of the size of $line.Outrageous
R
16

Here is what the standard says: http://www.opengroup.org/onlinepubs/009695399/functions/pwrite.html.

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
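The first half of that sentence (the offset is set to end-of-file before each write) is easy to observe. A small sketch, assuming Python on a POSIX system, with `append_ignores_seek` as a hypothetical name: the descriptor is explicitly seeked back to the start, yet the write still lands at the end. It says nothing about whether the write itself can be interleaved with another writer's data.

```python
import os


def append_ignores_seek(path: str) -> bytes:
    """Show that a write on an O_APPEND descriptor goes to end-of-file
    even after an explicit lseek() to the beginning. Hypothetical demo helper."""
    with open(path, "wb") as f:
        f.write(b"first\n")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.lseek(fd, 0, os.SEEK_SET)  # try to move to the start of the file
        os.write(fd, b"second\n")     # the kernel repositions to end-of-file first
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()
```

On a POSIX system the returned contents are `b"first\nsecond\n"`: the existing data is not overwritten.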

Rodent answered 20/7, 2009 at 17:6 Comment(5)
"between" - but what about interventions during the write, which to my understanding happens after the "between"? (I.e.: <change_offset_action> ..."the_between_period"... <write_action> ) - shall I understand there are no guarantees about that?Villar
@Villar agreed; there's no guarantee that the write itself is atomic. But I'm confused: based on the guarantee provided in your quote, it seems we can conclude that a multithreaded app appending the same file will not mix parts of different written records. However, from experiments reported by OzSolomon, we see that even that assumption is violated. Why?Dominicadominical
@Dominicadominical sorry, I'm afraid I don't get your question: firstly, OzSolomon's experiment is multi-process, not a multi-threaded (single process) app; secondly, I don't understand how you draw the conclusion that "a multithreaded app [...] will not mix" — that's exactly what I don't see guaranteed by the quote from Bastien, as I mention in my comment. Can you clarify your question?Villar
Hmm, I can't reconstruct my own logic at the time I wrote that comment... Yes, if your interpretation is correct then of course the different records may be mixed. But now that I'm rereading Bastien's quote, I think it must mean that nobody can interrupt "during the write"; otherwise the entire paragraph in the standard would be useless, providing literally no guarantees at all (not even that the write will happen at the end, since someone else might move the offset as the "write" step is being executed).Dominicadominical
@Villar There is no "during the write" or "between" the "offset change" and the "write action": "All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2017 when they operate on regular files or symbolic links: ... write() ..." So there is a guarantee that the write() is atomic. And note there is no distinction made between different threads and different processes.Arabella

© 2022 - 2024 — McMap. All rights reserved.