Is rename() without fsync() safe?

3

36

Is it safe to call rename(tmppath, path) without calling fsync(tmppath_fd) first?

I want the path to always point to a complete file. I care mainly about Ext4. Is the rename() promised to be safe in all future Linux kernel versions?

A usage example in Python:

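import os

def store_atomically(path, data):
    tmppath = path + ".tmp"
    # The with-block closes the file even if write() or fsync() raises.
    with open(tmppath, "wb") as output:
        output.write(data)
        output.flush()  # Push Python's userspace buffer to the kernel.
        os.fsync(output.fileno())  # The needed fsync().
    os.rename(tmppath, path)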
Kami answered 15/9, 2011 at 15:3 Comment(1)
Especially on ext4. Yes, I know that Ts'o has provided a 'backwards compatibility' fix covering most cases, but that cannot be relied on for portability or future versions.Mur
M
37

No.

Look at libeatmydata, and this presentation:

Eat My Data: How Everybody Gets File IO Wrong

http://www.oscon.com/oscon2008/public/schedule/detail/3172

by Stewart Smith from MySQL.

In case it is offline/no longer available, I keep a copy of it:

Mur answered 15/9, 2011 at 15:9 Comment(8)
In summary: fsync() is needed when not using data=ordered. Consider using sqlite for your writes.Kami
@Ivo: thanks; though the most important thing I try to remember from this subject is without a doubt that reliable disk transactions are hard and best left to library writers. That's where your sqlite advice seems sound. In programming, accidental complexity can make apparently simple tasks absolutely non-trivial; in my experience this happens 'where the rubber meets the road' (scalability, networking, hardware access, interaction design)Mur
In summary, without the fsync() the metadata may be written before the data; if a crash happens in between, the newly renamed file would contain partial or empty data.Spanishamerican
Summary of video: Use SQLite (or another database) if you care about data consistency (you might not care if you're e.g. writing DVD-ripping software where you can just try again if there's a crash). Otherwise, you will get it wrong (e.g. even using fsync doesn't guarantee a sync on certain operating systems).Polypody
Or some combinations of (virtual) block devices and filesystems on those OSesMur
Unfortunately your copy is not accessible anymoreMchugh
@Mchugh You're fast! I just had a security alert less than 12 hours ago, and took the machine offline for investigation. I'll probably be back online in a week or 2, maxMur
I found the video at youtube.com/watch?v=LMe7hf2G1poNobles
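The "use SQLite" advice from the comments above can be sketched as follows. This is a minimal illustration, not a drop-in replacement: the `blobs` table, its columns, and the function names `store_in_sqlite` and `load_from_sqlite` are hypothetical, and how durable a commit is still depends on SQLite's settings and on the storage stack honoring syncs, as the comments point out.

```python
import sqlite3


def store_in_sqlite(db_path, key, data):
    """Insert or replace one blob inside a single transaction."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # Commits on success, rolls back if an exception is raised.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS blobs (key TEXT PRIMARY KEY, data BLOB)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO blobs (key, data) VALUES (?, ?)",
                (key, data),
            )
    finally:
        conn.close()


def load_from_sqlite(db_path, key):
    """Return the stored bytes for key, or None if the key is absent.

    Assumes store_in_sqlite() has created the table at least once.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT data FROM blobs WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()
```

A reader then sees either the old value or the new one for a key, never a half-written blob, which is the same property the rename() pattern is trying to achieve.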
S
4

From ext4 documentation:

When mounting an ext4 filesystem, the following option are accepted:
(*) == default

auto_da_alloc(*)    Many broken applications don't use fsync() when 
noauto_da_alloc     replacing existing files via patterns such as
                    fd = open("foo.new")/write(fd,..)/close(fd)/
                    rename("foo.new", "foo"), or worse yet,
                    fd = open("foo", O_TRUNC)/write(fd,..)/close(fd).
                    If auto_da_alloc is enabled, ext4 will detect
                    the replace-via-rename and replace-via-truncate
                    patterns and force that any delayed allocation
                    blocks are allocated such that at the next
                    journal commit, in the default data=ordered
                    mode, the data blocks of the new file are forced
                    to disk before the rename() operation is
                    committed.  This provides roughly the same level
                    of guarantees as ext3, and avoids the
                    "zero-length" problem that can happen when a
                    system crashes before the delayed allocation
                    blocks are forced to disk.

Judging by the wording "broken applications", the ext4 developers definitely consider it bad practice, but it is such a widely used approach that ext4 itself was patched to handle it.

So if your usage fits the pattern, you should be safe.
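To see whether a given mount actually has this protection, one option is to look at its mount options. The sketch below parses text in the /proc/mounts format; the function names are hypothetical, and it assumes that a mount without `noauto_da_alloc` in its option list has the default (`auto_da_alloc`) in effect. A default may simply not be listed, so `data_mode` can come back as None.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # Skip blank or malformed lines.
        entries.append((fields[1], fields[2], fields[3].split(",")))
    return entries


def ext4_replace_hints(text, mount_point):
    """Report the ext4 options relevant to replace-via-rename for one mount.

    Returns None if mount_point is not listed or is not ext4.
    """
    # Later lines shadow earlier ones for the same mount point, so scan backwards.
    for mnt, fstype, options in reversed(parse_mounts(text)):
        if mnt != mount_point:
            continue
        if fstype != "ext4":
            return None
        data_mode = None
        for opt in options:
            if opt.startswith("data="):
                data_mode = opt[len("data="):]
        return {
            "auto_da_alloc": "noauto_da_alloc" not in options,
            "data_mode": data_mode,
        }
    return None
```

On a Linux system you would pass it the contents of /proc/mounts, for example `ext4_replace_hints(open("/proc/mounts").read(), "/")`.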

If not, I suggest you investigate further instead of inserting fsync here and there just to be safe. That might not be such a good idea, since fsync can be a major performance hit on ext3 (read).

On the other hand, flushing before rename is the correct way to do the replacement on non-journaling file systems. Maybe that's why ext4 at first expected this behavior from programs; the auto_da_alloc option was added later as a fix. Also, this ext3 patch for the writeback mode (where only metadata is journaled and data writes are not ordered against it) tries to help careless programs by flushing asynchronously on rename to lower the chance of data loss.

You can read more about the ext4 problem here.

Shroyer answered 28/12, 2016 at 13:6 Comment(0)
A
1

If you only care about ext4 and not ext3, then I'd recommend using fsync on the new file before doing the rename. The fsync performance on ext4 seems to be much better than on ext3, without the very long delays. Or it might be because writeback is the default mode (at least on my Linux system).

If you only care that the file is complete and not which file is named in the directory then you only need to fsync the new file. There's no need to fsync the directory too since it will point to either the new file with its complete data, or the old file.
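The two levels of guarantee described here can be sketched like this. It is a minimal illustration for POSIX systems; `store_durably` and its `sync_dir` flag are hypothetical names, not part of the question's code. With `sync_dir=False` only the file contents are synced, which is enough for "path always points to a complete file". With `sync_dir=True` the containing directory is synced as well, for the case where you also care that the new name survives a crash.

```python
import os


def store_durably(path, data, sync_dir=False):
    """Write data to path via a temporary file, fsync it, then rename."""
    tmppath = path + ".tmp"
    with open(tmppath, "wb") as output:
        output.write(data)
        output.flush()
        os.fsync(output.fileno())  # File data is on disk before the rename.
    os.rename(tmppath, path)
    if sync_dir:
        # Optional: sync the directory so the rename itself is persisted.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```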

Alienism answered 15/9, 2011 at 16:2 Comment(1)
I had data=ordered by default. It is seen in /proc/mounts. The option data=writeback helped. My benchmark of 1000 renames went from 2.2s to 0.8s. The fsync() still takes time. Without it, the benchmark ran 0.9s (ordered) and 0.2s (writeback).Kami
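A benchmark along these lines can be reproduced with a sketch like the one below. The details of the original benchmark are not given, so the payload size, file names, and the function name `time_renames` are assumptions; timings will vary with the filesystem, mount options, and hardware.

```python
import os
import time


def time_renames(directory, count=1000, use_fsync=True, payload=b"x" * 1024):
    """Return the seconds taken to replace one file `count` times via write+rename."""
    path = os.path.join(directory, "target")
    tmppath = path + ".tmp"
    start = time.perf_counter()
    for _ in range(count):
        with open(tmppath, "wb") as output:
            output.write(payload)
            if use_fsync:
                output.flush()
                os.fsync(output.fileno())
        os.rename(tmppath, path)
    return time.perf_counter() - start
```

Calling it once with `use_fsync=True` and once with `use_fsync=False` on a directory of the filesystem under test gives the two numbers to compare.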
