What exactly is file.flush() doing?
T

4

181

I found this in the Python documentation for File Objects:

flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.

So my question is: what exactly is Python's flush doing? I thought it forced the data to be written to disk, but now I see that it doesn't. Why?

Tysontyumen answered 19/8, 2011 at 20:32 Comment(0)
T
287

There are typically two levels of buffering involved:

  1. Internal buffers
  2. Operating system buffers

The internal buffers are created by the runtime/library/language that you're programming against and are meant to speed things up by avoiding a system call for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.

However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.

If you write something and it only gets as far as one of these buffers, and the power to your machine is then cut, that data is not on disk when the machine turns off.

So, to help with that, you have two calls: the flush method on the file object and the os.fsync function.

The first, flush, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.

Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been "permanently" stored on disk.
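The visibility effect described above can be sketched in a few lines. This is only an illustration, assuming CPython's default buffering (a buffer of a few KiB) and a scratch file created with tempfile; the helper name visible_bytes is made up for this example.

```python
import os
import tempfile

def visible_bytes(path):
    """Open the file independently and return what a second reader sees."""
    with open(path, "rb") as reader:
        return reader.read()

# Hypothetical scratch file, used only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

writer = open(path, "wb")           # default buffering
writer.write(b"hello")              # lands in the file object's internal buffer

before_flush = visible_bytes(path)  # the second reader sees nothing yet

writer.flush()                      # hand the buffered bytes to the OS
after_flush = visible_bytes(path)   # now other readers can see the data

writer.close()
os.remove(path)

print(before_flush, after_flush)
```

With a write this small, the first read should come back empty and the second should return the five bytes. Note that nothing here says the bytes have reached the disk; they have only moved from the program's buffer to the operating system.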

To do that, you need to call os.fsync, which ensures all operating system buffers for the file are synchronized with the storage device they belong to. In other words, it copies the data from the operating system buffers to the disk.

Typically you don't need to bother with either method, but if you're in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.
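A minimal sketch of "both calls as instructed" might look like the following. The helper durable_write and the scratch file are hypothetical names for illustration; only file.flush(), file.fileno() and os.fsync() come from the documentation quoted in the question.

```python
import os
import tempfile

def durable_write(path, data):
    """Write bytes, then ask the OS to push them to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # program buffer -> operating system buffer
        os.fsync(f.fileno())  # operating system buffer -> storage device

# Usage with a hypothetical scratch file.
fd, path = tempfile.mkstemp()
os.close(fd)

durable_write(path, b"important record\n")

with open(path, "rb") as f:
    content = f.read()
os.remove(path)
```

The order matters: fsync only knows about data the OS has already received, so calling it before flush would leave the buffered bytes behind. Whether the device's own write cache is emptied as well depends on the platform and hardware.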


Addendum in 2018.

Note that disks with cache mechanisms are now much more common than back in 2013, so now there are even more levels of caching and buffering involved. I assume these buffers will be handled by the sync/flush calls as well, but I don't really know.

Triciatrick answered 19/8, 2011 at 20:40 Comment(10)
When I use the with file('blah') as fd: #dostuff construct, I know it guarantees closing the file descriptor. Does it also flush or sync?Motivity
@Marcin: It flushes, but does NOT sync.Blat
fsync is necessary for atomicity. You can't expect to close a file, reopen it and find your content without an fsync in the middle. It often works, but it doesn't on Linux with ext4 and default mount options, for example. Also, fsync is not guaranteed to really magnet-flip the iron on the platters, because 1: fsync can be disabled (by laptop-mode), and 2: the hard disk's internal buffering might not be instructed to flush.Bushweller
is there any way to flush an operating system's buffer for all files, if the file is written by another process?Zeniazenith
Any idea why python doesn't also fsync on file closing? It seems logical to me that when you close a file you want it to be on disk the way you had it when you closed it. Is fsync expensive or is there another reason to only flush?Kilocycle
fsync is relatively expensive. In general, you're not writing mission critical software that needs 100% ACID compliance and durability for disk-access, and if you do you're probably painfully aware of it and should be aware of the steps you can take to get these guarantees. Calling fsync will wait for physical disk access to occur to write the data to disk, whereas flushing and closing will only wait for data to be moved to cache memory. The speed difference is probably several orders of magnitude.Triciatrick
@Zeniazenith os.sync seems to be only available on Unix.Doughboy
And Windows, I have no idea about other platforms.Triciatrick
I can't find anything about file.flush() in the modern Python 3 documentation anymore, so I think this advice to call file.flush() followed by os.fsync() is no longer applicable, and is outdated. I added those two calls into a datalogging script I had, every 10 f.write() iterations, and it went from logging all of my data, even if I Ctrl + C killed it while running, to logging only every 10th f.write(), even though I was calling f.write() every iteration and f.flush + os.fsync() every 10th iteration. So, I'm not using those anymore. f.close() is still wise though when done.Ballista
@GabrielStaples docs.python.org/3.12/library/io.html#io.IOBase.flush. Also please see docs.python.org/3.12/library/os.html#os.fsync - it explicitly states that you need f.flush and os.fsync to ensure all buffers flushed to disk.Fairtrade
I
13

Because the operating system may not do so. The flush operation forces the file data into the file cache in RAM, and from there it's the OS's job to actually send it to the disk.

Istle answered 19/8, 2011 at 20:34 Comment(1)
You're right, but actually is relative here: if the target device has write caching enabled, data might not have reached the actual platters/chips when os.fsync() returns.Cahoon
Q
9

It flushes the internal buffer, which is supposed to cause the OS to write out the buffer to the file.[1] Python uses the OS's default buffering unless you configure it to do otherwise.

But sometimes the OS still chooses not to cooperate, especially with wonderful things like write-delays in Windows/NTFS. Basically the internal buffer is flushed, but the OS buffer is still holding on to the data. So in those cases you have to tell the OS to write it to disk with os.fsync().

[1] http://docs.python.org/library/stdtypes.html

Quilting answered 19/8, 2011 at 20:44 Comment(0)
D
0

Basically, flush() cleans out your RAM buffer. Its real power is that it lets you continue to write to it afterwards, but it shouldn't be thought of as the best or safest way to write to a file. It's flushing your RAM for more data to come, that is all. If you want to ensure data gets written to file safely then use close() instead.

Dancer answered 12/12, 2018 at 19:29 Comment(0)
