fsync vs write system call

I would like to ask a fundamental question about when it is useful to call something like fsync. I am a beginner, and I was always under the impression that write is enough to write to a file; the samples I have seen that use write do end up with the data in the file.

So what is the purpose of a system call like fsync?

Just to provide some background: I am using the Berkeley DB library, version 5.1.19, and there is a lot of talk about the cost of fsync() versus just writing. That is why I am wondering.

Neogene answered 29/4, 2012 at 9:17 Comment(0)

Think of it as a layer of buffering.

If you're familiar with the standard C calls like fopen and fprintf, you should already be aware of buffering happening within the C runtime library itself.

The way to flush those buffers is with fflush which ensures that the information is handed from the C runtime library to the OS (or surrounding environment).

However, just because the OS has it, doesn't mean it's on the disk. It could get buffered within the OS as well.

That's what fsync takes care of, ensuring that the stuff in the OS buffers is written physically to the disk.

You may typically see this sort of operation in logging libraries:

fprintf (myFileHandle, "something\n");  // output it
fflush (myFileHandle);                  // flush to OS
fsync (fileno (myFileHandle));          // flush to disk

fileno is a function which gives you the underlying int file descriptor for a given FILE* file handle, and fsync on the descriptor does the final level of flushing.
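For reference, here is a slightly fuller sketch of the same three steps with error checking added. The helper name log_line_durably and its return convention are made up for illustration, and the code assumes a POSIX system where fsync and fileno are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): append one line to `path`
 * and push it through both buffering layers.
 * Returns 0 on success, -1 if any step fails. */
int log_line_durably(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    int ok = fputs(line, fp) != EOF     /* into the C library's buffer    */
          && fflush(fp) == 0            /* C library buffer -> OS         */
          && fsync(fileno(fp)) == 0;    /* OS buffers -> storage device   */

    if (fclose(fp) != 0)                /* always close, even on failure  */
        ok = 0;

    return ok ? 0 : -1;
}
```

Checking each return value matters here: if fflush or fsync fails, the data may not have reached the disk, which is exactly the situation the calls are meant to rule out.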

Now that is a relatively expensive operation since the disk write is usually considerably slower than in-memory transfers.
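One common way to soften that cost, sketched below under the assumption that your records can tolerate being made durable as a group, is to issue several write calls and then a single fsync. The names write_all and append_batch are hypothetical helpers for this example, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write all `len` bytes, retrying on short writes and on EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: append `count` strings to `path`, then fsync once.
 * Returns 0 on success, -1 on any failure. */
int append_batch(const char *path, const char *const *records, size_t count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    for (size_t i = 0; i < count && rc == 0; i++)
        rc = write_all(fd, records[i], strlen(records[i]));  /* to the OS only */

    if (rc == 0 && fsync(fd) != 0)   /* one flush to disk for the whole batch */
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The trade-off is that nothing in the batch is guaranteed to be on disk until the fsync returns, so a crash in the middle can lose any of the records written so far.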

As well as logging libraries, there is one other use case where this behaviour is useful. Let me see if I can remember what it was. Yes, that's it. Databases! Just like Berzerkely DB. Where you want to ensure the data is on the disk, a rather useful feature for meeting ACID requirements :-)

Stylize answered 29/4, 2012 at 9:39 Comment(4)
Thanks, that's helpful. If I don't call fsync explicitly, when does the OS (Linux in my case) decide to flush the data to disk? - Neogene
@isaac.hazan, no idea off the top of my head. Possibilities are when the buffers fill up, when the file descriptor closes, when a certain amount of time has passed, or whenever it feels like it :-) And it may well change depending on the device drivers for whatever filesystem you're using. Basically, you should not worry about when it happens unless you need it done now, in which case you would fsync. Otherwise, leave it up to the OS. - Stylize
@paxidablo: 1) "When the file descriptor closes": This is one of the most widespread misunderstandings. close does not imply that the buffered pages are written. 2) The alternative to fsync is to open a file with the O_SYNC flag. 3) On bad consumer disks/SSDs, even an fsync might not be enough to force data to be persistent. In this case there is basically nothing a user/developer can do. - Senate
@dmeister, those were possibilities rather than definitive statements. I would have to look into the kernel source to confirm, and that's more effort than it's worth given that the default behaviour is usually enough (barring a serious system crash), and if you decide fsync is needed and it fails, there's not much else you can do :-) - Stylize
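
The O_SYNC alternative mentioned in the comments above could look roughly like this. It is only a sketch: append_line_sync is a made-up helper name, and the code assumes a POSIX system that supports O_SYNC. With that flag, each write is meant to return only after the data has been handed to the storage device, so no separate fsync call appears.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append `line` to `path` using synchronous writes.
 * Returns 0 on success, -1 on any failure. */
int append_line_sync(const char *path, const char *line)
{
    /* O_SYNC asks the OS to complete each write on the device before returning. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;   /* treat a short write as failure */

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Because every write pays the synchronous cost, this tends to suit files where each record must be durable on its own; when several writes can be grouped, an explicit fsync after the group is usually cheaper.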
