Win32: Write to file without buffering?
I need to create a new file handle so that any write operations to that handle get written to disk immediately.

Extra info: The handle will be the inherited STDOUT of a child process, so I need any output from that process to immediately be written to disk.

Studying the CreateFile documentation, the FILE_FLAG_WRITE_THROUGH flag looked like exactly what I need:

Write operations will not go through any intermediate cache, they will go directly to disk.

I wrote a very basic test program and, well, it's not working. I used the flag on CreateFile then used WriteFile(myHandle,...) in a long loop, writing about 100MB of data in about 15 seconds. (I added some Sleep()'s).

I then set up a professional monitoring environment consisting of continuously hitting 'F5' in explorer. The results: the file stays at 0kB then jumps to 100MB about the time the test program ends.

Next thing I tried was to manually flush the file after each write, with FlushFileBuffers(myHandle). This makes the observed file size grow nice and steady, as expected.
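For reference, the manual-flush variant amounts to something like the sketch below. It is only an illustration, not the original test program: `write_and_flush` is a made-up helper name, and the `#ifdef` guard just keeps the snippet from being compiled on non-Windows toolchains.

```c
#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: write len bytes to h, then ask the OS to flush the
   buffers for that file before returning.
   Returns 1 on success, 0 on failure. */
int write_and_flush(HANDLE h, const void *data, DWORD len)
{
    DWORD written = 0;

    /* A short write is treated as a failure in this sketch. */
    if (!WriteFile(h, data, len, &written, NULL) || written != len)
        return 0;

    /* This is the call that made the observed file size grow steadily. */
    return FlushFileBuffers(h) ? 1 : 0;
}
#endif /* _WIN32 */
```

This only works when the writer itself can call it, which is exactly what isn't possible once the handle belongs to a child process.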

My question is, then, shouldn't the FILE_FLAG_WRITE_THROUGH have done this without manually flushing the file? Am I missing something? In the 'real world' program, I can't flush the file, 'cause I don't have any control over the child process that's using it.

There's also the FILE_FLAG_NO_BUFFERING flag, which I can't use for the same reason: I have no control over the process that's using the handle, so I can't manually align the writes as required by this flag.
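To make the alignment requirement concrete: with FILE_FLAG_NO_BUFFERING, write sizes and file offsets have to be multiples of the volume sector size, and buffer addresses have to be sector-aligned as well. Here is a small sketch of the arithmetic. The helper names are made up for illustration, and the sector size is passed in as a parameter because a real program would query it rather than assume a value.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of sector (sector must be non-zero). */
size_t round_up_to_sector(size_t n, size_t sector)
{
    return ((n + sector - 1) / sector) * sector;
}

/* Returns 1 if value (an offset, a length, or a buffer address cast to
   uintptr_t) is a multiple of sector, 0 otherwise. */
int is_sector_aligned(uintptr_t value, size_t sector)
{
    return value % sector == 0;
}
```

With a 512-byte sector, for instance, a 10230-byte write is not aligned and would have to be padded to `round_up_to_sector(10230, 512)`, i.e. 10240 bytes. A child process writing arbitrary amounts to its STDOUT will not do that on its own.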

EDIT: I have made a separate project specifically for watching how the size of the file changes. It uses the .NET FileSystemWatcher class. I also write less data - around 100kB in total.

Here's the output. Check out the seconds in the timestamps.

The 'builtin no-buffers' version:

25.11.2008 7:03:22 PM: 10230 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10200 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10190 bytes added.

... and the 'forced (manual) flush' version (FlushFileBuffers() is called every ~2.5 seconds):

25.11.2008 7:06:10 PM: 10230 bytes added.
25.11.2008 7:06:12 PM: 10230 bytes added.
25.11.2008 7:06:15 PM: 10230 bytes added.
25.11.2008 7:06:17 PM: 10230 bytes added.
25.11.2008 7:06:19 PM: 10230 bytes added.
25.11.2008 7:06:21 PM: 10230 bytes added.
25.11.2008 7:06:23 PM: 10230 bytes added.
25.11.2008 7:06:25 PM: 10230 bytes added.
25.11.2008 7:06:27 PM: 10230 bytes added.
25.11.2008 7:06:29 PM: 10230 bytes added.
Prosperous answered 25/11, 2008 at 15:54 Comment(3)
why do you think you need this? - Visitation
+1 for the professional Explorer + F5. But I have to remind you that visibility within the same operating system does not imply flushing or durability. You must act more professionally: either reset the PC or hot-extract the HDD from the system. I am not sure that a reset or power-off won't trigger a flush of the onboard cache. Only if you physically remove the drive, or use some fake storage, can you be sure that the flush has actually reached the device. - Maryleemarylin
@RecognizeEvilasWaste hmm... let's see if it wor - Prosperous
C
13

I've been bitten by this, too, in the context of crash logging.

FILE_FLAG_WRITE_THROUGH only guarantees that the data you're sending gets sent to the filesystem before WriteFile returns; it doesn't guarantee that it's actually sent to the physical device. So, for example, if you execute a ReadFile after a WriteFile on a handle with this flag, you're guaranteed that the read will return the bytes you wrote, whether it got the data from the filesystem cache or from the underlying device.

If you want to guarantee that the data has been written to the device, then you need FILE_FLAG_NO_BUFFERING, with all the attendant extra work. Those writes have to be aligned, for example, because the buffer is going all the way down to the device driver before returning.

The Knowledge Base has a terse but informative article on the difference.

In your case, if the parent process is going to outlive the child, then you can:

  1. Use the CreatePipe API to create an inheritable, anonymous pipe.
  2. Use CreateFile to create a file with FILE_FLAG_NO_BUFFERING set.
  3. Provide the writable handle of the pipe to the child as its STDOUT.
  4. In the parent process, read from the readable handle of the pipe into aligned buffers, and write them to the file.
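The four steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `relay_child_output` and `SECTOR` are hypothetical names, 4096 is assumed to be a multiple of the volume's sector size, and error handling is kept to a minimum. Because unbuffered writes must cover whole sectors, the sketch rewrites the current sector (zero-padded) after every read and trims the padding once the child has exited.

```c
#ifdef _WIN32
#include <windows.h>

#define SECTOR 4096u  /* assumption: a multiple of the volume's sector size */

/* Hypothetical helper: run cmdline with stdout/stderr sent to a pipe and
   relay the pipe into logPath through an unbuffered, write-through handle.
   Returns 0 on success, -1 on failure. */
int relay_child_output(wchar_t *cmdline, const wchar_t *logPath)
{
    SECURITY_ATTRIBUTES sa = { sizeof sa, NULL, TRUE };  /* inheritable */
    HANDLE rd = NULL, wr = NULL;
    if (!CreatePipe(&rd, &wr, &sa, 0)) return -1;               /* step 1 */
    SetHandleInformation(rd, HANDLE_FLAG_INHERIT, 0); /* read end stays here */

    HANDLE file = CreateFileW(logPath, GENERIC_WRITE, FILE_SHARE_READ, NULL,
                              CREATE_ALWAYS,
                              FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                              NULL);                            /* step 2 */
    /* VirtualAlloc memory is page-aligned and zero-filled. */
    BYTE *buf = (BYTE *)VirtualAlloc(NULL, SECTOR, MEM_COMMIT | MEM_RESERVE,
                                     PAGE_READWRITE);

    STARTUPINFOW si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdInput = GetStdHandle(STD_INPUT_HANDLE);
    si.hStdOutput = wr;                                         /* step 3 */
    si.hStdError = wr;
    BOOL started = file != INVALID_HANDLE_VALUE && buf != NULL &&
                   CreateProcessW(NULL, cmdline, NULL, NULL, TRUE, 0,
                                  NULL, NULL, &si, &pi);
    CloseHandle(wr);  /* otherwise ReadFile never sees the end of the pipe */

    ULONGLONG sectorStart = 0;  /* file offset of the sector being filled */
    DWORD filled = 0, got = 0, written = 0;
    BOOL ok = started;
    while (ok && ReadFile(rd, buf + filled, SECTOR - filled, &got, NULL)) {
        OVERLAPPED at;                                          /* step 4 */
        if (got == 0) continue;  /* zero-length write from the child */
        ZeroMemory(&at, sizeof at);
        at.Offset = (DWORD)(sectorStart & 0xFFFFFFFFu);
        at.OffsetHigh = (DWORD)(sectorStart >> 32);
        filled += got;
        /* Always write one whole sector at a sector-aligned offset. */
        ok = WriteFile(file, buf, SECTOR, &written, &at);
        if (ok && filled == SECTOR) {
            sectorStart += SECTOR;
            filled = 0;
            ZeroMemory(buf, SECTOR);
        }
    }
    CloseHandle(rd);
    if (started) {
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
    }
    if (buf) VirtualFree(buf, 0, MEM_RELEASE);
    if (file != INVALID_HANDLE_VALUE) CloseHandle(file);
    if (!ok) return -1;

    /* Reopen without FILE_FLAG_NO_BUFFERING to cut off the zero padding. */
    HANDLE h = CreateFileW(logPath, GENERIC_WRITE, 0, NULL, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return -1;
    LARGE_INTEGER end;
    end.QuadPart = (LONGLONG)(sectorStart + filled);
    ok = SetFilePointerEx(h, end, NULL, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok ? 0 : -1;
}
#endif /* _WIN32 */
```

While the child is running, the file can end in up to SECTOR - 1 zero bytes of padding. Whether that is acceptable depends on who reads the file in the meantime.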
Cosmopolite answered 25/11, 2008 at 17:27 Comment(3)
Nice idea on how to sidestep the FILE_FLAG_NO_BUFFERING limitations! I'll just have to see how buffering works with CreatePipe(). Thanks! - Prosperous
Are you certain that is true? The KB article says: "The data is cached (stored in the disk cache); however, it is still written directly to the file. This method allows a read operation on that data to satisfy the read request from cached data (if it's still there), rather than having to do a file read to get the data. The write call doesn't return until the data is written to the file." To me that seems to mean that it flushes data to disk, but still keeps a copy in the cache to serve future reads. - Viol
Note the date. What I wrote was true in 2008, and I suspect the KB article has been updated since then. :-) My understanding is that the rules for device drivers changed in the meantime (specifically, Windows since ~Windows 8 inserts a flush command after writes when you provide FILE_FLAG_WRITE_THROUGH if you're using a device that doesn't honor the forced unit access bit). So FILE_FLAG_WRITE_THROUGH can provide more strict guarantees now. I don't work on Windows platforms anymore, though, so I can't verify that this is the case. - Cosmopolite
S
6

This is an old question, but I thought I might add a bit to it. Actually, I believe everyone here is wrong. When you write to a stream with write-through and unbuffered I/O, it does write to the disk, but it does NOT update the metadata associated with the file system (e.g. what Explorer shows you).

You can find a good reference on this kind of stuff here http://winntfs.com/2012/11/29/windows-write-caching-part-2-an-overview-for-application-developers/

Cheers,

Greg

Summit answered 24/7, 2013 at 15:38 Comment(1)
MSDN explicitly says the exact opposite, though. It says metadata is being flushed. Anyway, what's surprising is what FILE_FLAG_WRITE_THROUGH is actually supposed to be used for. It's basically a normal, buffered write with only one difference: writeback of dirty pages starts immediately (as opposed to "some unspecified time later"). Which is fine, except the KB states that it always runs synchronously and blocks until the complete write has finished. Which makes the whole thing nonsensical (unbuffered can run asynchronously no problem). - Wyrick
A
2

Perhaps you could be satisfied enough with FlushFileBuffers:

Flushes the buffers of a specified file and causes all buffered data to be written to a file.

Typically the WriteFile and WriteFileEx functions write data to an internal buffer that the operating system writes to a disk or communication pipe on a regular basis. The FlushFileBuffers function writes all the buffered information for a specified file to the device or pipe.

They do warn that flushing the buffers frequently is inefficient, and that it's better to just disable caching (i.e. Tim's answer):

Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.

If it's not a high-performance situation, and you won't be flushing too frequently, then FlushFileBuffers might be sufficient (and easier).

Algar answered 3/3, 2010 at 20:9 Comment(2)
+1 for the reference to flag 'FILE_FLAG_WRITE_THROUGH'. I already mention using FlushFileBuffers in the question, as a workaround for the straightforward (not working) solution, and that's what I was trying to avoid. - Prosperous
Sorry, I meant +1 for the reference to flag 'FILE_FLAG_NO_BUFFERING' (although that's been answered before) - Prosperous
F
2

The size you're looking at in Explorer may not be entirely in-sync with what the file system knows about the file, so this isn't the best way to measure it. It just so happens that FlushFileBuffers will cause the file system to update the information that Explorer is looking at; closing it and reopening may end up doing the same thing as well.

Aside from the disk caching issues mentioned by others, write through is doing what you were hoping it is doing. It's just that doing a 'dir' in the directory may not show up-to-date information.

Answers suggesting that write-through only writes it "to the file system" are not quite right. It does write it into the file system cache, but it also sends the data down to the disk. Write-through might mean that a subsequent read is satisfied from the cache, but it doesn't mean that we skipped a step and aren't writing it to the disk. Read the article's summary very carefully. This is a confusing bit for just about everyone.

Freddie answered 27/8, 2010 at 20:28 Comment(0)
M
1

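Perhaps you want to consider memory mapping that file. As soon as you write to the memory mapped region, the mapped file contents are updated, although the modified pages are written to disk lazily unless you call FlushViewOfFile.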

Win API File Mapping

Meneau answered 15/2, 2018 at 20:40 Comment(0)
