What does SetFileValidData do? How does it differ from SetEndOfFile?
4

11

I am looking for a way to extend a file asynchronously and efficiently.

The support document "Asynchronous Disk I/O Appears as Synchronous on Windows NT, Windows 2000, and Windows XP" says:

NOTE: Applications can make the previously mentioned write operation asynchronous by changing the Valid Data Length of the file by using the SetFileValidData function, and then issuing a WriteFile.

According to MSDN, SetFileValidData "sets the valid data length of the specified file."

But I still do not understand what "valid data" is. How does it differ from the size of the file?

I can use SetFilePointerEx and SetEndOfFile to extend the file size, but how do I do this with SetFileValidData?

SetFileValidData does not accept an argument larger than the file size. In that case, what is the actual purpose of SetFileValidData?

Delphinedelphinia answered 1/9, 2012 at 13:11 Comment(1)
The documentation for SetEndOfFile explains the difference. - Apartment
26

When you use SetEndOfFile to increase the length of a file, the logical file length changes and the necessary disk space is allocated, but no data is actually physically written to the disk sectors corresponding to the new part of the file. The valid data length remains the same as it was.

This means you can use SetEndOfFile to make a file very large very quickly, and if you read from the new part of the file you'll just get zeros. The valid data length increases when you write actual data to the new part of the file.

That's fine if you just want to reserve space and will then write data to the file sequentially. But if you make the file very large and immediately write data near the end of it, zeros first have to be written to everything between the old valid data length and the position you are writing to, which can take a long time. If you don't actually need the file to contain zeros, you can use SetFileValidData to skip this step; the new part of the file will then contain whatever was previously stored in those disk clusters, such as data from deleted files.

Addendum:

  • The rules for sparse files are different.

  • You should not use SetFileValidData on a file that non-privileged users have read access to; this could leak content from deleted files that belonged to other users.
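
The bookkeeping described in this answer can be mimicked with a toy model. The sketch below is plain Python, not the Windows API: `ModelFile` and its methods are hypothetical names that only track two numbers, the file size and the valid data length, and report how many bytes of zeros would have to be written before a given write can proceed. It ignores sparse files, caching, and cluster granularity, and it assumes the valid data length may be set as high as the file size.

```python
class ModelFile:
    """Toy model of a non-sparse file: tracks size and valid data length."""

    def __init__(self):
        self.size = 0          # logical file size in bytes
        self.valid_length = 0  # bytes known to hold real data

    def set_end_of_file(self, new_size):
        # Models SetEndOfFile: the size changes, nothing is written.
        self.size = new_size
        self.valid_length = min(self.valid_length, new_size)

    def set_file_valid_data(self, new_valid_length):
        # Models SetFileValidData. Assumption of this model: the new value
        # must exceed the current valid length and not exceed the file size.
        if not (self.valid_length < new_valid_length <= self.size):
            raise ValueError("valid data length out of range")
        self.valid_length = new_valid_length

    def write(self, offset, length):
        """Apply a write; return how many zero bytes had to be written first."""
        end = offset + length
        if end > self.size:
            self.size = end  # a write past the end extends the file
        zero_fill = max(0, offset - self.valid_length)
        self.valid_length = max(self.valid_length, end)
        return zero_fill


GB = 1024 ** 3

# Extend to 2 GB, then write the last 4 KB: the gap must be zeroed first.
f = ModelFile()
f.set_end_of_file(2 * GB)
slow = f.write(2 * GB - 4096, 4096)

# Same, but raise the valid data length first: nothing needs zeroing.
g = ModelFile()
g.set_end_of_file(2 * GB)
g.set_file_valid_data(2 * GB)
fast = g.write(2 * GB - 4096, 4096)

print(slow, fast)
```

In this model a sequential writer never pays for zero-filling, because each write starts exactly at the current valid data length; only the "extend, then write far ahead" pattern does.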

Afro answered 3/9, 2012 at 3:8 Comment(1)
This answer is very concise and should be selected as the correct one. - Densitometer
7

Please note that SetEndOfFile() does not write any zeros to the allocated sectors on disk. It only records the newly allocated clusters in the file's MFT record and updates the allocation bitmap of the file system. The FS also records the file's valid data length in its MFT record.

If you enlarge a file from 1 GB to 2 GB, the appended 1 GB should read as all zeros, but the FS does not write those zeros to disk; it consults the file's valid data length to know that this 1 GB is logically zero. If you read from the appended portion, the FS fills the buffer with zeros directly in RAM and returns it to your application. If you write even a single byte inside this portion, however, the FS has to write zeros from the original 1 GB offset up to the position your application is writing to (but not from that position to the end of the file). It then records the valid data length as extending from 0 to the end of the data just written; the file size and allocated size remain 2 GB.

If you call SetFileValidData() instead, the FS sets the valid data length to 2 GB directly and does not zero-fill anything. Writes go straight to whatever location you choose, but reads from locations you have not yet written may return garbage: data left behind by other applications before the file was extended into that disk space.
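
The read-side behaviour described above can be illustrated with a small simulation. This is a hedged Python sketch, not real file system code: `ToyVolumeFile` is a hypothetical class whose `clusters` bytearray stands in for allocated disk space that still holds another file's old content. Reads beyond the valid data length return zeros produced in memory; once the valid data length is raised without zero-filling, the same read returns the stale bytes.

```python
class ToyVolumeFile:
    """Toy file whose allocated clusters still contain someone else's old data."""

    def __init__(self, stale):
        self.clusters = bytearray(stale)  # what is physically on the "disk"
        self.size = len(stale)            # file size after extension
        self.valid_length = 0             # nothing written by us yet

    def read(self, offset, length):
        end = min(offset + length, self.size)
        out = bytearray()
        for pos in range(offset, end):
            # Below the valid data length: real bytes. Above it: zeros from RAM.
            out.append(self.clusters[pos] if pos < self.valid_length else 0)
        return bytes(out)

    def write(self, offset, data):
        if offset + len(data) > self.size:
            raise ValueError("this toy model does not extend the file")
        if offset > self.valid_length:
            # Zero the gap between the old valid data length and the write.
            gap = offset - self.valid_length
            self.clusters[self.valid_length:offset] = bytes(gap)
        self.clusters[offset:offset + len(data)] = data
        self.valid_length = max(self.valid_length, offset + len(data))

    def set_valid_data(self, new_valid_length):
        # Models SetFileValidData: move the mark, touch nothing on "disk".
        self.valid_length = new_valid_length


stale = b"SECRET-FROM-A-DELETED-FILE"

safe = ToyVolumeFile(stale)
before = safe.read(0, 6)       # zeros, the stale bytes are hidden
safe.write(10, b"new")         # forces zero-fill of bytes 0..9
after = safe.read(0, 13)       # ten zeros followed by b"new"

leaky = ToyVolumeFile(stale)
leaky.set_valid_data(leaky.size)
leaked = leaky.read(0, 6)      # b"SECRET": old content is now visible

print(before, after, leaked)
```

The `leaked` value is the reason the real function is restricted: in the model, nothing but the valid data length stands between a reader and the previous contents of the clusters.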

Actinomorphic answered 5/11, 2013 at 0:38 Comment(0)
4

I agree with Harry Johnston's answer. From a practical point of view, SetFileValidData has a performance advantage because it does not require writing zeros, but it has security implications because the file might contain data from other, deleted files. For that reason a special privilege, SE_MANAGE_VOLUME_NAME, is required, as MSDN mentions: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365544(v=vs.85).aspx

The reason is that SetFileValidData can expose other users' deleted data through that particular file, so accounts without this privilege (normal, non-administrator users) are not allowed to call it. Even privileged users still need to use ACLs (access control lists) in the file system to protect the file so that it is not shared with non-privileged users.

Phalange answered 21/11, 2014 at 9:40 Comment(0)
-1

It seems that SetEndOfFile does not really allocate reserved disk space for the target file; SetFileValidData is responsible for that job.

According to MSDN,

You can use the SetFileValidData function to create large files in very specific circumstances so that the performance of subsequent file I/O can be better than other methods. Specifically, if the extended portion of the file is large and will be written to randomly, such as in a database type of application, the time it takes to extend and write to the file will be faster than using SetEndOfFile and writing randomly.

If SetEndOfFile really allocated space, then SetFileValidData would do nothing better than SetEndOfFile when writing randomly. So SetEndOfFile may just create a sparse file with hole(s), while SetFileValidData does the actual allocation.

Eddy answered 27/1, 2013 at 3:24 Comment(6)
SetEndOfFile does indeed allocate the disk space (that is, it assigns as many clusters as necessary for the exclusive use of the file in question) but it does not write zeros to those clusters. The zeros will be written as and if necessary. This is because allocating the clusters is a fast operation (involving only changes to the master file table), whereas writing the zeros takes a long time and would be wasted if the file is then written sequentially. SetFileValidData avoids the need to write zeros, but requires admin privilege. - Afro
Thanks for the clarification. But if SetEndOfFile does allocate the space, how can SetFileValidData perform better than other methods, or is it only better than "SetEndOfFile plus writing zeros"? And why is SetFileValidData needed at all if it is a somewhat redundant interface? I mean, writing zeros after SetEndOfFile does not seem to help. Is there any external evidence or official documentation on this problem? By the way, are there corresponding functions in the POSIX interfaces? - Eddy
If you use SetEndOfFile to extend a file and then seek to the last sector and write some data, the operating system will have to write zeros to the rest of the file (everywhere between the previous valid data mark and the new valid data mark), because otherwise you could read the original contents of the allocated sectors, which might contain data from deleted files that belonged to another user. SetFileValidData tells the operating system not to bother erasing the old data, but you need administrator privilege to use it. - Afro
You've already linked to the MSDN article that explains all this; I'm not sure what you mean by "this problem", as I don't see that this behaviour is a problem. I have no idea about POSIX. - Afro
I'm sorry, "this problem" should be "this issue". According to flexhex, when SetEndOfFile is used to extend a file, the operating system may just append sparse zeros to the file, especially if the extension is large, without really allocating space or automatically writing zeros. The file system just keeps track of the sparse holes and logically returns zeros when read requests cover them. - Eddy
Windows supports sparse files, but it doesn't happen automatically; the programmer has to specifically request it. For non-sparse files, SetEndOfFile allocates space but does not zero the content. For sparse files, SetEndOfFile does not allocate space. - Afro
