How to get "valid data length" of a file?

There is a function to set the "valid data length" value: SetFileValidData, but I didn't find a way to get the "valid data length" value.

For a given file, I want to know whether the EOF differs from the VDL, because writing past the VDL when VDL < EOF incurs a performance penalty, as described here.
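To make the penalty concrete, here is a minimal C++ sketch of the arithmetic. It assumes a simplified model in which a write that starts beyond the VDL forces everything between the VDL and the write offset to be zero-filled first. The names `zeroFillBytes` and `shouldReject` are hypothetical helpers, not Windows APIs, and they do not query anything from the file system.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Windows API): bytes that must be zero-filled
// before a write at `writeOffset` can land, under the simplified model that
// everything between the VDL and the start of the write is zeroed first.
constexpr std::uint64_t zeroFillBytes(std::uint64_t vdl, std::uint64_t writeOffset)
{
    return writeOffset > vdl ? writeOffset - vdl : 0;
}

// Hypothetical policy for the use case in this question: reject a file whose
// uninitialized tail (EOF - VDL) exceeds what the caller is willing to zero.
constexpr bool shouldReject(std::uint64_t vdl, std::uint64_t eof, std::uint64_t maxTail)
{
    return eof > vdl && eof - vdl > maxTail;
}

constexpr std::uint64_t GiB = 1024ull * 1024 * 1024;

// A 50 GiB file with VDL 0: a 4 KiB write at the end zero-fills almost all of it.
static_assert(zeroFillBytes(0, 50 * GiB - 4096) == 50 * GiB - 4096, "whole gap is zeroed");
// Once VDL == EOF, the same write costs nothing extra.
static_assert(zeroFillBytes(50 * GiB, 50 * GiB - 4096) == 0, "no gap, no zeroing");
```

The hard part, and the actual subject of this question, is obtaining the `vdl` argument in the first place.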

Mcgowen answered 23/2, 2016 at 8:52 Comment(15)
Also, the typical "valid data length" is the same as the actual size of the file.Freesia
@JoachimPileborg I'm not dealing with the typical case...Mcgowen
Curious, won't you just be including whatever bytes that exist on the disk that are after the end of the file data?Acyl
Then please tell us what case you are dealing with? What are you doing? Why are you doing it? What is the use case? What is the code doing? Please read about how to ask good questions and learn how to create a Minimal, Complete, and Verifiable Example.Freesia
@JoachimPileborg the question is simple, I want to get this valueMcgowen
It looks like you are meant to remember the value that you passed when you called SetFileValidData.Cursive
@DavidHeffernan what if I'm not the only one using that file...Mcgowen
If you have two parties writing to the same file, then they'll need to coordinate their actions in any case.Cursive
I'm given the file from the outside. In case I get a 0 VDL and large EOF (see the utility "fsutil" with params file createnew) I want to reject the file, to avoid zeroing the whole file on writes to its end.Mcgowen
I think that this is a fair question that should not have been closed. I do think though that it would help your cause if you explained the motivation for your asking in an edit to the question. Perhaps there's another way to solve the underlying problem you face.Cursive
I don't think this information is tracked in any attribute at the FS level, so it is lost when you close the handle. There is no use in keeping this data around after you close the handle, because when you reopen the file (from an OS point of view) everything in there is valid. There is no option to open a file and zero-out-what-was-not-zeroed-before. This would also open a nice attack vector: just look for all files that have this attribute and see if you can get some useful data out of them.Timetable
I think you're asking the wrong question. If the file system has been told the file contains valid data up to X bytes, it believes you and treats that data just like any other file on disk. It's the other case, where a file has been extended but not been marked as valid that it has to track - because it's in that case that it needs to know to zero out the blocks at the appropriate time.Luedtke
Stop telling me what is the question I want to ask. There is a "property" called valid-data-length, and the question is how can I get it. The answer can be "There is no way", but that doesn't mean that the question is wrongMcgowen
Where is your reference that this "property" actually does exist? As far as I can tell, it doesn't. Instead of getting defensive and arguing with people who are trying to help you, start listening to them.Luedtke
@Jonathan the msdn docs seem to state that this property is tracked, but I agree that asker would have been better served by providing motivation in the questionCursive
M
2

I found this page, which claims that:

there is no mechanism to query the value of the VDL

So the answer is "you can't".

If you care about performance, you can set the VDL to the EOF, but note that this may expose old garbage on your disk: the part between those two pointers, which would read as zeros if you accessed the file without setting the VDL to the EOF.

Mcgowen answered 28/2, 2016 at 9:9 Comment(0)
C
3

I looked into this. There is no way to get this information via any API, not even NtQueryInformationFile (FileEndOfFileInformation only worked with NtSetInformationFile). So in the end I read it by manually parsing the NTFS records. If anyone has a better way, please tell! Obviously, this only works with full system access (and on NTFS), and it might be out of sync with the in-memory information Windows uses.

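#include <windows.h>
#include <winioctl.h>
#include <string>
#include <vector>

// Fixed-width aliases used below.
typedef unsigned long long uint64;
typedef long long int64;

#pragma pack(push)
#pragma pack(1)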
struct NTFSFileRecord
{
    char magic[4];
    unsigned short sequence_offset;
    unsigned short sequence_size;
    uint64 lsn;
    unsigned short sequence_number;
    unsigned short hardlink_count;
    unsigned short attribute_offset;
    unsigned short flags;
    unsigned int real_size;
    unsigned int allocated_size;
    uint64 base_record;
    unsigned short next_id;
    //char padding[470];
};

struct MFTAttribute
{
    unsigned int type;
    unsigned int length;
    unsigned char nonresident;
    unsigned char name_length;
    unsigned short name_offset;
    unsigned short flags;
    unsigned short attribute_id;
    unsigned int attribute_length;
    unsigned short attribute_offset;
    unsigned char indexed_flag;
    unsigned char padding1;
    //char padding2[488];
};

struct MFTAttributeNonResident
{
    unsigned int type;
    unsigned int length;
    unsigned char nonresident;
    unsigned char name_length;
    unsigned short name_offset;
    unsigned short flags;
    unsigned short attribute_id;
    uint64 starting_vcn;
    uint64 last_vcn;
    unsigned short run_offset;
    unsigned short compression_size;
    unsigned int padding;
    uint64 allocated_size;
    uint64 real_size;
    uint64 initial_size;
};
#pragma pack(pop)

HANDLE GetVolumeData(const std::wstring& volfn, NTFS_VOLUME_DATA_BUFFER& vol_data)
{
    HANDLE vol = CreateFileW(volfn.c_str(), GENERIC_WRITE | GENERIC_READ, 
        FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (vol == INVALID_HANDLE_VALUE)
        return vol;

    DWORD ret_bytes;
    BOOL b = DeviceIoControl(vol, FSCTL_GET_NTFS_VOLUME_DATA,
        NULL, 0, &vol_data, sizeof(vol_data), &ret_bytes, NULL);

    if (!b)
    {
        CloseHandle(vol);
        return INVALID_HANDLE_VALUE;
    }

    return vol;
}


int64 GetFileValidData(HANDLE file, HANDLE vol, const NTFS_VOLUME_DATA_BUFFER& vol_data)
{
    BY_HANDLE_FILE_INFORMATION hfi;
    BOOL b = GetFileInformationByHandle(file, &hfi);
    if (!b)
        return -1;

    NTFS_FILE_RECORD_INPUT_BUFFER record_in;
    record_in.FileReferenceNumber.HighPart = hfi.nFileIndexHigh;
    record_in.FileReferenceNumber.LowPart = hfi.nFileIndexLow;
    std::vector<BYTE> buf;
    buf.resize(sizeof(NTFS_FILE_RECORD_OUTPUT_BUFFER) + vol_data.BytesPerFileRecordSegment - 1);
    NTFS_FILE_RECORD_OUTPUT_BUFFER* record_out = reinterpret_cast<NTFS_FILE_RECORD_OUTPUT_BUFFER*>(buf.data());
    DWORD bout;
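    // Pass the real size of the output buffer rather than a hard-coded 4096.
    b = DeviceIoControl(vol, FSCTL_GET_NTFS_FILE_RECORD, &record_in,
        sizeof(record_in), record_out, static_cast<DWORD>(buf.size()), &bout, NULL);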

    if (!b)
        return -1;

    NTFSFileRecord* record = reinterpret_cast<NTFSFileRecord*>(record_out->FileRecordBuffer);

    unsigned int currpos = record->attribute_offset;
    // Walk the attributes until the end marker (0xFFFFFFFF) or the end of the returned data.
    while (record_out->FileRecordBuffer + currpos + sizeof(MFTAttribute) < buf.data() + bout)
    {
        MFTAttribute* attr = reinterpret_cast<MFTAttribute*>(record_out->FileRecordBuffer + currpos);
        if (attr->type == 0xFFFFFFFF || attr->length == 0)
            break; // end marker, or a corrupt length that would loop forever

        if (attr->type == 0x80 // $DATA
            && record_out->FileRecordBuffer + currpos + sizeof(MFTAttributeNonResident)
                <= buf.data() + bout)
        {
            if (attr->nonresident == 0)
                return -1; // resident data has no non-resident header to read

            // The non-resident header starts at the beginning of the attribute itself.
            MFTAttributeNonResident* dataattr = reinterpret_cast<MFTAttributeNonResident*>(
                record_out->FileRecordBuffer + currpos);
            return dataattr->initial_size;
        }
        currpos += attr->length;
    }

    return -1;
}

[...]
    NTFS_VOLUME_DATA_BUFFER vol_data;
    HANDLE vol = GetVolumeData(L"\\??\\D:", vol_data);
    if (vol != INVALID_HANDLE_VALUE)
    {
        int64 vdl = GetFileValidData(alloc_test->getOsHandle(), vol, vol_data);
        if(vdl>=0) { [...] }
        [...]
    }
[...]
Corridor answered 6/9, 2020 at 15:56 Comment(1)
Try FSCTL_QUERY_FILE_REGIONSAntichlor
F
1

I think you are confused as to what "valid data length" actually means. Check this answer.

Basically, while SetEndOfFile lets you increase the length of a file quickly, and allocates the disk space, if you skip to the (new) end-of-file to write there, all the additionally allocated disk space would need to be overwritten with zeroes, which is kind of slow.

SetFileValidData lets you skip that zeroing-out. You're telling the system, "I am OK with whatever is in those disk blocks, get on with it". (This is why you need the SE_MANAGE_VOLUME_NAME privilege, as it could reveal privileged data to unprivileged users if you don't overwrite the data. Users with this privilege can access the raw drive data anyway.)
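The two paragraphs above can be modelled with a toy in-memory file. This is only a sketch of the bookkeeping, under the assumption that the file system keeps two markers per file (EOF and VDL) and zero-fills the gap up to the start of a write. The class `ToyFile` and its methods are hypothetical and merely named after the Win32 functions they imitate; real NTFS works on disk clusters and its exact zeroing granularity is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not a Windows API) of the two per-file markers.
class ToyFile {
public:
    std::uint64_t eof = 0;   // logical file size
    std::uint64_t vdl = 0;   // valid data length: bytes known to be initialized

    // Like SetEndOfFile when growing: space is reserved, nothing is zeroed yet.
    // Shrinking also pulls the VDL back so that vdl <= eof always holds.
    void setEndOfFile(std::uint64_t newEof) {
        eof = newEof;
        vdl = std::min(vdl, eof);
    }

    // Like SetFileValidData: declare existing disk contents valid, no zeroing.
    // Only moving the VDL forward within the file is accepted in this model.
    bool setFileValidData(std::uint64_t newVdl) {
        if (newVdl <= vdl || newVdl > eof) return false;
        vdl = newVdl;
        return true;
    }

    // Write `len` bytes at `offset`; returns how many bytes had to be
    // zero-filled first (the gap between the old VDL and the write).
    std::uint64_t write(std::uint64_t offset, std::uint64_t len) {
        const std::uint64_t zeroed = offset > vdl ? offset - vdl : 0;
        eof = std::max(eof, offset + len);
        vdl = std::max(vdl, offset + len);
        return zeroed;
    }
};
```

In this model, extending a file to 1000 bytes and writing the last 100 zero-fills 900 bytes, whereas calling `setFileValidData(1000)` first makes the same write cost nothing. It also shows that EOF and VDL are two distinct values between those calls.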

In either case, you have set the new effective size of the file. (Which you can read back.) What, exactly, should a separate "read file valid data" report back? SetFileValidData told the system that whatever is in those disk blocks is "valid"...


Different approach of explanation:

The documentation mentions that the "valid data length" is being tracked; the purpose for this is for the system to know which range (from end-of-valid-data to end-of-file) it still needs to zero out, in the context of SetEndOfFile, when necessary (e.g. you closing the file). You don't need to read back this value, because the only way it could be different from the actual file size is because you, yourself, did change it via the aforementioned functions...

Feoffee answered 23/2, 2016 at 9:2 Comment(2)
I am well aware of the meaning of the VDL. In my scenario, it makes perfect sense to query it, as I want to reject files provided to me with VDL 0 and large EOF. (Remember, VDL isn't always set by me + it may have changed since last set using SetFileValidData, due to writes near file's end.)Mcgowen
@avim: Putting that information into the question itself would have gone a long way towards not getting it closed as "unclear", and getting to-the-point answers. That being said, I don't think there is a way to read that property, short of relying on undocumented behaviour. I am also still not sure about your use case; the available documentation on VDL is not clear on the semantics in a case like you are describing, so you might already be assuming too much.Feoffee
A
0

On the command line, it is simply:

fsutil file queryvaliddata

For example, a file created with fsutil file createnew C:\largefile 53687091200 will have a 50 GB "EOF" but a zero-byte "VDL"; it will have a 50 GB "NVDL".

Tracing fsutil.exe with procmon reveals a call to FSCTL_QUERY_FILE_REGIONS in the Win32Api

This is the code you need in C#. The documentation was slim and there is no existing help for doing this in .NET, but I got there in the end.

Note that just as with fsutil file queryvaliddata /D, there will only ever be one NVDL area.

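using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// EFileAccess, EFileShare, ECreationDisposition and EFileAttributes are not shown here;
// they are assumed to be the usual enum wrappers for the CreateFile flags.
public class NTFSValidData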
{
    // Define the FSCTL_QUERY_FILE_REGIONS control code
    private const uint FSCTL_QUERY_FILE_REGIONS = 0x00090284;
    private const uint FILE_REGION_USAGE_VALID_NONCACHED_DATA = 0x00000002;
    private const uint FILE_REGION_USAGE_VALID_CACHED_DATA = 0x00000001;

    // Define the FILE_REGION_INPUT structure
    [StructLayout(LayoutKind.Sequential)]
    public struct FILE_REGION_INPUT
    {
        public long FileOffset;
        public long Length;
        public uint DesiredUsage;
        public uint Reserved;
    }

    //Correct struct def
    //https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/6a8b64a4-ea26-4c5a-8728-c992dd657664
    [StructLayout(LayoutKind.Sequential)]
    public struct FILE_REGION_OUTPUT
    {
        public uint Flags;
        public uint TotalRegionEntryCount;
        public uint RegionEntryCount;
        public uint Reserved;
        public FILE_REGION_INFO Region;
    }

    // Define the FILE_REGION_INFO structure
    [StructLayout(LayoutKind.Sequential)]
    public struct FILE_REGION_INFO
    {
        public long FileOffset;
        public long Length;
        public uint DesiredUsage;
        public uint Reserved;
    }
    //Perform file region query
    //Based on fsutil file queryvaliddata
    //Returns a list of the lengths of the data areas with "no valid data" - can only be one, AFAICT
    public List<long> QueryFileRegions(string filePath, long offset, long length)
    {
        var list = new List<long>();
        //In procmon we can see that fsutil doesn't end up with the option "Non Directory File" passed to CreateFile.
        //Further tracing with API Monitor revealed that fsutil's call to CreateFile ends up at NtCreateFile with an option parameter of FILE_OPEN_FOR_BACKUP_INTENT.
        //Passing BackupSemantics here seems to get the trace in procmon looking the same: no "Non Directory File" option.
        SafeFileHandle fileHandle = new SafeFileHandle(Kernel32_h.CreateFile(filePath,
            EFileAccess.FILE_READ_DATA | EFileAccess.FILE_READ_ATTRIBUTES | EFileAccess.SYNCHRONIZE,
            EFileShare.Read | EFileShare.Write,
            IntPtr.Zero,
            ECreationDisposition.OpenExisting,
            EFileAttributes.BackupSemantics, IntPtr.Zero), true);
    
        //Check the handle itself: the last error can be non-zero even after a successful call.
        if (fileHandle.IsInvalid)
            throw new IOException("Could not open file - " + filePath, Marshal.GetExceptionForHR(Marshal.GetHRForLastWin32Error()));
    
        using (fileHandle)
        {
            var sizeOutputAndInfo = Marshal.SizeOf(typeof(FILE_REGION_OUTPUT)); // 40. Output header, and first info struct.
            var sizeInfo = Marshal.SizeOf(typeof(FILE_REGION_INFO));
    
            int nInBufferSize;
            IntPtr lpInBuffer;
            if (length == 0 && offset == 0)
            {
                nInBufferSize = 0;
                lpInBuffer = IntPtr.Zero;
            }
            else
            {
                FILE_REGION_INPUT regionInput;
                regionInput.Length = length;
                regionInput.FileOffset = offset;
                //0x01 is NTFS only and 0x02 is ReFS only; their exact meaning is unclear to me.
                regionInput.DesiredUsage = FILE_REGION_USAGE_VALID_CACHED_DATA | FILE_REGION_USAGE_VALID_NONCACHED_DATA;
                regionInput.Reserved = 0;
    
                nInBufferSize = Marshal.SizeOf(typeof(FILE_REGION_INPUT));
                lpInBuffer = Marshal.AllocHGlobal(nInBufferSize);
                Marshal.StructureToPtr(regionInput, lpInBuffer, false);
            }
    
            int nOutBufferSize = sizeOutputAndInfo + (sizeInfo * 1024);
            IntPtr lpOutBuffer = Marshal.AllocHGlobal(nOutBufferSize);
            uint bytesReturned;
    
            try
            {
                bool success = Kernel32_h.DeviceIoControl(fileHandle,
                    FSCTL_QUERY_FILE_REGIONS,
                    lpInBuffer, (uint)nInBufferSize,
                    lpOutBuffer, (uint)nOutBufferSize,
                    out bytesReturned, IntPtr.Zero);
    
                if (success)
                {
                    FILE_REGION_OUTPUT result = new FILE_REGION_OUTPUT();
                    result = (FILE_REGION_OUTPUT)Marshal.PtrToStructure(lpOutBuffer, typeof(FILE_REGION_OUTPUT));
                    //There is only ever going to be 2 regions. And if there are 2 "RegionEntryCount", the first one is never going to be NVD
                    if (result.Region.DesiredUsage == 0)
                        list.Add(result.Region.Length);
    
                    //First of the next records starts at +40 bytes.
                    IntPtr lpNextRegion = new IntPtr(lpOutBuffer.ToInt64() + sizeOutputAndInfo);
                    var bytesRead = sizeOutputAndInfo;
                    if (bytesReturned > sizeOutputAndInfo)
                    {
                        FILE_REGION_INFO nextRegion = (FILE_REGION_INFO)Marshal.PtrToStructure(lpNextRegion, typeof(FILE_REGION_INFO));
                        //Check the usage of the second region, not the first one again.
                        if (nextRegion.DesiredUsage == 0)
                            list.Add(nextRegion.Length);
                        //Untested code, as I know of no way that the system would return more than 2 regions.
                        //If you write one byte into the middle of an NVD file, the system just fills the start with VD.
                        /*bytesRead += sizeInfo;
                        while (bytesRead < bytesReturned) {
                            //Advance from the previous region, one FILE_REGION_INFO at a time.
                            lpNextRegion = new IntPtr(lpNextRegion.ToInt64() + sizeInfo);
                            bytesRead += sizeInfo;
                            nextRegion = (FILE_REGION_INFO)Marshal.PtrToStructure(lpNextRegion, typeof(FILE_REGION_INFO));
                            list.Add(nextRegion.Length);                            
                        }*/
                    }
                }
                else
                {
                    throw new IOException("Could not read file regions - " + filePath, Marshal.GetExceptionForHR(Marshal.GetHRForLastWin32Error()));
                }
            }
            finally
            {               
                Marshal.FreeHGlobal(lpInBuffer);
                Marshal.FreeHGlobal(lpOutBuffer);
            }
        }   
        return list;
    }
}

public class Kernel32_h
{

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    public static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, uint nInBufferSize, IntPtr lpOutBuffer, uint nOutBufferSize,
        out uint lpBytesReturned, IntPtr lpOverlapped);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    public static extern IntPtr CreateFile(
        string lpFileName,
        EFileAccess dwDesiredAccess,
        EFileShare dwShareMode,
        IntPtr lpSecurityAttributes,
        ECreationDisposition dwCreationDisposition,
        EFileAttributes dwFlagsAndAttributes,
        IntPtr hTemplateFile);
}
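For comparison, here is a minimal C++ sketch of how the buffer returned by FSCTL_QUERY_FILE_REGIONS could be reduced to a single VDL value. It deliberately does not call DeviceIoControl, so it can be compiled anywhere. The structs `RegionHeader` and `RegionInfo` are hypothetical portable mirrors of FILE_REGION_OUTPUT and FILE_REGION_INFO as laid out on the MS-FSCC page linked above, and the function name `validDataLengthFromRegions` is made up for illustration. It relies on the observation from this answer that a region without valid data reports a usage of 0 and that valid data comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Portable mirrors of FILE_REGION_OUTPUT (header part) and FILE_REGION_INFO.
// The names are hypothetical; the field order follows the C# structs above.
struct RegionHeader {          // 16 bytes
    std::uint32_t flags;
    std::uint32_t totalRegionEntryCount;
    std::uint32_t regionEntryCount;
    std::uint32_t reserved;
};
struct RegionInfo {            // 24 bytes
    std::int64_t fileOffset;
    std::int64_t length;
    std::uint32_t usage;       // 0 = no valid data, as observed in this answer
    std::uint32_t reserved;
};
static_assert(sizeof(RegionHeader) == 16 && sizeof(RegionInfo) == 24, "unexpected padding");

// Returns the end offset of the last region that carries valid data, which is
// the VDL under the "valid data first, then at most one invalid region"
// assumption. Returns -1 if the buffer is too short for what it claims.
std::int64_t validDataLengthFromRegions(const std::vector<unsigned char>& out)
{
    RegionHeader header;
    if (out.size() < sizeof(header)) return -1;
    std::memcpy(&header, out.data(), sizeof(header));

    const std::size_t available = (out.size() - sizeof(header)) / sizeof(RegionInfo);
    if (header.regionEntryCount > available) return -1;

    std::int64_t vdl = 0;
    for (std::uint32_t i = 0; i < header.regionEntryCount; ++i) {
        RegionInfo info;
        // Each entry follows the previous one, sizeof(RegionInfo) bytes apart.
        std::memcpy(&info, out.data() + sizeof(header) + i * sizeof(RegionInfo), sizeof(info));
        if (info.usage != 0)
            vdl = info.fileOffset + info.length;
    }
    return vdl;
}
```

On Windows, the input vector would be the output buffer filled by DeviceIoControl, truncated to the number of bytes returned. A result of 0 for a file with a non-zero size corresponds to the `fsutil file createnew` case described above.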
Antichlor answered 19/4 at 1:16 Comment(0)
C
-2

SetFileValidData (according to MSDN) can be used, for example, to create a large file without having to write to the file. For a database, this will allocate a (contiguous) storage area.

As a result, it seems the file size on disk will have changed without any data having been written to the file.

By implication, any GetValidData (which does not exist) just returns the size of the file, so you can use GetFileSize which returns the "valid" file size.

Cornea answered 23/2, 2016 at 9:18 Comment(3)
GetFileSize returns the size by the EOF position, not the VDLMcgowen
The EOF position IS the file size and IS the VDL.Cornea
Please read my links in the question, you will see that it is not the caseMcgowen
