In fact, the MftValidDataLength field of the NTFS_VOLUME_DATA_BUFFER / NTFS_EXTENDED_VOLUME_DATA structure(s) places an upper limit on the number of USN records that FSCTL_ENUM_USN_DATA will return (assuming, that is, that no additional records are added between the time you take the estimate and the time you run the enumeration).
In the C# example below, I divide the vd.MftValidDataLength value by vd.BytesPerFileRecordSegment, rounding up by first adding divisor - 1 to the dividend before dividing. The divisor is 1,024 on most volumes, but I wouldn't hard-code it: volumes formatted with large file records (for example, on 4K-native drives) report 4,096, so it is safer to read the value from the structure.
[Serializable, StructLayout(LayoutKind.Sequential)]
public struct NTFS_EXTENDED_VOLUME_DATA
{
    public VOLUME_ID /**/ VolumeSerialNumber;
    public long /**/ NumberSectors;
    public long /**/ TotalClusters;
    public long /**/ FreeClusters;
    public long /**/ TotalReserved;
    public uint /**/ BytesPerSector;
    public uint /**/ BytesPerCluster;
    public int /**/ BytesPerFileRecordSegment; // <--
    public uint /**/ ClustersPerFileRecordSegment;
    public long /**/ MftValidDataLength; // <--
    public long /**/ MftStartLcn;
    public long /**/ Mft2StartLcn;
    public long /**/ MftZoneStart;
    public long /**/ MftZoneEnd;
    public uint /**/ ByteCount;
    public ushort /**/ MajorVersion;
    public ushort /**/ MinorVersion;
    public uint /**/ BytesPerPhysicalSector;
    public ushort /**/ LfsMajorVersion;
    public ushort /**/ LfsMinorVersion;
    public uint /**/ MaxDeviceTrimExtentCount;
    public uint /**/ MaxDeviceTrimByteCount;
    public uint /**/ MaxVolumeTrimExtentCount;
    public uint /**/ MaxVolumeTrimByteCount;
};
Typical constants, abridged for clarity:
public enum FSCTL : uint
{
    // etc... etc...
    FILESYSTEM_GET_STATISTICS /**/ = (9 << 16) | 0x0060,
    GET_NTFS_VOLUME_DATA /**/ = (9 << 16) | 0x0064, // <--
    GET_NTFS_FILE_RECORD /**/ = (9 << 16) | 0x0068,
    GET_VOLUME_BITMAP /**/ = (9 << 16) | 0x006f,
    GET_RETRIEVAL_POINTERS /**/ = (9 << 16) | 0x0073,
    // etc... etc...
    ENUM_USN_DATA /**/ = (9 << 16) | 0x00b3,
    READ_USN_JOURNAL /**/ = (9 << 16) | 0x00bb,
    // etc... etc...
    CREATE_USN_JOURNAL /**/ = (9 << 16) | 0x00e7,
    // etc... etc...
};
Pseudo-code follows, since everyone has their own favorite ways of doing P/Invoke...
// etc..
if (!DeviceIoControl(h_vol, FSCTL.GET_NTFS_VOLUME_DATA, out NTFS_EXTENDED_VOLUME_DATA vd))
    throw new Win32Exception(Marshal.GetLastWin32Error());

// round up: add (divisor - 1) to the dividend before dividing
var c_mft_estimate = (vd.MftValidDataLength + (vd.BytesPerFileRecordSegment - 1))
                        / vd.BytesPerFileRecordSegment;
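To make the round-up step concrete, here is a minimal, self-contained sketch of the same arithmetic with no P/Invoke involved. The class and method names (MftEstimate, RecordCountUpperBound) are made up for illustration, and the input values are arbitrary sample numbers rather than readings from a real volume.

```csharp
using System;

static class MftEstimate
{
    // Ceiling division for non-negative byte counts:
    // add (divisor - 1) to the dividend, then do ordinary integer division.
    public static long RecordCountUpperBound(long mftValidDataLength, int bytesPerFileRecordSegment)
    {
        if (mftValidDataLength < 0)
            throw new ArgumentOutOfRangeException(nameof(mftValidDataLength));
        if (bytesPerFileRecordSegment <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerFileRecordSegment));

        return (mftValidDataLength + (bytesPerFileRecordSegment - 1)) / bytesPerFileRecordSegment;
    }

    static void Main()
    {
        // Sample values only (hypothetical, not read from a volume).
        Console.WriteLine(RecordCountUpperBound(262144, 1024)); // exact multiple: 256
        Console.WriteLine(RecordCountUpperBound(262145, 1024)); // one byte over rounds up: 257
    }
}
```

Without the added divisor - 1, the second call would truncate to 256 and the array sized from it could come up one slot short.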
Great, so what can you do with this value? Unfortunately, knowing this cap on the number of USN records that FSCTL_ENUM_USN_DATA will return doesn't help with choosing a buffer size for the DeviceIoControl/FSCTL_ENUM_USN_DATA calls themselves, since the USN_RECORD structures returned in each iteration vary in size according to the length of the reported filenames.
So while it is true that, if you provide a buffer large enough for all of the USN_RECORD structures, DeviceIoControl will dutifully return them all in a single call (avoiding an iterative calling loop, which simplifies the code considerably), the little calculation above gives no principled estimate of that buffer size, unless you're willing to use it as the basis for a gross overestimate.
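For a sense of how gross such an overestimate gets, here is a rough sketch that multiplies the record count by a worst-case record size. Everything in it is an assumption on my part rather than something measured: it assumes version 2 records with a 60-byte fixed portion before the filename, filenames of at most 255 UTF-16 code units, records padded to 8 bytes, and 8 leading bytes in the output buffer for the next starting reference number. The names (UsnBufferBound and so on) are hypothetical.

```csharp
using System;

static class UsnBufferBound
{
    const int HeaderBytes = 60;        // assumed: fixed part of a V2 record before FileName
    const int MaxNameBytes = 255 * 2;  // assumed: longest NTFS name, 255 UTF-16 code units
    const int Alignment = 8;           // assumed: records are padded to 8-byte boundaries
    const int LeadingBytes = 8;        // assumed: next-start value at the front of the buffer

    // Largest single record under the assumptions above, rounded up to the alignment.
    public static int MaxRecordBytes()
        => (HeaderBytes + MaxNameBytes + Alignment - 1) / Alignment * Alignment;

    // Worst-case bytes needed to receive `recordCount` records in one call.
    public static long WorstCaseBufferBytes(long recordCount)
        => LeadingBytes + checked(recordCount * MaxRecordBytes());

    static void Main()
    {
        Console.WriteLine(MaxRecordBytes());                // 576
        Console.WriteLine(WorstCaseBufferBytes(1_000_000)); // 576000008
    }
}
```

Under these assumptions a million-record estimate asks for well over half a gigabyte, far more than real filenames would need, which is why the looped approach with a modest buffer is usually the practical choice.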
What the value is useful for, rather, is pre-allocating your own fixed-size data structures, which you'll surely need, prior to the FSCTL_ENUM_USN_DATA enumeration. So suppose you have your own value type that you'll create for each USN entry (a dummy struct, just for example...):
[StructLayout(LayoutKind.Sequential)]
public struct MFT_IX_REC
{
    public ushort seq;
    public ushort parent_ix_hi;
    public uint parent_ix;
};
Then, using the estimate from above, you can pre-allocate an array of these before the DeviceIoControl loop and never have to worry about resizing during the iteration.
var med = new MFT_ENUM_DATA { ... };
// ...
var rg_mftix = new MFT_IX_REC[c_mft_estimate];
// ... ready to go, without having to check whether the array needs resizing within the loop
for (int i = 0; DeviceIoControl(h_vol, FSCTL.ENUM_USN_DATA, in med, out USN_RECORD usn, ...); i++)
{
    // etc..
    rg_mftix[i].parent_ix = (uint)usn.ParentId;
    // etc..
}
Eliminating the dynamic array resizing that is usually needed when you don't know the number of entries in advance is a non-trivial performance benefit, because it avoids the expensive jumbo-sized memcpy operations required to copy the existing data from the old array to a new, larger one each time you resize.
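To put a rough number on the copying that pre-allocation avoids, the sketch below counts how many elements get moved when an array grows by doubling each time it fills up. The doubling policy and the starting capacity of 4 are assumptions chosen for illustration (your own resizing code may grow differently), and the names are hypothetical.

```csharp
using System;

static class ResizeCost
{
    // Counts elements copied while growing a doubling array until it can hold `count` items.
    // Each growth step happens when the array is full, so the whole old array is copied.
    public static long ElementsCopiedByDoubling(long count, long initialCapacity = 4)
    {
        if (initialCapacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(initialCapacity));

        long capacity = initialCapacity;
        long copied = 0;
        while (capacity < count)
        {
            copied += capacity; // old contents moved into the new, larger array
            capacity *= 2;
        }
        return copied;
    }

    static void Main()
    {
        Console.WriteLine(ElementsCopiedByDoubling(1_000_000)); // 1048572
        Console.WriteLine(ElementsCopiedByDoubling(4));         // 0: already fits
    }
}
```

With an array sized from the estimate up front, that count is zero, and the last few copies (the largest ones, each moving hundreds of thousands of elements) are exactly the jumbo-sized ones being avoided.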