Speed up NTFS file enumeration (using FSCTL_ENUM_USN_DATA and NTFS MFT / USN journal)

I'm enumerating the files of an NTFS hard drive partition by reading the NTFS MFT / USN journal with:

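#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    const wchar_t* szVolumePath = L"\\\\.\\C:";   // volume handle; opening it requires administrator rights
    HANDLE hDrive = CreateFileW(szVolumePath, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
    if (hDrive == INVALID_HANDLE_VALUE) { printf("CreateFile failed: %lu\n", GetLastError()); return 1; }
    DWORD cb = 0;

    MFT_ENUM_DATA med = { 0 };
    med.StartFileReferenceNumber = 0;
    med.LowUsn = 0;
    med.HighUsn = MAXLONGLONG;      // no change in perf if I use med.HighUsn = ujd.NextUsn; where "USN_JOURNAL_DATA ujd" is loaded before

    unsigned char pData[sizeof(DWORDLONG) + 0x10000] = { 0 }; // 8 bytes for the next FRN + 64 kB of records

    while (DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, &med, sizeof(med), pData, sizeof(pData), &cb, NULL))
    {
        med.StartFileReferenceNumber = *((DWORDLONG*) pData);    // the first 8 bytes of pData hold the FRN for the next FSCTL_ENUM_USN_DATA call

        // here we would normally do: PUSN_RECORD pRecord = (PUSN_RECORD) (pData + sizeof(DWORDLONG));
        // and run a second loop to extract the actual filenames,
        // but I removed this because the real performance bottleneck
        // is DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)
    }

    // The loop ends when DeviceIoControl fails; ERROR_HANDLE_EOF means all records were enumerated
    DWORD err = GetLastError();
    if (err != ERROR_HANDLE_EOF) printf("FSCTL_ENUM_USN_DATA failed: %lu\n", err);

    CloseHandle(hDrive);
    return 0;
}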
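The "second loop" mentioned in the code comments walks each output buffer: the first 8 bytes hold the FRN for the next call, followed by variable-length USN_RECORD_V2 entries, each starting with its RecordLength. Here is a minimal portable sketch, assuming V2 records (MajorVersion 2) and little-endian data. Instead of casting to PUSN_RECORD, it decodes the fixed USN_RECORD_V2 offsets by hand so it can be exercised on a synthetic buffer. `UsnEntry`, `ParseEnumOutput` and `AppendRecordV2` are hypothetical names for illustration, not Windows APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal view of a USN_RECORD_V2: only the fields needed here.
struct UsnEntry {
    uint64_t frn;        // FileReferenceNumber        (offset 8)
    uint64_t parentFrn;  // ParentFileReferenceNumber  (offset 16)
    std::u16string name; // FileName (UTF-16), found via FileNameOffset/FileNameLength
};

// Little-endian readers; they work regardless of buffer alignment.
static uint16_t Rd16(const unsigned char* p) { return uint16_t(p[0] | (p[1] << 8)); }
static uint32_t Rd32(const unsigned char* p) { return uint32_t(Rd16(p)) | (uint32_t(Rd16(p + 2)) << 16); }
static uint64_t Rd64(const unsigned char* p) { return uint64_t(Rd32(p)) | (uint64_t(Rd32(p + 4)) << 32); }

constexpr size_t kV2Header = 60; // offset of FileName in USN_RECORD_V2

// Parses one FSCTL_ENUM_USN_DATA output buffer of cb bytes, appends the V2
// records to out, and returns the FRN to use as the next StartFileReferenceNumber.
uint64_t ParseEnumOutput(const unsigned char* buf, size_t cb, std::vector<UsnEntry>& out) {
    if (cb < 8) return 0;
    uint64_t nextFrn = Rd64(buf);
    size_t off = 8;
    while (off + kV2Header <= cb) {
        const unsigned char* rec = buf + off;
        uint32_t len = Rd32(rec);                     // RecordLength
        if (len < kV2Header || len > cb - off) break; // malformed or truncated record
        if (Rd16(rec + 4) == 2) {                     // MajorVersion 2 only
            uint16_t nameLen = Rd16(rec + 56);        // FileNameLength, in bytes
            uint16_t nameOff = Rd16(rec + 58);        // FileNameOffset, from record start
            UsnEntry e{Rd64(rec + 8), Rd64(rec + 16), {}};
            for (size_t i = 0; i + 1 < nameLen && nameOff + i + 1 < len; i += 2)
                e.name.push_back(char16_t(Rd16(rec + nameOff + i)));
            out.push_back(std::move(e));
        }
        off += len;                                   // RecordLength is already 8-byte aligned
    }
    return nextFrn;
}

// Hypothetical test helper: appends a V2 record padded to a multiple of 8 bytes.
void AppendRecordV2(std::vector<unsigned char>& buf, uint64_t frn, uint64_t parent, const std::u16string& name) {
    size_t nameBytes = name.size() * 2;
    size_t len = (kV2Header + nameBytes + 7) & ~size_t(7);
    size_t start = buf.size();
    buf.resize(start + len, 0);
    auto put = [&](size_t at, uint64_t v, int n) {
        for (int i = 0; i < n; ++i) buf[start + at + i] = (unsigned char)(v >> (8 * i));
    };
    put(0, len, 4); put(4, 2, 2); put(8, frn, 8); put(16, parent, 8);
    put(56, nameBytes, 2); put(58, kV2Header, 2);
    for (size_t i = 0; i < name.size(); ++i) put(kV2Header + 2 * i, name[i], 2);
}
```

On Windows, the cast-based loop could call `med.StartFileReferenceNumber = ParseEnumOutput(pData, cb, entries);` after each successful DeviceIoControl. This sketch silently skips records with any other MajorVersion (for example V3).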

It works, and it is much faster than the usual FindFirstFile enumeration techniques, but it is not optimal yet:

  • On my C:\ drive with 700k files, it takes 21 seconds. (This must be measured right after a reboot; otherwise caching skews the result.)

  • I have seen another indexing program (not Everything, another one) that indexes C:\ in under 5 seconds (measured right after Windows startup) without reading a precomputed database from a .db file or using similar tricks that could speed things up. This software does not use FSCTL_ENUM_USN_DATA; it parses NTFS at a low level instead.

What I've tried to improve performance:

Question:

Is it possible to improve the performance of DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)?

Or is low-level manual parsing of NTFS the only way to improve performance?


Note: according to my tests, the total amount of data returned by these DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...) calls for my 700k files is only 84 MB. Reading 84 MB in 21 seconds is only 4 MB/s (and I do have an SSD!), so there is probably room for improvement, don't you think?
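The figures in the note can be checked with a little arithmetic: throughput, average record size, the MFT data that likely sits behind those records, and how many round trips a given buffer size implies. This is a minimal sketch that takes 1 MB as 10^6 bytes and assumes the usual 1 KB NTFS file record with one record per file. Both are assumptions, not measurements from the question. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Average throughput in MB/s.
constexpr double ThroughputMBps(double mb, double seconds) { return mb / seconds; }

// Average size of one returned USN record, taking 1 MB = 1,000,000 bytes.
constexpr double BytesPerRecord(double mb, double records) { return mb * 1e6 / records; }

// Approximate MFT data the file system reads to produce those records,
// assuming one file record per file of fileRecordBytes each; result in MB.
constexpr double MftMegabytes(double records, double fileRecordBytes = 1024) {
    return records * fileRecordBytes / 1e6;
}

// Lower bound on FSCTL_ENUM_USN_DATA round trips: each call returns at most
// bufferBytes - 8 bytes of records, because the first 8 bytes hold the next FRN.
constexpr uint64_t MinCalls(uint64_t totalRecordBytes, uint64_t bufferBytes) {
    return (totalRecordBytes + (bufferBytes - 8) - 1) / (bufferBytes - 8); // ceiling division
}
```

With the question's numbers, this gives 4 MB/s and about 120 bytes per record. That is consistent with a 60-byte USN_RECORD_V2 header plus a UTF-16 name of roughly 30 characters, padded to 8 bytes. It also suggests roughly 717 MB of 1 KB file records behind the 84 MB of output, or about 34 MB/s of MFT data on a cold cache. Going from a 4 kB to a 100 MB buffer cuts the calls from about 20,500 to 1, so if the time stays at 21 seconds, per-call overhead is unlikely to be the bottleneck.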

Glantz asked on 19/7/2017 at 1:43 · Comments (7)
The most obvious thing to try is to increase the buffer size to reduce the number of round trips. But I don't think it will boost performance much; the bottleneck is most likely in converting the records from whatever the underlying format is to the USN_RECORD_Vx structure. - Chorion
Thanks @HarryJohnston. I have tried buffer sizes of 4 kB, 64 kB and 1 MB (with #pragma comment(linker, "/STACK:2000000")), and even a 100 MB malloc-ed array, and the result is the same: ~21 seconds. - Glantz
You're probably right @HarryJohnston, the bottleneck seems to be the conversion to the USN_RECORD_Vx structure. Do you think there's a way to force a "lighter conversion"? (I don't need most of the information, just FileReferenceNumber, ParentFolder and the filename.) I don't see how that's possible here: msdn.microsoft.com/en-us/library/windows/desktop/… - Glantz
"This software does not use FSCTL_ENUM_USN_DATA, but low-level NTFS parsing instead." Isn't that the answer to your question? - Clutch
@Clutch I was hoping there was some solution in between 1. FSCTL_ENUM_USN_DATA, which has "OK performance" and is easy to code, and 2. super-fast low-level NTFS parsing, which would take a week of full-time work to get working... Do you think such a middle ground exists? - Glantz
The journal is a sparse file, and I don't know what the implications are of reading a range of rows that just don't exist. It's possible that you're spending all your time reading vast empty space... or triggering something grossly inefficient to get past the empty spaces. What I do (in C#, and it takes just a few seconds) is call FSCTL_QUERY_USN_JOURNAL to get the extents of valid records into a USN_JOURNAL_DATA_V1 structure, and use that data to seed the first call to FSCTL_ENUM_USN_DATA. Maybe that'll help... dunno. - Mantilla
As Basj said, this issue (slow DeviceIoControl) happens only the very first time you run the FSCTL_ENUM_USN_DATA scan (for example, after a reboot). I wrote code (C#) that reads the USN journal to get all the files on a volume, and I noticed that this scan is very fast on Win10 even at the first launch (2 seconds to get and sort 900k files), but slow on Win7 (first run only). @Clay: if you experienced something different, why don't you show the code you used to get those fast accesses? Please share it with us. Thanks. - Overcloud
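The suggestion in the comments to seed the first FSCTL_ENUM_USN_DATA call from FSCTL_QUERY_USN_JOURNAL comes down to choosing the USN range. This is a minimal portable sketch that uses simplified stand-ins for USN_JOURNAL_DATA and MFT_ENUM_DATA, keeping only the fields used here. `JournalData`, `MftEnumRange`, `SeedFromJournal` and `InRange` are hypothetical names. FSCTL_ENUM_USN_DATA only returns records whose last USN lies between LowUsn and HighUsn inclusive, so the sketch keeps LowUsn at 0. Files not modified since journaling began can carry a USN of 0 and would be skipped if LowUsn were set to FirstUsn.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Windows structures (same field names, portable types).
struct JournalData  { uint64_t UsnJournalID; int64_t FirstUsn; int64_t NextUsn; };
struct MftEnumRange { uint64_t StartFileReferenceNumber; int64_t LowUsn; int64_t HighUsn; };

// Seeds the first enumeration call from the journal data:
// start at FRN 0, keep LowUsn = 0, cap HighUsn at the journal's NextUsn.
MftEnumRange SeedFromJournal(const JournalData& ujd) {
    return MftEnumRange{0, 0, ujd.NextUsn};
}

// Mirrors the inclusive LowUsn..HighUsn filter applied to each file's last USN.
bool InRange(const MftEnumRange& r, int64_t fileUsn) {
    return fileUsn >= r.LowUsn && fileUsn <= r.HighUsn;
}
```

This is the same `med.HighUsn = ujd.NextUsn` variant the question already tried with no change in timing, so on its own it may not explain the speed-up reported in that comment.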
