I'm enumerating the files on an NTFS hard drive partition by reading the NTFS MFT / USN journal with:
```cpp
#include <windows.h>
#include <winioctl.h>

// szVolumePath is a volume path such as "\\\\.\\C:"; opening a volume handle requires admin rights
HANDLE hDrive = CreateFile(szVolumePath, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
DWORD cb = 0;
MFT_ENUM_DATA med = { 0 };
med.StartFileReferenceNumber = 0;
med.LowUsn = 0;
med.HighUsn = MAXLONGLONG; // no change in perf if I use med.HighUsn = ujd.NextUsn; where "USN_JOURNAL_DATA ujd" is loaded before
unsigned char pData[sizeof(DWORDLONG) + 0x10000] = { 0 }; // 64 kB
while (DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, &med, sizeof(med), pData, sizeof(pData), &cb, NULL))
{
    med.StartFileReferenceNumber = *((DWORDLONG*) pData); // pData starts with the FRN for the next FSCTL_ENUM_USN_DATA call
    // here we would normally do: PUSN_RECORD pRecord = (PUSN_RECORD) (pData + sizeof(DWORDLONG));
    // and a second loop to extract the actual filenames,
    // but I removed it because the real performance bottleneck
    // is DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)
}
// the loop ends when DeviceIoControl fails; GetLastError() == ERROR_HANDLE_EOF means all records were read
```
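For reference, the "second loop" mentioned in the code comments walks the variable-length records that follow the 8-byte next-FRN header in each output buffer. The sketch below is portable C++ so it can be compiled and checked anywhere: instead of casting to `PUSN_RECORD`, it reads fields at the `USN_RECORD_V2` offsets (`RecordLength` at 0, `MajorVersion` at 4, `FileReferenceNumber` at 8, `ParentFileReferenceNumber` at 16, `FileNameLength` at 56, `FileNameOffset` at 58, `FileName` at 60). `UsnEntry`, `ReadAt` and `ParseEnumUsnBuffer` are hypothetical names, and the sketch only decodes V2 records, skipping any other `MajorVersion`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result type: the fields an indexer usually keeps from each record.
struct UsnEntry {
    uint64_t frn;        // FileReferenceNumber
    uint64_t parentFrn;  // ParentFileReferenceNumber
    std::u16string name; // FileName as UTF-16 (not null-terminated in the record)
};

// Copies a host-endian value (little-endian on Windows) from p + off;
// memcpy avoids unaligned-access problems.
template <typename T>
static T ReadAt(const unsigned char* p, size_t off) {
    T v;
    std::memcpy(&v, p + off, sizeof(T));
    return v;
}

// Decodes one FSCTL_ENUM_USN_DATA output buffer of `cb` bytes (the byte count
// DeviceIoControl returned). Appends every V2 record to `out` and returns the FRN to
// use as StartFileReferenceNumber for the next call (0 if the buffer is too short).
uint64_t ParseEnumUsnBuffer(const unsigned char* pData, size_t cb, std::vector<UsnEntry>& out) {
    assert(pData != nullptr || cb == 0);
    const size_t kHeader = 60;                         // offset of FileName in USN_RECORD_V2
    if (cb < sizeof(uint64_t)) return 0;
    const uint64_t nextFrn = ReadAt<uint64_t>(pData, 0);
    size_t pos = sizeof(uint64_t);                     // records follow the 8-byte FRN
    while (pos + kHeader <= cb) {
        const unsigned char* rec = pData + pos;
        const uint32_t recordLength = ReadAt<uint32_t>(rec, 0);
        if (recordLength < kHeader || pos + recordLength > cb) break; // truncated or corrupt
        if (ReadAt<uint16_t>(rec, 4) == 2) {           // MajorVersion: only V2 is decoded here
            const size_t nameLen = ReadAt<uint16_t>(rec, 56); // FileNameLength, in bytes
            const size_t nameOff = ReadAt<uint16_t>(rec, 58); // FileNameOffset
            if (nameOff + nameLen <= recordLength) {
                UsnEntry e;
                e.frn = ReadAt<uint64_t>(rec, 8);
                e.parentFrn = ReadAt<uint64_t>(rec, 16);
                e.name.resize(nameLen / sizeof(char16_t));
                if (!e.name.empty())
                    std::memcpy(&e.name[0], rec + nameOff, e.name.size() * sizeof(char16_t));
                out.push_back(e);
            }
        }
        pos += recordLength;                           // RecordLength includes the padding
    }
    return nextFrn;
}
```

Inside the question's loop this would replace the plain FRN read, e.g. `med.StartFileReferenceNumber = ParseEnumUsnBuffer(pData, cb, entries);`. Timing the loop with and without this call is one way to confirm that the parsing itself is negligible next to `DeviceIoControl`.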
It works, and it is much faster than the usual `FindFirstFile` enumeration techniques. But I can see it's not optimal yet:
On my `C:\` with 700k files, it takes 21 seconds. (This has to be measured right after a reboot; otherwise caching makes the result incorrect.)

I have seen another indexing program (not Everything, another one) that indexes `C:\` in < 5 seconds (measured after Windows startup), without reading a precomputed database from a .db file or using any similar trick that could speed things up. This software does not use `FSCTL_ENUM_USN_DATA`, but low-level NTFS parsing instead.
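To give an idea of what "low-level NTFS parsing" involves: such a tool reads the raw `$MFT` (its location and record size are reported by `FSCTL_GET_NTFS_VOLUME_DATA` as `MftStartLcn` and `BytesPerFileRecordSegment`) and decodes each FILE record, normally 1024 bytes, itself. I don't know how that particular program does it; the portable sketch below only shows the per-record step, based on the documented NTFS on-disk layout (`FILE_RECORD_SEGMENT_HEADER`, attribute record header, `FILE_NAME`): apply the update sequence ("fixup") array, then pull the name and parent reference out of the first `$FILE_NAME` attribute. `MftName`, `Rd` and `DecodeFileRecord` are hypothetical names; the sketch assumes 512-byte sectors and ignores attribute lists, extension records, multiple names per file, and following the `$MFT`'s own data runs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Copies a host-endian (little-endian on Windows) value from p + off.
template <typename T>
static T Rd(const unsigned char* p, size_t off) {
    T v;
    std::memcpy(&v, p + off, sizeof(T));
    return v;
}

// Hypothetical result type for one decoded MFT FILE record.
struct MftName {
    uint64_t parentRef;  // parent directory reference (low 48 bits = MFT record number)
    std::u16string name; // name from the first $FILE_NAME attribute
};

// Applies the update sequence array to a raw FILE record in place, then extracts the
// first resident $FILE_NAME (type 0x30). Returns false for invalid, unused or
// name-less records. Assumption: 512-byte sectors.
bool DecodeFileRecord(std::vector<unsigned char>& rec, MftName& out) {
    const size_t kSector = 512;
    if (rec.size() < 0x30 || std::memcmp(rec.data(), "FILE", 4) != 0) return false;
    unsigned char* p = rec.data();
    const size_t usaOff = Rd<uint16_t>(p, 0x04);    // offset of the update sequence array
    const size_t usaCount = Rd<uint16_t>(p, 0x06);  // 1 + number of sectors covered
    if (usaCount == 0 || usaOff + 2 * usaCount > rec.size() ||
        (usaCount - 1) * kSector > rec.size())
        return false;
    const uint16_t usn = Rd<uint16_t>(p, usaOff);   // update sequence number
    for (size_t i = 1; i < usaCount; ++i) {
        unsigned char* tail = p + i * kSector - 2;  // last 2 bytes of sector i
        if (Rd<uint16_t>(tail, 0) != usn) return false; // torn or corrupt record
        std::memcpy(tail, p + usaOff + 2 * i, 2);   // restore the saved original bytes
    }
    if ((Rd<uint16_t>(p, 0x16) & 0x0001) == 0) return false; // flags: record not in use
    size_t pos = Rd<uint16_t>(p, 0x14);             // offset of the first attribute
    while (pos + 0x18 <= rec.size()) {
        const uint32_t type = Rd<uint32_t>(p, pos);
        if (type == 0xFFFFFFFFu) break;             // end-of-attributes marker
        const uint32_t len = Rd<uint32_t>(p, pos + 0x04);
        if (len == 0 || pos + len > rec.size()) break;
        if (type == 0x30 && p[pos + 0x08] == 0) {   // $FILE_NAME, resident (non-resident flag 0)
            const size_t val = pos + Rd<uint16_t>(p, pos + 0x14); // value offset
            if (val + 0x42 > rec.size()) return false;
            const size_t nChars = p[val + 0x40];    // name length, in UTF-16 units
            if (val + 0x42 + 2 * nChars > rec.size()) return false;
            out.parentRef = Rd<uint64_t>(p, val);   // parent directory reference
            out.name.assign(nChars, u'\0');
            if (nChars != 0)
                std::memcpy(&out.name[0], p + val + 0x42, 2 * nChars);
            return true;
        }
        pos += len;
    }
    return false;
}
```

The per-record work is small; the speed of such tools presumably comes from reading the `$MFT` in large sequential chunks, which this sketch does not attempt to show.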
What I've tried to improve performance:

- Opening the volume with a different flag, such as `FILE_FLAG_SEQUENTIAL_SCAN`, `FILE_FLAG_RANDOM_ACCESS`, or `FILE_FLAG_NO_BUFFERING`: same result, 21 seconds to read.
- Reading "Estimate the number of USN records on NTFS volume" and "Why file enumeration using DeviceIoControl is faster in VB.NET than in C++?": I have studied both in depth, but neither answers this question.
- Testing another compiler, MinGW64 instead of VC++ Express 2013: same performance, no difference.
- On VC++, I have already switched to `Release` instead of `Debug`. Are there other Project Properties/Options that could speed up the program?
Question:
Is it possible to improve the performance of `DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)`, or is low-level manual parsing of NTFS the only way to improve performance?
Note: According to my tests, the total amount of data read by these `DeviceIoControl(hDrive, FSCTL_ENUM_USN_DATA, ...)` calls for my 700k files is only 84 MB. Reading 84 MB in 21 seconds is only 4 MB/s (and I do have an SSD!). There is probably some room for improvement, don't you think?
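Breaking the timing down per call makes these numbers easier to reason about. This assumes, without having measured it, that every call fills the whole 64 kB buffer and that "MB" means 2^20 bytes:

$$\frac{84\ \text{MB}}{21\ \text{s}} = 4\ \text{MB/s},\qquad \frac{84 \times 1024\ \text{kB}}{64\ \text{kB/call}} = 1344\ \text{calls},\qquad \frac{21\ \text{s}}{1344\ \text{calls}} \approx 15.6\ \text{ms/call}$$

Copying 64 kB in user mode takes microseconds, not milliseconds, so under these assumptions nearly all of the ~15 ms per call is presumably spent inside the file system fetching MFT data, not in the loop itself.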
… `#pragma comment(linker, "/STACK:2000000")`), and even with a 100 MB `malloc`-ed array, it's the same: ~21 seconds. – Glantz

… "This software does not use `FSCTL_ENUM_USN_DATA`, but low-level NTFS parsing instead." - Isn't that the answer to your question? – Clutch

… `FSCTL_ENUM_USN_DATA` that has "ok performance" and is easy to code, and 2. super mega fast low-level NTFS parsing that would take a week of full-time work to get working... Do you think this middle ground exists somewhere? – Glantz

… `FSCTL_QUERY_USN_JOURNAL` to get the extents of valid records... into a `USN_JOURNAL_DATA_V1` structure... and use that data to seed the first call to `FSCTL_ENUM_USN_DATA`. Maybe that'll help... dunno. – Mantilla