Why is file enumeration using DeviceIoControl faster in VB.NET than in C++?

I am trying to read the Windows Master File Table (MFT) to enumerate files quickly. So far I have seen two approaches to doing this:

  1. Using DeviceIoControl, as suggested by Jeffrey Cooperstein and Jeffrey Richter
  2. Parsing the MFT directly, as done in some open-source tools and An NTFS Parser Lib

For my project I am focusing on approach [1]. The problem I am facing is mostly related to execution time. To be clear, here is my system and development environment:

  1. IDE - Visual Studio 2013
  2. Language - C++
  3. OS - Windows 7 Professional x64
  4. Binaries - 32-bit, for both the C++ and the .NET code

Problem

I have compared the version mentioned in [1] (slightly modified) with a VB.NET implementation available on CodePlex. The issue is that if I uncomment the statement in the inner loop, the C++ code's execution time increases 7-8x, even though I haven't implemented the path matching in the C++ code (which the VB code does).

Q1. How can I improve the performance of the C++ code?

Timings for enumerating C:\ drive on my machine:

  1. C++ (with uncommented statement in inner loop) - 21 seconds
  2. VB.NET (with additional path matching code) - 3.5 seconds
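Timings like these are only comparable when both programs are measured the same way and in the same kind of build. A minimal, portable timing helper for the C++ side, assuming C++11 and std::chrono; the name TimeIt is hypothetical and not part of either project:

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs a callable once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double TimeIt(F&& work)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::forward<F>(work)();
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage might look like `double ms = TimeIt([&] { scanner.FindAll(); });`, where `scanner` stands for whatever object owns FindAll().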

For more clarity, here are the C++ and VB.NET snippets.

C++

bool FindAll()
{
    if (m_hDrive == NULL) // Handle of, for example, "\\.\C:"
        return false;

    USN_JOURNAL_DATA ujd = {0};
    DWORD cb = 0;
    BOOL bRet = FALSE;
    MFT_ENUM_DATA med = {0};

    BYTE pData[sizeof(DWORDLONG) + 0x10000] = {0};

    bRet = DeviceIoControl(m_hDrive, FSCTL_QUERY_USN_JOURNAL, NULL, 0, &ujd, sizeof(USN_JOURNAL_DATA), &cb, NULL);
    if (bRet == FALSE) return false;

    med.StartFileReferenceNumber = 0;
    med.LowUsn = 0;
    med.HighUsn = ujd.NextUsn;

    //Outer Loop
    while (TRUE)
    {
        bRet = DeviceIoControl(m_hDrive, FSCTL_ENUM_USN_DATA, &med, sizeof(med), pData, sizeof(pData), &cb, NULL);
        if (bRet == FALSE) {
            break;
        }

        PUSN_RECORD pRecord = (PUSN_RECORD)&pData[sizeof(USN)];

        //Inner Loop
        while ((PBYTE)pRecord < (pData + cb))
        {
            tstring sz((LPCWSTR) ((PBYTE)pRecord + pRecord->FileNameOffset), pRecord->FileNameLength / sizeof(WCHAR));

            bool isFile = ((pRecord->FileAttributes & FILE_ATTRIBUTE_DIRECTORY) != FILE_ATTRIBUTE_DIRECTORY);
            if (isFile) m_dwFiles++;
            //m_nodes[pRecord->FileReferenceNumber] = new CNode(pRecord->ParentFileReferenceNumber, sz, isFile);

            pRecord = (PUSN_RECORD)((PBYTE)pRecord + pRecord->RecordLength);
        }
        med.StartFileReferenceNumber = *(DWORDLONG *)pData;
    }
    return true;
}

Here m_nodes is a NodeMap, defined as typedef std::map<DWORDLONG, CNode*> NodeMap;
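The inner loop depends on the layout of the FSCTL_ENUM_USN_DATA output buffer: the first 8 bytes hold the file reference number at which the next call should start, followed by variable-length records, each beginning with its own RecordLength. The portable sketch below mimics that walk using a simplified, hypothetical FakeRecord header (not the real USN_RECORD definition), so the pointer arithmetic can be exercised without Windows headers:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for a USN record header. The real
// USN_RECORD has more fields; only the walking pattern matters here.
struct FakeRecord {
    std::uint32_t RecordLength;  // total size of this record in bytes
    std::uint32_t IsDirectory;   // 1 for directories, 0 for files
};

// Counts non-directory records in a buffer laid out like the output of
// FSCTL_ENUM_USN_DATA: an 8-byte "next start FRN" followed by records.
// 'cb' is the number of valid bytes, as returned by DeviceIoControl.
std::size_t CountFiles(const unsigned char* data, std::size_t cb,
                       std::uint64_t* nextStart)
{
    std::size_t files = 0;
    if (cb < sizeof(std::uint64_t)) return 0;
    std::memcpy(nextStart, data, sizeof(std::uint64_t));

    std::size_t offset = sizeof(std::uint64_t);
    while (offset + sizeof(FakeRecord) <= cb) {
        FakeRecord rec;
        std::memcpy(&rec, data + offset, sizeof rec);     // avoids misaligned reads
        if (rec.RecordLength < sizeof(FakeRecord)) break;  // malformed: stop
        if (rec.IsDirectory == 0) ++files;
        offset += rec.RecordLength;  // advance by the record's own length
    }
    return files;
}
```

Copying the header with memcpy rather than casting the pointer keeps the sketch well-defined even if a record is not suitably aligned.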

VB.NET

Public Sub FindAllFiles(ByVal szDriveLetter As String, fFileFound As FileFound_Delegate, fProgress As Progress_Delegate, fMatch As IsMatch_Delegate)

        Dim usnRecord As USN_RECORD
        Dim mft As MFT_ENUM_DATA
        Dim dwRetBytes As Integer
        Dim cb As Integer
        Dim dicFRNLookup As New Dictionary(Of Long, FSNode)
        Dim bIsFile As Boolean

        ' This shouldn't be called more than once.
        If m_Buffer.ToInt32 <> 0 Then
            Console.WriteLine("invalid buffer")
            Exit Sub
        End If

        ' progress 
        If Not IsNothing(fProgress) Then fProgress.Invoke("Building file list.")

        ' Assign buffer size
        m_BufferSize = 65536 '64KB

        ' Allocate a buffer to use for reading records.
        m_Buffer = Marshal.AllocHGlobal(m_BufferSize)

        ' correct path
        szDriveLetter = szDriveLetter.TrimEnd("\"c)

        ' Open the volume handle 
        m_hCJ = OpenVolume(szDriveLetter)

        ' Check if the volume handle is valid.
        If m_hCJ = INVALID_HANDLE_VALUE Then
            Console.WriteLine("Couldn't open handle to the volume.")
            Cleanup()
            Exit Sub
        End If

        mft.StartFileReferenceNumber = 0
        mft.LowUsn = 0
        mft.HighUsn = Long.MaxValue

        Do
            If DeviceIoControl(m_hCJ, FSCTL_ENUM_USN_DATA, mft, Marshal.SizeOf(mft), m_Buffer, m_BufferSize, dwRetBytes, IntPtr.Zero) Then
                cb = dwRetBytes
                ' Pointer to the first record
                Dim pUsnRecord As New IntPtr(m_Buffer.ToInt32() + 8)

                While (dwRetBytes > 8)
                    ' Copy pointer to USN_RECORD structure.
                    usnRecord = Marshal.PtrToStructure(pUsnRecord, usnRecord.GetType)

                    ' The filename within the USN_RECORD.
                    Dim FileName As String = Marshal.PtrToStringUni(New IntPtr(pUsnRecord.ToInt32() + usnRecord.FileNameOffset), usnRecord.FileNameLength / 2)

                    'If Not FileName.StartsWith("$") Then
                    ' use a delegate to determine if this file even matches our criteria
                    Dim bIsMatch As Boolean = True
                    If Not IsNothing(fMatch) Then fMatch.Invoke(FileName, usnRecord.FileAttributes, bIsMatch)

                    If bIsMatch Then
                        bIsFile = Not usnRecord.FileAttributes.HasFlag(FileAttribute.Directory)
                        dicFRNLookup.Add(usnRecord.FileReferenceNumber, New FSNode(usnRecord.FileReferenceNumber, usnRecord.ParentFileReferenceNumber, FileName, bIsFile))
                    End If
                    'End If

                    ' Pointer to the next record in the buffer.
                    pUsnRecord = New IntPtr(pUsnRecord.ToInt32() + usnRecord.RecordLength)

                    dwRetBytes -= usnRecord.RecordLength
                End While

                ' The first 8 bytes is always the start of the next USN.
                mft.StartFileReferenceNumber = Marshal.ReadInt64(m_Buffer, 0)

            Else

                Exit Do

            End If

        Loop Until cb <= 8

        If Not IsNothing(fProgress) Then fProgress.Invoke("Parsing file names.")

        ' Resolve all paths for Files
        For Each oFSNode As FSNode In dicFRNLookup.Values.Where(Function(o) o.IsFile)
            Dim sFullPath As String = oFSNode.FileName
            Dim oParentFSNode As FSNode = oFSNode

            While dicFRNLookup.TryGetValue(oParentFSNode.ParentFRN, oParentFSNode)
                sFullPath = String.Concat(oParentFSNode.FileName, "\", sFullPath)
            End While
            sFullPath = String.Concat(szDriveLetter, "\", sFullPath)

            If Not IsNothing(fFileFound) Then fFileFound.Invoke(sFullPath, 0)
        Next

        '// cleanup
        Cleanup() '//Closes all the handles
        If Not IsNothing(fProgress) Then fProgress.Invoke("Complete.")
    End Sub
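For comparison, the VB.NET path-resolution step (following ParentFRN links through the dictionary until a parent is not found) could be written in C++ roughly as follows. This is a sketch that assumes a hypothetical Node struct stored by value in a std::unordered_map keyed by the 64-bit FRN; it is not code from either project:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical node type mirroring FSNode/CNode.
struct Node {
    std::uint64_t parentFrn;
    std::wstring name;
    bool isFile;
};

using NodeTable = std::unordered_map<std::uint64_t, Node>;

// Builds "C:\dir\sub\file" by prepending parent names until a parent FRN
// is not in the table, as the VB.NET TryGetValue loop does.
std::wstring ResolvePath(const NodeTable& nodes, std::uint64_t frn,
                         const std::wstring& drive)
{
    auto it = nodes.find(frn);
    if (it == nodes.end()) return std::wstring();

    std::wstring path = it->second.name;
    // Cap the depth so corrupt parent links cannot loop forever.
    for (int depth = 0; depth < 1024; ++depth) {
        auto parent = nodes.find(it->second.parentFrn);
        if (parent == nodes.end()) break;
        path = parent->second.name + L"\\" + path;
        it = parent;
    }
    return drive + L"\\" + path;
}
```

The depth cap is an added safeguard; the VB.NET loop has no such limit.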

The fFileFound delegate is defined as follows:

Sub(s, l)
    If s.ToLower.StartsWith(sSearchPath) Then
        lCount += 1
        lstFileNames.Add(s.ToLower) '// Dim lstFileNames As List(Of String)
    End If
End Sub
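The fFileFound callback lowercases each path and keeps it if it starts with sSearchPath. A C++ counterpart might look like the sketch below. It assumes the search prefix is already lowercase (the VB.NET code compares it against s.ToLower) and uses std::towlower, which does only simple per-character mapping, unlike .NET's culture-aware ToLower. The function names are hypothetical:

```cpp
#include <cwctype>
#include <string>
#include <utility>
#include <vector>

// Lowercases a wide string character by character (simple mapping only).
std::wstring ToLowerCopy(const std::wstring& s)
{
    std::wstring out(s);
    for (wchar_t& ch : out)
        ch = static_cast<wchar_t>(std::towlower(static_cast<std::wint_t>(ch)));
    return out;
}

// Mirrors the VB.NET lambda: keep the lowercased path if it starts with
// the (already lowercase) search prefix, and count the matches.
void OnFileFound(const std::wstring& path, const std::wstring& searchPrefix,
                 std::vector<std::wstring>& matches, long& count)
{
    std::wstring lower = ToLowerCopy(path);
    if (lower.compare(0, searchPrefix.size(), searchPrefix) == 0) {
        ++count;
        matches.push_back(std::move(lower));
    }
}
```

Unlike the VB.NET lambda, which calls ToLower twice, this lowercases each path once and reuses the result.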

The C++ CNode class (the counterpart of the VB.NET FSNode) has the following structure:

//C++ version
class CNode
{
public:
    //DWORDLONG m_dwFRN;
    DWORDLONG m_dwParentFRN;
    tstring m_sFileName;
    bool m_bIsFile;

public:
    CNode(DWORDLONG dwParentFRN, tstring sFileName, bool bIsFile = false) : 
        m_dwParentFRN(dwParentFRN), m_sFileName(sFileName), m_bIsFile(bIsFile){
    }
    ~CNode(){
    }
};
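CNode takes its file name by value and then copies it into m_sFileName, and the map stores raw CNode* pointers created with new, so each record costs an extra string copy plus a separate heap allocation that is not freed anywhere in the snippet shown. A sketch of a variant that moves the name into place and stores nodes by value, assuming C++11 and std::wstring in place of tstring; AddNode is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

class CNode
{
public:
    std::uint64_t m_dwParentFRN;
    std::wstring  m_sFileName;
    bool          m_bIsFile;

    // Taking the name by value and moving it avoids a second copy when
    // the caller passes a temporary.
    CNode(std::uint64_t parentFrn, std::wstring fileName, bool isFile = false)
        : m_dwParentFRN(parentFrn), m_sFileName(std::move(fileName)), m_bIsFile(isFile) {}
};

// Nodes stored by value: no new/delete, destroyed together with the map.
typedef std::map<std::uint64_t, CNode> NodeMap;

// FRNs are expected to be unique; emplace ignores a duplicate key,
// whereas operator[] in the original would overwrite (and leak) the old node.
bool AddNode(NodeMap& nodes, std::uint64_t frn, std::uint64_t parentFrn,
             const wchar_t* name, std::size_t len, bool isFile)
{
    return nodes.emplace(frn, CNode(parentFrn, std::wstring(name, len), isFile)).second;
}
```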

Note: the VB.NET code spawns a new thread (needed because it has a GUI), whereas I am calling the C++ function on the main thread (a simple console application for testing).


Update

It was a silly mistake on my part. The DeviceIoControl API works as expected; the slowdown came from timing a Debug build, which is noticeably slower than the Release build. Refer to the following question:

how-can-i-increase-the-performance-in-a-map-lookup-with-key-type-stdstring
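Since the explanation hinges on which configuration was timed, it can help to have the test program report it. A small sketch: NDEBUG is the standard macro that disables assert and is defined by Visual Studio's default Release configuration, while _ITERATOR_DEBUG_LEVEL is MSVC-specific (controls checked iterators; by default 2 in Debug and 0 in Release), so it is only consulted when compiling with MSVC. The function name is hypothetical:

```cpp
#include <string>  // with MSVC this pulls in the header that defines _ITERATOR_DEBUG_LEVEL

// Describes the build configuration this translation unit was compiled
// with, so that timings can be labelled correctly.
std::string BuildDescription()
{
    std::string s;
#ifdef NDEBUG
    s = "NDEBUG defined (assert disabled; usually a Release build)";
#else
    s = "NDEBUG not defined (usually a Debug build; timings not representative)";
#endif
#if defined(_MSC_VER) && defined(_ITERATOR_DEBUG_LEVEL)
    s += ", _ITERATOR_DEBUG_LEVEL=" + std::to_string(_ITERATOR_DEBUG_LEVEL);
#endif
    return s;
}
```

Printing this string next to each timing makes it harder to compare a Debug build against a Release build by accident.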

Tugman asked 10/12, 2014 at 5:37. Comments (6):
Can you present simplified programs that just compare the calls to DeviceIoControl? When you do that, you'll presumably find that the C++ code is faster because it avoids the p/invoke layer. (Enedina)
@David Heffernan - Thanks. It was a silly mistake on my part. I am using C++ STL container classes in my project, and they tend to run slowly in Debug mode. Once I tested the Release build, the performance was as expected. (Tugman)
Using the sample you mentioned on CodePlex, were you able to get the createdDate and ModifiedDate? (Lightening)
Excellent question, with background on the different possible approaches, etc. I don't understand the downvote. (Epsom)
@Tugman can you post your latest results / thoughts on the speed of parsing the MFT / NTFS? Would be super interesting! (Epsom)
@Tugman do you have any ideas on this? (Epsom)

I didn't run your code, but since you say the commented-out line is the issue, the problem is probably the map insertion. In the C++ code you are using a std::map, which is implemented as a balanced tree (sorted by key, O(log n) access time). In the VB code you are using a Dictionary, which is implemented as a hash table (unsorted, average constant-time access). Try using a std::unordered_map in the C++ version.
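Following the suggestion above, a sketch of the node table as a std::unordered_map, with reserve() to limit rehashing as records arrive. The expected record count passed to the constructor is a caller-supplied estimate, and NodeTable and Node are hypothetical simplified stand-ins for NodeMap and CNode:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

struct Node {
    std::uint64_t parentFrn;
    std::wstring name;
    bool isFile;
};

// Hash table keyed by file reference number: average O(1) insert/lookup,
// versus O(log n) for the balanced tree typically behind std::map.
class NodeTable {
public:
    explicit NodeTable(std::size_t expectedRecords) {
        nodes_.reserve(expectedRecords);  // pre-size buckets to limit rehashing
    }
    bool Add(std::uint64_t frn, std::uint64_t parentFrn, std::wstring name, bool isFile) {
        Node n{parentFrn, std::move(name), isFile};
        return nodes_.emplace(frn, std::move(n)).second;  // false if FRN already present
    }
    const Node* Find(std::uint64_t frn) const {
        auto it = nodes_.find(frn);
        return it == nodes_.end() ? nullptr : &it->second;
    }
    std::size_t Size() const { return nodes_.size(); }
private:
    std::unordered_map<std::uint64_t, Node> nodes_;
};
```

Storing nodes by value also removes the per-record new that the original std::map<DWORDLONG, CNode*> requires.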

Sale answered 10/12, 2014 at 10:32. Comments (1):
Thanks. Please see my update. Even creating a simple tstring object noticeably slows down the Debug build. (tstring is typedef std::basic_string<TCHAR> tstring;) (Tugman)
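On the point that even constructing the tstring slows the Debug build: when a record's name is only needed for counting or filtering, a non-owning view over the name can avoid a heap allocation per record. A sketch assuming C++17 std::wstring_view (not available in Visual Studio 2013); HasExtension is a hypothetical helper, and the comparison is case-sensitive:

```cpp
#include <cstddef>
#include <string_view>

// Returns true if a file name, given as pointer + length in characters
// (as found via FileNameOffset / FileNameLength in a record), ends with
// 'ext'. No std::wstring is constructed, so nothing is allocated.
bool HasExtension(const wchar_t* name, std::size_t len, std::wstring_view ext)
{
    std::wstring_view view(name, len);
    return view.size() >= ext.size() &&
           view.compare(view.size() - ext.size(), ext.size(), ext) == 0;
}
```

In the question's inner loop, the pointer would be `(LPCWSTR)((PBYTE)pRecord + pRecord->FileNameOffset)` and the length `pRecord->FileNameLength / sizeof(WCHAR)`; a real std::wstring is then only needed for records that are actually stored.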

© 2022 - 2024 — McMap. All rights reserved.