Finding a set of file names quickly on NTFS volumes, ideally via its MFT

I am in the middle of writing a tool that finds lost files of an iTunes library, for both Mac and Windows. On the Mac, I can quickly find files by name using the wonderful "CatalogSearch" function.

On Windows, however, there seems to be no OS API for searching by file name (or is there?).

After some googling, I learned that there are tools (like TFind, Everything) that read the NTFS directory directly and scan it to find files by name.

I would like to do the same, but without having to start from scratch (although I've written quite a few disk tools in the past, I've never had the energy to dig into NTFS).

I wonder if there are ready-made libs around, possibly as a .dll, that would give me this search feature: Pass in a file name, get back its path.

Alternatively, what about the Windows indexing service? At least when I tried this on a recently installed XP Home system, the Search operation under the Start menu would actually scan all directories, which suggests that it has no complete database. As I'm not a Windows user at all, I wonder why this isn't working.

In the end, the complete solution I need is: I have a list of file names to find, and I need code that searches the entire disk (or uses a DB for it) to get me all results in one go. That is, the search should not start a new full scan for every file I'm looking up. That's why I think the MFT way would be optimal, as it could quickly iterate over all names, comparing each to my list.
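
The one-pass matching described above could be sketched roughly as follows: sort the wanted names once, then do a binary search for every name the scan produces. This is only an illustration, not tied to any particular scanning API. The names `wanted_prepare` and `wanted_lookup` are hypothetical, and the comparison is ASCII-only case-insensitive, which is an assumption (NTFS itself compares names using a Unicode upcase table).

```c
#include <ctype.h>
#include <stdlib.h>

/* ASCII-only case-insensitive comparison. Assumption: good enough for a
   sketch; real NTFS name matching uses the volume's Unicode upcase table. */
static int name_cmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == 0)
            return ca - cb;
    }
}

/* Adapter so qsort/bsearch can compare array elements (char pointers). */
static int cmp_entries(const void *a, const void *b)
{
    return name_cmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the wanted list once, before the scan starts. */
void wanted_prepare(const char **wanted, size_t count)
{
    qsort(wanted, count, sizeof *wanted, cmp_entries);
}

/* Call once per file name seen during the scan. Returns the index of the
   matching entry in the sorted list, or -1 if the name is not wanted. */
long wanted_lookup(const char *const *wanted, size_t count, const char *name)
{
    const char *const *hit =
        bsearch(&name, wanted, count, sizeof *wanted, cmp_entries);
    return hit ? (long)(hit - wanted) : -1;
}
```

With N wanted names and M files on the volume this costs about M * log2(N) comparisons for the whole disk, instead of one full scan per wanted name.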

Chthonian answered 17/11, 2010 at 10:44 Comment(4)
Windows Search is quick only if you're searching indexed locations.Pasteur
I guess you mean this: msdn.microsoft.com/en-us/library/bb266517(v=VS.85).aspx?ppud=4 -- looks complicated. I'll give it a closer look, thanks.Chthonian
Do not do this, please please please. Listen to the guy who tells you to use the USN JournalLaminar
Alright. You persuaded me. Now, you'd even convince me if you'd tell me why the Windows Search is not such a good idea. Maybe because it won't find everything? (mind you, I'm the author of "Find Any File" for OS X, in case you ever need to find everything on a Mac :)Chthonian

The best way to solve your problem seems to be by using the Windows Change Journal.

Problem: if the Change Journal is not enabled for a volume, or the volume is not NTFS, you need a fallback (or you enable the Change Journal if the volume is NTFS). You also need administrator rights to access the Change Journal.

You get the files by calling DeviceIoControl with FSCTL_ENUM_USN_DATA and LowUsn=0. This accesses the MFT directly and writes all file names into the supplied buffer. Because it accesses the MFT sequentially, it is faster than the FindFirstFile API.
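
A minimal sketch of that enumeration loop in C, for Windows only and untested here. Assumptions: the function name `enum_volume_names` and the callback type `name_callback` are made up for illustration, the volume is opened as `\\.\C:` from an elevated process, and the SDK headers are recent enough to define `MFT_ENUM_DATA_V0` (older headers call the same structure `MFT_ENUM_DATA`).

```c
#include <windows.h>
#include <winioctl.h>

/* Hypothetical callback: receives one file name (NOT NUL-terminated), its
   length in WCHARs, and the file and parent-directory reference numbers. */
typedef void (*name_callback)(const WCHAR *name, size_t len,
                              DWORDLONG file_ref, DWORDLONG parent_ref,
                              void *ctx);

/* Walks all MFT records of an NTFS volume, e.g. L"\\\\.\\C:".
   Returns 0 when the whole volume was enumerated, -1 on any error. */
int enum_volume_names(const WCHAR *volume, name_callback cb, void *ctx)
{
    /* 64 KiB output buffer; DWORDLONG elements keep it 8-byte aligned.
       Static, so this sketch is not reentrant. */
    static DWORDLONG buf[8192];
    MFT_ENUM_DATA_V0 med;
    HANDLE h = CreateFileW(volume, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    med.StartFileReferenceNumber = 0;   /* start at the first MFT record */
    med.LowUsn = 0;                     /* LowUsn=0: do not filter by USN */
    med.HighUsn = MAXLONGLONG;

    for (;;) {
        DWORD got = 0;
        BYTE *p, *end;

        if (!DeviceIoControl(h, FSCTL_ENUM_USN_DATA, &med, sizeof med,
                             buf, sizeof buf, &got, NULL)) {
            /* ERROR_HANDLE_EOF signals the normal end of the enumeration. */
            DWORD err = GetLastError();
            CloseHandle(h);
            return err == ERROR_HANDLE_EOF ? 0 : -1;
        }
        if (got < sizeof(DWORDLONG))
            break;

        /* The buffer starts with the reference number to continue from,
           followed by variable-length USN_RECORD entries. */
        p = (BYTE *)buf + sizeof(DWORDLONG);
        end = (BYTE *)buf + got;
        while (p < end) {
            USN_RECORD *rec = (USN_RECORD *)p;
            if (rec->RecordLength == 0)
                break;                  /* defensive: avoid an endless loop */
            cb((const WCHAR *)(p + rec->FileNameOffset),
               rec->FileNameLength / sizeof(WCHAR),
               rec->FileReferenceNumber,
               rec->ParentFileReferenceNumber, ctx);
            p += rec->RecordLength;
        }
        med.StartFileReferenceNumber = buf[0];
    }
    CloseHandle(h);
    return 0;
}
```

Each record carries only the file's own name and the reference number of its parent directory, not a full path. To get paths you would keep the directory records in a map keyed by `FileReferenceNumber` and follow `ParentFileReferenceNumber` upward until you reach the root.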

Preliminary answered 22/11, 2010 at 9:42 Comment(10)
Yes, I am aware of this option (it's also available on OS X by default since 10.5). But that's too complicated to handle, I fear.Chthonian
And the change journal only gives me the recent changes, right? So if I do not keep a process running recording every change, I will still have to do a full scan first. Correct? Then I'm back to my original question: How do I do a fast full scan?Chthonian
If you set StartUSN to zero as described this gives you all files on the volume in a fast way (And it is really fast). If you want changes you have to set StartUSN to a higher number. Then you get the changed files since that USN.Preliminary
Sorry. It is FSCTL_ENUM_USN_DATA and not FSCTL_QUERY_USN_JOURNAL - my bad.Preliminary
Ah, then the "journal" actually does more than just journalling it seems (contrary to OS X's function which only tells you of changes while listening). Thanks for clarifying. I'll look into this then. I'm all about using the options available, while resorting to slow processes otherwise.Chthonian
I don't think you need the Change Journal enabled to use FSCTL_ENUM_USN_DATA. There's a separate ioctl for change tracking, FSCTL_READ_USN_JOURNAL, which is probably more similar to the OSX journal you've used before, although the NTFS one is more like a closed-caption security tape: your process doesn't have to be running when the change occurs as long as you query the journal before it wraps around and gets overwritten.Desperado
Ben: That's a "closed-circuit" security tape. Closed captioning is optional subtitles on TV shows. Anyway, I believe that FSCTL_ENUM_USN_DATA walks the MFT, returning USN records of matching files, while FSCTL_READ_USN_JOURNAL walks the USN journal, returning matching files (and possibly waiting until new records show up).Kendy
@Gabe: Yes, I meant "closed-circuit". That sometimes happens when I post comments too late at night.Desperado
+1 for this great information. And here is the link to the MSDN documentation on FSCTL_ENUM_USN_DATA: msdn.microsoft.com/en-us/library/aa364563%28VS.85%29.aspxIsologous
Have a look at these links: microsoft.com/msj/0999/journal/journal.aspx and technet.microsoft.com/en-us/library/bb742450.aspx. This two-part series named "Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained" helped me keep my sanity when implementing change journal functionality.Lyndonlyndsay
