How to deal with Windows' ReadDirectoryChangesW() and its mixed long/short filename output?

I am developing a piece of C code that uses ReadDirectoryChangesW() to monitor changes under a directory in Windows. I have read the related MSDN entries for ReadDirectoryChangesW() and the FILE_NOTIFY_INFORMATION structure, as well as several other pieces of documentation. At this point I have managed to monitor multiple directories with no apparent problems in the monitoring itself. The problem is that the filenames put in the FILE_NOTIFY_INFORMATION structure by this function are not canonical.

According to MSDN they can be in either long or short form. I have found several posts which suggest caching both short and long pathnames to handle this case. Unfortunately, according to my own testing on a Windows 7 system this is not sufficient to eliminate the issue, because there are not just two alternatives for each filename. The problem is that in a pathname EACH COMPONENT can be in either long or short form. The following pathnames could all refer to the same file:

c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT

c:\PROGRA~1\MYPROG~1\MyDataFile.txt

c:\PROGRA~1\MyProgram\MYDATA~1.TXT

c:\PROGRA~1\MyProgram\MyDataFile.txt

c:\Program Files\MYPROG~1\MYDATA~1.TXT

...

and as far as I can tell from my testing using cmd.exe, they are all perfectly acceptable. Essentially, the number of valid pathnames for each file rises exponentially with the number of components in its pathname.
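
To make the growth concrete, here is a small portable C sketch that counts and builds every spelling of a path. It assumes each component has at most one 8.3 alias and ignores case variations, which multiply the count further. The `component` type and both functions are hypothetical names, not Windows APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One path component and its optional 8.3 alias (hypothetical type). */
typedef struct {
    const char *long_name;
    const char *short_name;   /* NULL when there is no distinct short name */
} component;

/* Every component that has an alias doubles the number of valid spellings. */
static unsigned long count_spellings(const component *c, size_t n)
{
    unsigned long total = 1;
    for (size_t i = 0; i < n; i++)
        if (c[i].short_name != NULL)
            total *= 2;
    return total;
}

/* Writes spelling number k (0 <= k < count_spellings()) into out.
   Bit b of k selects the short form of the b-th aliased component. */
static void build_spelling(const component *c, size_t n, unsigned long k,
                           char *out, size_t outsz)
{
    size_t bit = 0;

    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *name = c[i].long_name;
        if (c[i].short_name != NULL) {
            if (k & (1UL << bit))
                name = c[i].short_name;
            bit++;
        }
        if (i > 0)
            strncat(out, "\\", outsz - strlen(out) - 1);
        strncat(out, name, outsz - strlen(out) - 1);
    }
}
```

With the four components of the example path ("c:" without an alias, then Program Files, MyProgram and MyDataFile.txt with their aliases), count_spellings() returns 8; k = 0 yields the all-long spelling and k = 7 the all-short c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT.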

Unfortunately, ReadDirectoryChangesW() seems to fill in its output buffer with the filenames exactly as provided to the system call that caused each operation. For example, if you use cmd.exe commands to create, rename, delete, etc. files, the FILE_NOTIFY_INFORMATION will contain the filenames as specified on the command line.
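
For reference, the records in that buffer are walked by following NextEntryOffset; each name is FileNameLength bytes of UTF-16, relative to the watched directory and not NUL-terminated. The sketch below uses a portable stand-in struct, `notify_record` (an assumption that mirrors the documented FILE_NOTIFY_INFORMATION layout with DWORD as uint32_t and WCHAR as uint16_t), so it compiles off Windows; on Windows you would use the real type from <windows.h>. `next_notification` is a hypothetical helper. It only extracts the name exactly as the caller spelled it; nothing here canonicalizes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for FILE_NOTIFY_INFORMATION (assumed layout; on Windows use the
   real struct from <windows.h>). */
typedef struct {
    uint32_t NextEntryOffset;   /* bytes to the next record, 0 for the last */
    uint32_t Action;            /* 1 = FILE_ACTION_ADDED, 2 = FILE_ACTION_REMOVED, ... */
    uint32_t FileNameLength;    /* size of FileName in BYTES, not characters */
    uint16_t FileName[];        /* UTF-16, relative to the watched directory, no NUL */
} notify_record;

/* Reads the record at *cursor, copies its name into `name` and advances
   *cursor (set to NULL after the last record). Returns 0 when exhausted.
   Non-ASCII code units become '?' only to keep the sketch short; real code
   would keep UTF-16 or convert with WideCharToMultiByte(). */
static int next_notification(const unsigned char **cursor, uint32_t *action,
                             char *name, size_t namesz)
{
    const notify_record *r;
    size_t len;

    assert(namesz > 0);
    if (*cursor == NULL)
        return 0;
    r = (const notify_record *)*cursor;
    len = r->FileNameLength / sizeof(uint16_t);
    if (len >= namesz)
        len = namesz - 1;               /* truncate instead of overflowing */
    for (size_t i = 0; i < len; i++)
        name[i] = r->FileName[i] < 128 ? (char)r->FileName[i] : '?';
    name[len] = '\0';
    *action = r->Action;
    *cursor = r->NextEntryOffset != 0 ? *cursor + r->NextEntryOffset : NULL;
    return 1;
}
```

A caller would point the cursor at the buffer passed to ReadDirectoryChangesW() and loop with `while (next_notification(&cur, &action, name, sizeof name))`, skipping the loop entirely when the call reports zero bytes returned, which the documentation describes as the buffer having overflowed.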

Now, in most cases I could use GetLongPathName() and friends to get a unique path for my use. Unfortunately that cannot be done when deleting files - by the time I get the notification, the file is already gone and the Get*PathName() functions will not work.

At the moment I am thinking about using more extensive caching to record which alternative pathnames applications use for each file. That would handle every case except the one where someone deletes a file out of the blue using a mixed pathname I have never seen. For that case I am thinking about some creative data mining of the parent directory's modification events, falling back to checking the actual directory contents.
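
The fallback could look roughly like this, assuming you keep a snapshot of each watched directory's entries with both spellings and can re-list the directory on demand (on Windows, e.g. with FindFirstFileW()/FindNextFileW(); the listing step is not shown). `dir_entry`, `name_ieq` and `resolve_deleted` are hypothetical names, and `notified` is assumed to be only the final component of the reported name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One remembered directory entry (hypothetical type). */
typedef struct {
    const char *long_name;
    const char *short_name;   /* "" when the entry has no 8.3 alias */
} dir_entry;

/* ASCII-only case-insensitive equality. */
static int name_ieq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Called for a removal whose name was never seen. `snap` is the remembered
   listing, `now` a fresh listing of long names. Returns the long name of the
   vanished entry that `notified` spells, or NULL if none does. */
static const char *resolve_deleted(const dir_entry *snap, size_t nsnap,
                                   const char *const *now, size_t nnow,
                                   const char *notified)
{
    assert(notified != NULL);
    for (size_t i = 0; i < nsnap; i++) {
        int still_there = 0;
        /* Assumption: both listings come from the file system itself,
           so their spellings match exactly. */
        for (size_t j = 0; j < nnow && !still_there; j++)
            still_there = strcmp(snap[i].long_name, now[j]) == 0;
        if (still_there)
            continue;
        if (name_ieq(notified, snap[i].long_name) ||
            (snap[i].short_name[0] != '\0' &&
             name_ieq(notified, snap[i].short_name)))
            return snap[i].long_name;
    }
    return NULL;
}
```

Matching the reported name against the vanished entries, rather than just taking the first missing one, guards against several deletions arriving in the same batch.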

Any suggestions for an easier way to do this?

PS1: While Change Journals would deal with this effectively (I hope), I do not believe I can use them, due to their ties to NTFS and the lack of administrator privileges for my application. I'd rather not go there unless I am absolutely forced to.

PS2: Please, keep in mind that I code mainly on Unix, so be gentle...

Cusk answered 14/11, 2010 at 19:38 Comment(4)
If all else fails, maybe a minifilter driver will work? – Pisgah
I believe that's what most antivirus programs do and I suppose it would be THE solution for this issue. Unfortunately it requires system administrator rights to install, and much like Change Journals it would make the architecture of my application a LOT more complex, since I would have to consider security and stability issues I do not have to deal with now. And let's not forget the inherent difficulties of writing a kernel-mode or quasi-kernel-mode driver for any OS. – Cusk
Oh, and at the moment it does seem a bit of overkill to monitor everything just to watch a couple of user directories. Thanks for the suggestion, though... – Cusk
The change-notification API in Windows is a disgrace. It's also got several race conditions. Even Explorer can't get it right. (In fact, Explorer in Win7 seems to have some new bugs in that area with command-prompt programs modifying files. :)) – Ligature

You don't need to cache every combination. It is enough to cache each subpath so that you can convert it to its long form. For example, store this:

  • C:\PROGRA~1 => c:\Program Files
  • c:\Program Files\MYPROG~1 => c:\Program Files\MyProgram
  • c:\Program Files\MyProgram\MYDATA~1.TXT => c:\Program Files\MyProgram\MyDataFile.txt
  • c:\Program Files\MyProgram\MYDATA~2.TXT => c:\Program Files\MyProgram\MyDataFile2.txt

Now if you get a notification for c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT, split it at every \ and look up the long form of each part.
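
A minimal sketch of that lookup in portable C, assuming the cache is a flat array of the key => long-path pairs listed above (a real program would more likely use a hash table keyed on the upper-cased path). `alias_entry`, `path_ieq`, `lookup` and `canonicalize` are hypothetical names. Each step looks up the prefix resolved so far plus the next component, exactly as in the list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cache entry: a path whose parent is already long,
   mapped to its fully long form. */
typedef struct {
    const char *key;         /* e.g. "c:\\Program Files\\MYPROG~1" */
    const char *long_path;   /* e.g. "c:\\Program Files\\MyProgram" */
} alias_entry;

/* ASCII-only case-insensitive comparison; NTFS case rules cover Unicode,
   so Windows code would rather use CompareStringOrdinal(..., TRUE). */
static int path_ieq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
    return *a == *b;
}

static const char *lookup(const alias_entry *cache, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (path_ieq(cache[i].key, key))
            return cache[i].long_path;
    return NULL;
}

/* Rebuilds `path` one component at a time. On a cache miss the component
   is kept as written: it may already be long, or simply not cached yet. */
static void canonicalize(const alias_entry *cache, size_t n,
                         const char *path, char *out, size_t outsz)
{
    char candidate[512];
    const char *p = path;
    size_t len = strcspn(p, "\\");        /* first component, e.g. "c:" */

    assert(outsz > 0);
    snprintf(out, outsz, "%.*s", (int)len, p);
    p += len;
    while (*p == '\\') {
        const char *hit;
        p++;                              /* skip the separator */
        len = strcspn(p, "\\");
        snprintf(candidate, sizeof candidate, "%s\\%.*s", out, (int)len, p);
        hit = lookup(cache, n, candidate);
        snprintf(out, outsz, "%s", hit != NULL ? hit : candidate);
        p += len;
    }
}
```

With the four entries above, both c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT and c:\Program Files\MYPROG~1\mydata~2.txt come out fully long. A component that misses the cache passes through unchanged, so an alias that was never cached still slips through, which is exactly the deletion case raised in the question.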

Don't forget that MyDataFile.txt and MYDATAFILE.TXT also point to the same file, so compare case-insensitively or convert everything to uppercase.

And if c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT is deleted, you might still use GetLongPathName() on c:\PROGRA~1\MYPROG~1.
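
The idea is to split off the last component, resolve only the parent (which still exists) with GetLongPathName(), and then re-append the leaf. Only the splitting step is sketched below, in portable C so it can be tested anywhere; `split_parent` is a hypothetical helper. Note that the leaf itself may still be an 8.3 alias such as MYDATA~1.TXT, which no API can expand once the file is gone, so its long form still has to come from a cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits "c:\\PROGRA~1\\MYPROG~1\\MYDATA~1.TXT" into the parent
   "c:\\PROGRA~1\\MYPROG~1" and the leaf "MYDATA~1.TXT".
   Returns 0 if there is no backslash or a buffer is too small. */
static int split_parent(const char *path, char *parent, size_t parentsz,
                        char *leaf, size_t leafsz)
{
    const char *sep = strrchr(path, '\\');
    size_t plen;

    assert(parentsz > 0 && leafsz > 0);
    if (sep == NULL)
        return 0;
    plen = (size_t)(sep - path);
    if (plen == 2 && path[1] == ':')
        plen = 3;              /* keep "c:\\" rather than drive-relative "c:" */
    if (plen >= parentsz || strlen(sep + 1) >= leafsz)
        return 0;
    memcpy(parent, path, plen);
    parent[plen] = '\0';
    strcpy(leaf, sep + 1);
    return 1;
}
```

On Windows, `parent` would then be passed to GetLongPathNameW() (after widening) and the leaf appended to its result.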

Impaction answered 14/11, 2010 at 21:26 Comment(1)
When I mentioned extensive caching and data mining, I had in mind a somewhat more elaborate version of what you propose. Right now I store both short and long names when available, and once a path has been seen I can already find all of its alternative pathnames. Using a c:\A\B\C -> c:\a\b\c association I can detect all of the c:\{A,a}\{B,b}\{C,c} combinations. There is still an issue with deletions of unseen files. I am thinking about recursively storing an initial state of the directory to deal with this; I am not sure I can avoid it anyway... – Cusk
