Weird functionality in .NET's Directory.GetFiles() when search pattern contains 3 chars for extension

I recently ran into some weird behavior from Microsoft:

Let's assume the folder C:\tmp123 contains 3 files:
1.txt
2.txtx
3.txtxt

a) Calling Directory.GetFiles(@"C:\tmp123", "*.txt") returns 3 items.
b) Calling Directory.GetFiles(@"C:\tmp123", "*.txtx") returns 1 item.

According to Microsoft, this is expected behavior (see the Note in the MSDN documentation for Directory.GetFiles).

My questions are:

  1. Why did Microsoft decide on such strange behavior?

  2. How can I overcome this problem?
    i.e., how do I write a search pattern that returns only files with the .txt extension and not *.txtx, *.txtstrangefunctionality, etc.?

Gangplank answered 10/1, 2012 at 8:49

The reason for this is backwards compatibility.

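Windows was initially built as a graphical interface on top of MS-DOS, which limited file names to 8 characters plus an extension of at most 3. Extensions to the MS-DOS file systems allowed Windows to have longer file names and extensions, but these would still show up as 8.3 file names in MS-DOS.

Those 8.3 short names still exist next to the long ones (when short-name generation is enabled on the volume), and a short name keeps at most 3 characters of the extension: 2.txtx typically gets a short name like 2~1.TXT. As the MSDN note describes, Directory.GetFiles matches the pattern against both the long and the short name, so *.txt also picks up 2.txtx and 3.txtxt.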

Since the command prompt on Windows is an evolution of the old MS-DOS command interpreter, some "anachronistic" behaviours (like the 3-letter extension matching) were kept so that applications and scripts written in the "old days" or by "old timers" wouldn't break.

(Another example is the fact that most Windows file systems are case-insensitive: yes, you guessed it, because the MS-DOS one stored names in upper case only.)
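To make the short-name explanation concrete, here is a rough model of the matching as a C# 9+ top-level program. It is a simplified sketch, not the real Windows algorithm: the hypothetical helpers `ApproxShortExtension` and `MatchesStarDotExt` only model the one property that matters here (a generated 8.3 name keeps at most 3 extension characters) and ignore the `~1` numbering, base-name truncation and per-volume settings.

```csharp
using System;
using System.IO;

// Hypothetical, simplified model of an 8.3 short name's extension:
// keep the leading dot plus at most 3 characters (".txtx" -> ".txt").
static string ApproxShortExtension(string longName)
{
    string ext = Path.GetExtension(longName); // includes the dot, e.g. ".txtx"
    return ext.Length > 4 ? ext.Substring(0, 4) : ext;
}

// Models "*.<ext>" matching against the long name OR the modeled short name,
// which is the behavior the MSDN note describes for Directory.GetFiles.
static bool MatchesStarDotExt(string longName, string ext)
{
    bool longMatch = Path.GetExtension(longName).Equals(ext, StringComparison.OrdinalIgnoreCase);
    bool shortMatch = ApproxShortExtension(longName).Equals(ext, StringComparison.OrdinalIgnoreCase);
    return longMatch || shortMatch;
}

// The three files from the question.
string[] names = { "1.txt", "2.txtx", "3.txtxt" };
foreach (string pattern in new[] { ".txt", ".txtx" })
{
    int hits = 0;
    foreach (string name in names)
        if (MatchesStarDotExt(name, pattern)) hits++;
    Console.WriteLine($"*{pattern}: {hits} match(es)");
}
```

With the question's three files this prints `*.txt: 3 match(es)` and `*.txtx: 1 match(es)`, the same counts the question reports: the short-name check is what lets 2.txtx and 3.txtxt through for `*.txt`, while a 4-character pattern can only ever match long names.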

Bourgogne answered 10/1, 2012 at 10:43

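If you want a workaround, you could simply retrieve all the file paths:

using System;
using System.IO;
using System.Linq;

var files = Directory.GetFiles(@"C:\tmp123");

and then filter them by extension as needed (comparing case-insensitively, since Windows file names are case-insensitive):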

var txtFiles = files.Where(f => f.EndsWith(".txt", StringComparison.OrdinalIgnoreCase));
var txtxFiles = files.Where(f => f.EndsWith(".txtx", StringComparison.OrdinalIgnoreCase));
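A self-contained version of this workaround might look like the sketch below (a C# 9+ top-level program). The helper name `GetFilesWithExactExtension` is hypothetical, and the program recreates the question's three files in a throwaway temp folder instead of using C:\tmp123.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: the OS-level pattern does a coarse pre-filter, then the
// exact, case-insensitive extension check removes short-name false positives.
static string[] GetFilesWithExactExtension(string directory, string extension) =>
    Directory.EnumerateFiles(directory, "*" + extension)
        .Where(path => string.Equals(Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase))
        .ToArray();

// Recreate the question's folder in a unique temp directory.
string dir = Path.Combine(Path.GetTempPath(), "tmp123-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
foreach (string name in new[] { "1.txt", "2.txtx", "3.txtxt" })
    File.WriteAllText(Path.Combine(dir, name), string.Empty);

string[] txtFiles = GetFilesWithExactExtension(dir, ".txt");
string[] txtxFiles = GetFilesWithExactExtension(dir, ".txtx");
Console.WriteLine($"{txtFiles.Length} .txt file(s), {txtxFiles.Length} .txtx file(s)");

Directory.Delete(dir, recursive: true); // clean up the temp folder
```

This prints `1 .txt file(s), 1 .txtx file(s)` whether or not the OS pattern lets the extra files through. Passing the pattern first, rather than listing every file as above, keeps unrelated files out of the result early; the `Where` clause is what makes the result exact.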
Tranship answered 10/1, 2012 at 13:39

I'd be willing to wager it has something to do with backward compatibility. I don't see this exact issue mentioned, but this Raymond Chen blog post mentions a number of oddities in this area:

[...] some quirks of the FCB matching algorithm persist into Win32 because they have become idiom.

For example, if your pattern ends in .*, the .* is ignored. Without this rule, the pattern *.* would match only files that contained a dot, which would break probably 90% of all the batch files on the planet, as well as everybody's muscle memory, since everybody running Windows NT 3.1 grew up in a world where *.* meant all files.

As another example, a pattern that ends in a dot doesn't actually match files which end in a dot; it matches files with no extension. And a question mark can match zero characters if it comes immediately before a dot.
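On .NET Core and .NET 5+ (not .NET Framework), these Win32 wildcard rules are, as far as I know, implemented in managed code and exposed through `System.IO.Enumeration.FileSystemName`: `TranslateWin32Expression` rewrites `*`, `?` and `.` into the DOS wildcards `<`, `>` and `"`, and `MatchesWin32Expression` evaluates the result. The sketch below uses a hypothetical helper `Win32Match` to try a few patterns against single file names. It only checks the name passed in, so the 8.3 short-name effect from the question does not show up here.

```csharp
using System;
using System.IO.Enumeration; // FileSystemName: .NET Core / .NET 5+, not .NET Framework

// Hypothetical helper: translate the pattern to its DOS-wildcard form, then
// match it against one (long) file name; ignoreCase defaults to true.
static bool Win32Match(string pattern, string fileName) =>
    FileSystemName.MatchesWin32Expression(FileSystemName.TranslateWin32Expression(pattern), fileName);

// "*.*" should match a name without any dot (the "*.* means all files" rule).
Console.WriteLine(Win32Match("*.*", "README"));
// A '?' right before a dot may match zero characters (the last rule quoted above).
Console.WriteLine(Win32Match("a?.txt", "a.txt"));
// Long name only: "2.txtx" does not match "*.txt" here.
Console.WriteLine(Win32Match("*.txt", "2.txtx"));
```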

Thankless answered 10/1, 2012 at 8:49

Here is another workaround that will help with filtering out files with extensions such as ".txtxt":

var files = new DirectoryInfo(@"C:\tmp123").GetFiles("*.txt")
    .Where(item => item.Extension.Equals(".txt", StringComparison.OrdinalIgnoreCase));
Tallow answered 24/9, 2014 at 22:43
