In Best Practices for Using Strings in the .NET Framework, StringComparison.OrdinalIgnoreCase
is recommended for case-insensitive file paths. (Let's call it Statement A.)
I can agree with that, because I can create two files in the same directory:
é.txt
é.txt
Their file names are not the same: the second one is composed of e
followed by a combining accent, so it actually contains two characters. (You can try it yourself using copy-paste.)
If an invariant-culture comparison (rather than an ordinal one) were in effect, NTFS wouldn't allow these files, because the same article explains that in the invariant culture a + ̊ = å
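To make the difference concrete, here is a minimal sketch that compares the two names under both comparison modes. The class and member names are hypothetical, and the code points assume the accent in question is the acute accent (U+00E9 versus e + U+0301). The invariant-culture result depends on the runtime having culture data available, so no output is claimed for it.

```csharp
using System;

static class AccentComparisonDemo
{
    // Precomposed form: U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
    public const string Precomposed = "\u00E9.txt";

    // Decomposed form: 'e' followed by U+0301 (COMBINING ACUTE ACCENT).
    public const string Decomposed = "e\u0301.txt";

    // Compares the two names under the given comparison mode.
    public static bool EqualUnder(StringComparison comparison)
    {
        return string.Equals(Precomposed, Decomposed, comparison);
    }

    public static void Run()
    {
        // Ordinal comparisons look only at UTF-16 code units,
        // and the two names do not even have the same length.
        Console.WriteLine(EqualUnder(StringComparison.OrdinalIgnoreCase));

        // Linguistic comparison: expected to treat canonically equivalent
        // sequences as equal when culture data is available (assumption).
        Console.WriteLine(EqualUnder(StringComparison.InvariantCulture));
    }
}
```

Calling `AccentComparisonDemo.Run()` prints one line per comparison mode, which makes it easy to check the behavior on a particular runtime.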
But the article on String.ToUpperInvariant()
gives a different recommendation: (Statement B.)
If you need the lowercase or uppercase version of an operating system identifier, such as a file name, named pipe, or registry key, use the ToLowerInvariant or ToUpperInvariant methods.
I need to build a collection of file paths (in fact, a HashSet) to detect duplicates. If I follow Statement B when building the set, I could end up with false positives, because the two file names shown above would be treated as one. Am I right that Statement B in MSDN is misleading, or am I missing something?
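One way to test this concern is to build the set both ways and count the entries. The sketch below is only an illustration: `PathSetDemo` and its methods are hypothetical names, and the sample names use the escaped code points U+00E9 and e + U+0301. It relies on the fact that ToUpperInvariant changes case only and does not perform Unicode normalization.

```csharp
using System;
using System.Collections.Generic;

static class PathSetDemo
{
    // Sample names: precomposed, decomposed, and an upper-case variant
    // of the decomposed name (same file on a case-insensitive volume).
    public static readonly string[] Names =
    {
        "\u00E9.txt",
        "e\u0301.txt",
        "E\u0301.TXT",
    };

    // Statement A: let the set compare names with OrdinalIgnoreCase.
    public static HashSet<string> WithOrdinalIgnoreCase(IEnumerable<string> names)
    {
        return new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }

    // Statement B: upper-case each name first, then compare ordinally.
    public static HashSet<string> WithUpperInvariantKeys(IEnumerable<string> names)
    {
        var set = new HashSet<string>(StringComparer.Ordinal);
        foreach (string name in names)
        {
            // Case mapping only; combining sequences are left as they are.
            set.Add(name.ToUpperInvariant());
        }
        return set;
    }
}
```

With these sample names, both `PathSetDemo.WithOrdinalIgnoreCase(PathSetDemo.Names).Count` and `PathSetDemo.WithUpperInvariantKeys(PathSetDemo.Names).Count` can be compared directly: if they agree, upper-casing alone did not merge the precomposed and decomposed names.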
I'm about to build a library, preferably without known bugs from the start, so I don't want to overlook this.
Update:
Statement B seems to have one more issue: ToLowerInvariant() cannot actually be used. Reason (quoting the Best Practices article): DO: Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison.
Actual reason: There is a small range of characters that do not roundtrip, and going to lowercase will make these characters unavailable.
(source)
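A rough way to see which characters are affected on a given runtime is to scan the Basic Multilingual Plane and collect the characters that do not survive a case round trip. This is only a sketch with hypothetical names, and "roundtrip" is interpreted here as: a character already in one case form should come back unchanged after converting to the other case and back. The result depends on the runtime's case-mapping tables, so no output is claimed.

```csharp
using System;
using System.Collections.Generic;

static class CaseRoundTripDemo
{
    // Characters already in upper-case form that do not come back
    // after upper -> lower -> upper (information lost by lower-casing).
    public static List<char> LostByLowercasing()
    {
        var result = new List<char>();
        for (int i = 0; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.ToUpperInvariant(c) != c) continue; // not in upper-case form
            if (char.ToUpperInvariant(char.ToLowerInvariant(c)) != c) result.Add(c);
        }
        return result;
    }

    // Characters already in lower-case form that do not come back
    // after lower -> upper -> lower (information lost by upper-casing).
    public static List<char> LostByUppercasing()
    {
        var result = new List<char>();
        for (int i = 0; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.ToLowerInvariant(c) != c) continue; // not in lower-case form
            if (char.ToLowerInvariant(char.ToUpperInvariant(c)) != c) result.Add(c);
        }
        return result;
    }
}
```

Printing `CaseRoundTripDemo.LostByLowercasing().Count` and `CaseRoundTripDemo.LostByUppercasing().Count` shows how many characters each normalization direction affects on the machine where it runs.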
: , * or ? in file names. It's just Windows that doesn't support it. It's quite easy to create such files on NTFS under Linux. – Werner