ToUpperInvariant() – is MSDN wrong in its recommendation?

In Best Practices for Using Strings in the .NET Framework, StringComparison.OrdinalIgnoreCase is recommended for comparing case-insensitive file paths. (Let's call this Statement A.)

I can agree with that, because I can create two files in the same directory:

é.txt
é.txt

Their file names are not the same: the second one is composed of e followed by a combining accent, so it actually contains two characters where the first contains one. (You can try it yourself using copy-paste.)

If an invariant-culture comparison (rather than an ordinal comparison) were in effect, NTFS would not allow both files, because the same article explains that under the invariant culture a + ̊ = å.
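To make the difference concrete, here is a minimal sketch comparing the two file names under ordinal and invariant-culture rules. The class and member names are made up for illustration, and the code points (U+00E9 versus U+0065 U+0301) are an assumption about which two forms the file names above use. The invariant-culture result depends on the runtime's globalization data, so the sketch prints it rather than asserting it.

```csharp
using System;

public static class ComposedVsDecomposed
{
    // Assumed form of the first name: precomposed U+00E9.
    public const string Precomposed = "\u00E9.txt";
    // Assumed form of the second name: 'e' (U+0065) plus combining acute accent (U+0301).
    public const string Decomposed = "e\u0301.txt";

    // Ordinal: compares UTF-16 code units one by one.
    public static bool SameOrdinal(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);

    // OrdinalIgnoreCase: still code-unit based, but ignores case differences.
    public static bool SameOrdinalIgnoreCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // InvariantCulture: linguistic comparison using the invariant culture's rules.
    public static bool SameInvariant(string a, string b) =>
        string.Equals(a, b, StringComparison.InvariantCulture);

    public static void Main()
    {
        Console.WriteLine(Precomposed.Length); // 5
        Console.WriteLine(Decomposed.Length);  // 6

        Console.WriteLine(SameOrdinal(Precomposed, Decomposed));           // False
        Console.WriteLine(SameOrdinalIgnoreCase(Precomposed, Decomposed)); // False

        // Linguistic comparison; result depends on the globalization data in use.
        Console.WriteLine(SameInvariant(Precomposed, Decomposed));
    }
}
```

The two ordinal results are the ones that matter for the file-system argument: the names differ in their code units, so an ordinal comparison keeps them apart.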

But the article on String.ToUpperInvariant() gives a different recommendation (Statement B):

If you need the lowercase or uppercase version of an operating system identifier, such as a file name, named pipe, or registry key, use the ToLowerInvariant or ToUpperInvariant methods.

I need to create a collection of file paths (in fact a HashSet) to detect duplicates. If I follow Statement B when building the set, I could end up with false positives, because the above-mentioned file names é.txt and é.txt would be treated as one. Am I right that Statement B, as found in MSDN, is misleading? Or am I missing something?

I'm about to build a library, preferably without known bugs from the start, so I don't want to neglect this.

Update:

Statement B seems to have one more issue: ToLowerInvariant() cannot actually be used. The Best Practices article says: "DO: Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison." The actual reason: "There is a small range of characters that do not roundtrip, and going to lowercase will make these characters unavailable." (source)

Neither uppercasing nor lowercasing is correct when you want to compare strings for equality case-insensitively. In both variants there are characters that mess this up.

The correct way to compare strings case-insensitively is to use one of the insensitive StringComparison options (you know that).

The right way to use a data structure case-insensitively is to use one of StringComparer.*IgnoreCase. For example:

new HashSet<string>(StringComparer.InvariantCultureIgnoreCase)

Do not uppercase strings before adding them to a data structure. I would fail that in any code review.
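As a rough sketch of the comparer-based approach applied to duplicate detection: the helper below is hypothetical (the name FindDuplicates and the sample paths are invented for illustration). It uses StringComparer.OrdinalIgnoreCase, following Statement A's recommendation for file paths, rather than the invariant-culture comparer from the example above. Whether that comparer matches a given file system's own rules is an assumption you would need to check.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicatePathDetector
{
    // Returns every path that equals an earlier one under OrdinalIgnoreCase.
    // The strings themselves are stored unchanged; only the comparer ignores case.
    public static List<string> FindDuplicates(IEnumerable<string> paths)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<string>();

        foreach (var path in paths)
        {
            // HashSet<T>.Add returns false when an equal element is already present.
            if (!seen.Add(path))
            {
                duplicates.Add(path);
            }
        }

        return duplicates;
    }

    public static void Main()
    {
        var paths = new[]
        {
            @"C:\Data\Report.txt",
            @"c:\data\REPORT.TXT", // differs from the first only by case
            "\u00E9.txt",          // precomposed é
            "e\u0301.txt",         // e + combining acute accent
        };

        foreach (var duplicate in FindDuplicates(paths))
        {
            Console.WriteLine(duplicate); // prints c:\data\REPORT.TXT only
        }
    }
}
```

Because the comparer does the case folding, the original spelling of each path stays available for display or for later file-system calls.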

If you need the lowercase or uppercase version of an operating system identifier

You do not need such a thing. This statement does not apply to your case.
