Culture-Invariant case-sensitive string comparison returns different results on different machines

I've found that test results differ between my machine and the build server. I've traced the difference to a single line, which is a string comparison. The two strings differ only in the case of the first character.

The test below passes on my local machine and fails on the build machine.

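using System.Globalization;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class Tests
{
    [TestMethod]
    public void Strings()
    {
        // Case-sensitive (ignoreCase: false) comparison under the invariant culture.
        Assert.IsFalse(0 == string.Compare("Term’s", "term’s", false, CultureInfo.InvariantCulture));
    }
}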

I've also tried to change it to string.Equals:

string.Equals("Term’s", "term’s", StringComparison.InvariantCulture);

string.Equals returns true on the build server and returns false on my local machine.

Ordinal comparison gives the same results on both machines:

string.Compare("Term’s", "term’s", StringComparison.Ordinal);

As I understand it, InvariantCulture is supposed to return the same results everywhere. How can a case-sensitive culture-invariant string comparison depend on the machine? What settings should I check to identify the problem?

Update: platform and string

The string is important. These results can be observed for strings with "exotic" punctuation like RIGHT SINGLE QUOTATION MARK (U+2019) or RIGHT DOUBLE QUOTATION MARK (U+201D).
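
To rule out an encoding difference between the machines, one option is to print the UTF-16 code units of each literal and compare the output from both environments. This is only a minimal sketch; the class name `CodePointDump` and the method `Describe` are made up for illustration and are not part of the original test.

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats each UTF-16 code unit of the string as U+XXXX (hypothetical helper).
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // \u2019 is RIGHT SINGLE QUOTATION MARK.
        Console.WriteLine(Describe("Term\u2019s"));
        // prints: U+0054 U+0065 U+0072 U+006D U+2019 U+0073
    }
}
```

If both machines print the same sequence, the strings themselves are identical and the difference lies in the comparison.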

It seems the behavior reproduces on Windows 8 machines. You can see it even on https://dotnetfiddle.net/ if you type the following:

using System;
using System.Globalization;

public class Program
{
    public static void Main()
    {
        Console.WriteLine(0 == string.Compare("Terms", "terms", false, CultureInfo.InvariantCulture));
        Console.WriteLine(0 == string.Compare("Term’s", "term’s", false, CultureInfo.InvariantCulture));
        Console.WriteLine(0 == string.Compare("Term“s", "term“s", false, CultureInfo.InvariantCulture));
        Console.WriteLine(0 == string.Compare("Term”s", "term”s", false, CultureInfo.InvariantCulture));

        //outputs
        //False
        //True
        //True
        //True
    }
}

Environment.OSVersion (server's): Microsoft Windows NT 6.2.9200.0
Environment.Is64BitOperatingSystem (server's): True
Environment.Version (server's): 4.0.30319.18449

Environment.OSVersion (local): Microsoft Windows NT 6.1.7601 Service Pack 1
Environment.Is64BitOperatingSystem (local): True
Environment.Version (local): 4.0.30319.18444

Update: related MSDN forums link

It may be a known bug in Windows 8, which is fixed in Windows 8.1.

http://social.msdn.microsoft.com/Forums/vstudio/en-US/4a1ab6b7-6dcc-46bf-8650-e0d9ebbf1735/stringcompare-not-always-casesensitive-on-windows-8?forum=netfxbcl

Rodrigo answered 8/9, 2014 at 15:5 Comment(17)
What platform (hardware, OS, CLI) do your PC and build server run on? InvariantCulture is supposed to be case-sensitive, so it sounds like a platform bug.Coney
I don't suppose your build machine has some strange processing of the programs, like a C# preprocessor that mangles things, or some kind of obfuscation that mangles things, or some kind of aspect-oriented process injection (PostSharp comes to mind) that mangles things?Sportsman
@adriano-repetti I double-checked it now, I've copied the test with no changes from the code. And the test failed when I checked it in. As for encoding, I'm not sure it is not changed somewhere in Local->Source Control->Build server, but (int)str[i] return the same numbers for each character for my local machine and server. So at least it compiles to the same thing.Rodrigo
@Coney I've added the output of Environment.OSVersion and Environment.Version. Unfortunately, I have no access to the server other than checking in the code and observing the test results. I will raise a ticket to get the hardware details. What exactly should I ask about the hardware?Rodrigo
@Sportsman No, we don't use such processing.Rodrigo
string.Equals("Term’s", "term’s", StringComparison.OrdinalIgnoreCase);Foofaraw
Just curious - when you request current culture on the two systems, what does it say?Sportsman
@Sportsman en-US for both.Rodrigo
"Murphy’s law" == "murphy’s law" ? :-)Sportsman
Out of curiosity, do you see a difference (between those two machines) if you use instead StringComparer.InvariantCulture.Compare("Term’s", "term’s")?Grogram
I tried both Win 7 and Win 8.1 and I can't reproduce. Is it limited to Win 8? D**n it's even a single UTF-16 code point (and it's there from hmmm around Unicode 3?). Does it do same also for String.Equals()?Unprovided
@AdrianoRepetti I wasn't able to reproduce it on Windows 7 and Windows 8.1. I was able to reproduce it on the build server (I don't know its exact OS version yet) and on a desktop Windows 8 machine. It is the same for equivalent string.Equals call. I've also found it is reproduced on dotnetfiddle.net snippets site. You can try it yourself.Rodrigo
@AdrianoRepetti It's not limited (on dotnetfiddle) to that specific character, either. For example, "WHAT THE HELL???! ë" and "What the hell???! ë" also compare equal.Sob
@JeppeStigNielsen I will try it on the exact machines tomorrow. But it gives 0 on dotnetfiddle.net site, which gave me the same results as my build server for the lines I specified in my question.Rodrigo
@hvd yes I'm checking that. It produces wrong results with any character > 127. Well they just have to rename it as InvariantUsAsciiCulture and it works as expected. If it's as Eric said and everything is delegated to OS (and then this is an OS related bug) I would see if it's documented. A lot of code may be broken out there!!!Unprovided
@AdrianoRepetti, hvd It may be a known Windows 8 bug, which is fixed in Windows 8.1, if I understand this MSDN forums link correctly.Rodrigo
@Rodrigo there isn't an official confirmation there but I think you're right. Wow...it's a SERIOUS thing.Unprovided

InvariantCulture is unfortunately still a linguistic comparison, and as such it can vary (and does vary, especially when new characters are added to Unicode) between versions of the OS. Versions of .NET prior to 4.0 carried their own payload of data and thus would not vary; since 4.0 they pick up the data from the OS and can therefore vary. Ordinal is the only comparison that will not change, and it is what you need if you want stability.

That said, you should not be seeing differences in behavior for the code that you supply. The differences you observe are due to a bug with Windows 8 that has been fixed in Windows 8.1.
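
As a minimal sketch of the ordinal approach, the snippet below compares the strings from the question with StringComparison.Ordinal and StringComparison.OrdinalIgnoreCase. The class name `OrdinalDemo` and its helper methods are made up for illustration. Ordinal comparison works on the UTF-16 code unit values, so it does not rely on the collation data supplied by the OS.

```csharp
using System;

public static class OrdinalDemo
{
    // Case-sensitive comparison of raw UTF-16 code units (hypothetical helper).
    public static bool SameText(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // Same comparison, but ignoring case differences (hypothetical helper).
    public static bool SameTextIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // \u2019 is RIGHT SINGLE QUOTATION MARK, as in the question.
        Console.WriteLine(SameText("Term\u2019s", "term\u2019s"));             // False
        Console.WriteLine(SameTextIgnoringCase("Term\u2019s", "term\u2019s")); // True
    }
}
```

The trade-off is that ordinal comparison is not linguistic: it gives stable equality checks, but its ordering follows code unit values rather than dictionary order.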

Rodgers answered 8/9, 2014 at 17:35 Comment(5)
Do you mean I should use Ordinal comparison even though I compare the strings as english words and not identifiers?Rodrigo
Er... What's the point of an InvariantCulture that varies between the various OS versions? To quote from CultureInfo.InvariantCulture: "Unlike culture-sensitive data, which is subject to change by user customization or by updates to the .NET Framework or the operating system, invariant culture data is stable over time and across installed cultures and cannot be customized by users."Sob
@hvd that is true of the rest of the locale data (things that affect formatting). Unfortunately it is not true for collation (sorting).Rodgers
So much for invariant then :(Remiss
Please explain in more detail! In this case the only non-ASCII character is U+2019 RIGHT SINGLE QUOTATION MARK, and it has been there for a long, long time. I understand the security implications (as described in MSDN), but in a normal environment (without hackers trying to break our code), how does this apply? It can't be this broken (and in the case of bugs I assume they're fixed or documented); otherwise the whole point of an invariant culture is absolutely meaningless.Unprovided
