Why does comparing two equal Persian words not return 0?

Asked 20/2, 2013 at 15:30 Answered 8/6, 2020 at 9:57

Solved c#.net vb.net cultureinfo culture

We have two forms of the same letter, 'ی' and 'ي'. The first became the default letter starting with Windows 7.
Back on Windows XP, the second one was the default.
Now the input I receive is treated as different when one client is on Windows XP and the other is on Windows 7.
I have also tried using the Persian culture, with no success.
Am I missing anything?
EDIT: I had to change the words for better understanding; now they look similar.

foreach (CompareOptions i in Enum.GetValues(new CompareOptions().GetType()).OfType<CompareOptions>()) 
    Console.WriteLine( string.Compare("محسنين", "محسنین", new CultureInfo("fa-ir"), i) + "\t : " + i );

Outputs :

-1       : None
-1       : IgnoreCase
-1       : IgnoreNonSpace
-1       : IgnoreSymbols
-1       : IgnoreKanaType
-1       : IgnoreWidth
1        : OrdinalIgnoreCase
-1       : StringSort
130      : Ordinal

Thermion answered 20/2, 2013 at 15:30 Comment(8)

Well that doesn't seem fair at all. – Ithaca 20/2, 2013 at 15:32

I am no persian and don't actually understand the language, but: ي does not look like ی to me! – Alain 20/2, 2013 at 15:33

@Aniket Just like a does not look like A.. but both are equal – Thermion 20/2, 2013 at 15:35

I don't know about persian, but in arabic they are totally different. – Aerify 20/2, 2013 at 15:35

@Aniket: You're sharp-eyed. – Udale 20/2, 2013 at 15:36

@AbZy I know in Arabic those aren't equal. Even if those aren't equal Microsoft windows shouldn't have changed the main letter in their windows seven. My problem is that inputs are determined as different on different windows platforms. – Thermion 20/2, 2013 at 15:40

I'm certainly missing something, they look different to me. If I were doing something similar in the English alphabet I'd expect to have to replace one with the other before the compare. – Hypaesthesia 20/2, 2013 at 15:40

Btw, it's always the last char that is not equal and not the first. – Udale 20/2, 2013 at 15:52

The two strings are not equal. The last letter differs.

About why IgnoreCase returns -1 but OrdinalIgnoreCase returns 1:

OrdinalIgnoreCase uses the invariant culture to convert the strings to upper case and afterwards performs a byte-by-byte comparison.
IgnoreCase uses the specified culture to perform a case-insensitive comparison.

The difference is that IgnoreCase knows "more" about the letters of the specified language and may treat them differently than the invariant culture does, leading to a different outcome.
This is a different manifestation of what became known as "The Turkish İ Problem".

You can verify it yourself by using the InvariantCulture instead of the Persian one:

foreach (CompareOptions i in Enum.GetValues(new CompareOptions().GetType()).OfType<CompareOptions>()) 
    Console.WriteLine( string.Compare("محسنی", "محسني", CultureInfo.InvariantCulture, i) + "\t : " + i );

This will output 1 for both IgnoreCase and OrdinalIgnoreCase.

Regarding your edited question:
The two strings still differ. The following code outputs the values of the single characters in the strings.

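// strings holds the two words from the question
var strings = new[] { "محسنين", "محسنین" };
foreach (var s in strings)
    Console.WriteLine(string.Join(Environment.NewLine, s.Select(c => (int)c)) + Environment.NewLine);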

The result will look like this:

1605
1581
1587
1606
1610 // <-- "yeh": ي
1606

1605
1581
1587
1606
1740 // <-- "farsi yeh": ی
1606

As you can see, there is one character that differs, resulting in a comparison that treats those two strings as not equal.
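The same check can be made easier to read by printing the values in hexadecimal and locating the first position where the strings diverge. This is only a minimal sketch and not part of the original answer: the class name `CodePointDump` and its members are hypothetical, and the two words are written with `\u` escapes so that the two yeh variants cannot be confused in an editor.

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // The two words from the question, spelled out as escapes (assumed spelling).
    public const string WithArabicYeh = "\u0645\u062D\u0633\u0646\u064A\u0646"; // fifth char: yeh, U+064A (1610)
    public const string WithFarsiYeh = "\u0645\u062D\u0633\u0646\u06CC\u0646";  // fifth char: farsi yeh, U+06CC (1740)

    // Formats every UTF-16 code unit of s as U+XXXX, separated by spaces.
    public static string Describe(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    // Returns the index of the first differing code unit, or -1 if the strings are identical.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
            if (a[i] != b[i]) return i;
        return a.Length == b.Length ? -1 : shared;
    }
}
```

Calling `CodePointDump.Describe` on each word and then `CodePointDump.FirstDifference` on the pair should point at index 4, the position of the yeh.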

Wisner answered 20/2, 2013 at 15:41 Comment(24)

@Mahdi: They are still different. And this difference most likely is the actual reason for the results you have been experiencing all along. The first string contains the characters with the following values: 1605, 1581, 1587, 1606, 1610, 1606. The second string contains these values: 1605, 1581, 1587, 1606, 1740, 1606. As you can see, one byte differs. – Wisner 20/2, 2013 at 16:4

@DanielHilgarth I just don't care about the byte differs.. All I know those both words are same in Persian. If you are right then Microsoft is wrong on changing in different windows platforms. It's simple. You or .NET comparison or Microsoft strategy is wrong. – Thermion 20/2, 2013 at 16:9

@Mahdi: That's a pretty rude comment. I am trying to explain why you are getting the results you are getting. I didn't write the code that does this comparison, nor am I responsible for the strings you are trying to compare. – Wisner 20/2, 2013 at 16:11

Collators and Cultural comparisons do not care about binary equality.. you have == operator for that. – Ashworth 20/2, 2013 at 16:12

@DanielHilgarth I'm sorry I didn't mean that hot. Once again sorry ;) – Thermion 20/2, 2013 at 16:12

@Mahdi: According to this unicode overview, the differing bytes are "yeh" and "farsi yeh". – Wisner 20/2, 2013 at 16:13

@DanielHilgarth I am trying to find out if you are right why my windows XP clients have the second yeh but my windows seven clients have the first yeh. this cause my database to fail for search queries. – Thermion 20/2, 2013 at 16:16

@Mahdi: Windows XP is very very old. It is quite possible that its unicode implementation or its support for persian had an error and was using the incorrect "farsi yeh" instead of the correct "yeh". Windows 7 could have fixed that error. – Wisner 20/2, 2013 at 16:18

@Mahdi: According to this comment "yeh" is not a persian letter at all, only "farsi yeh" is. So maybe that's what has been changed between the two Windows versions. – Wisner 20/2, 2013 at 16:22

@DanielHilgarth Well then I have to manually fix windows XP bug by replacing the chars. However thanks for providing these information. I'll mark as answer if no one else provide a better answer. thanks – Thermion 20/2, 2013 at 16:24

@Mahdi: More on this issue: It shows that the two letters are really different. When seen in isolation, Yeh looks like this: ي and Farsi Yeh looks like this: ی. Note the part below the actual letter. – Wisner 20/2, 2013 at 16:25

@Mahdi: I am not sure replacing characters is a good idea. Where are you getting those strings from? I mean, I know they come from your database, but how do they actually get there? – Wisner 20/2, 2013 at 16:27

@DanielHilgarth I get input from a textbox in my ASP.NET web application and search for the entered string. The results are now separated by Windows platform: on XP, only content that was written on Windows XP is shown to the client, and the same goes for Windows 7. Do you have any better idea? – Thermion 20/2, 2013 at 16:29

@Mahdi: And the exact same keypresses on XP and Windows 7 produce those different results? Have you verified this yourself? – Wisner 20/2, 2013 at 16:33

@DanielHilgarth Yes exactly the same which will be D in English layout. – Thermion 20/2, 2013 at 16:35

@Mahdi: Yes, that really looks like some kind of fix between the windows versions... In that case I don't really have a better idea, sorry. – Wisner 20/2, 2013 at 16:37

@DanielHilgarth Thanks seems like a windows XP bug. Will do char replacing to avoid it. – Thermion 20/2, 2013 at 16:39

@Mahdi: You might want to read this. It explains the differences between Yeh and Farsi Yeh. Yeh is actually not a Persian letter, only Farsi Yeh is. Have you made sure that both computers were using Persian as keyboard layout? Maybe one was using Arabic instead? – Wisner 20/2, 2013 at 17:29

@Mahdi: According to this website, there was a problem with Windows 2000 which has been fixed with a Service Pack, so it is unlikely that the problem would still exist in Windows XP... – Wisner 20/2, 2013 at 17:30

@DanielHilgarth I'm a Persian programmer. And I can verify that the key press D in all XP windows with Persian keyboard layout writes the second yeh; even in XP SP3. And I can confirm the same key press D after windows seven has changed to first yeh. However I'm not sure about windows vista. – Thermion 20/2, 2013 at 17:33

@Mahdi: And you are 100% sure (triple check) you are using the same keyboard layout on all machines? I am asking, because the "first yeh" seems to be the non-Persian one (with the dots) and would be wrong. It would be strange for such a bug to be re-introduced into Windows 7. – Wisner 20/2, 2013 at 17:36

@DanielHilgarth Yes. This is not a new problem for me. It has been a common problem for all Persians since the old XP days, and this exchange of characters in Windows 7 caused a lot of problems. If you are interested in this, check the following Persian forums for more information: forum.iranphp.org/… barnamenevis.org/… – Thermion 20/2, 2013 at 17:43

@DanielHilgarth Also it's not only yeh but ka too :) – Thermion 20/2, 2013 at 17:43

@Mahdi: You might want to take this issue to an official MS forum. BTW: I don't understand Persian at all, so I can't read your links. – Wisner 20/2, 2013 at 17:53

Here is my code for converting the Arabic characters “ي,ك” to the Persian “ی,ک”, written as an extension method:

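// Must be declared inside a static class, as required for extension methods.
private static readonly string[] pn = { "ی", "ک" };
private static readonly string[] ar = { "ي", "ك" };

public static string ToFaText(this string strTxt)
{
    string chash = strTxt;
    // Replace each Arabic character with its Persian counterpart.
    for (int i = 0; i < ar.Length; i++)
        chash = chash.Replace(ar[i], pn[i]);
    return chash;
}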

Labored answered 24/10, 2017 at 7:39 Comment(0)

public string ToFaText(string strTxt)
{
    return strTxt.Replace("ك","ک").Replace("ي","ی");
}

usage:

        string str="اولين برداشت";
        string per = ToFaText(str);
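
To tie this back to the original question, the replacement can be applied to both sides before comparing, so that input typed on either Windows version compares as equal. The following is a rough sketch under stated assumptions, not the answerer's code: the class `PersianText` and its methods are hypothetical names, and it assumes the characters involved are Arabic yeh (U+064A), Arabic kaf (U+0643), Farsi yeh (U+06CC) and keheh (U+06A9).

```csharp
using System;

public static class PersianText
{
    // Maps Arabic yeh (U+064A) to Farsi yeh (U+06CC) and Arabic kaf (U+0643) to keheh (U+06A9).
    // Replace(char, char) matches exact characters, so no culture is involved.
    public static string Normalize(string text) =>
        text.Replace('\u064A', '\u06CC').Replace('\u0643', '\u06A9');

    // Normalizes both strings, then compares them ordinally; 0 means equal.
    public static int CompareNormalized(string a, string b) =>
        string.Compare(Normalize(a), Normalize(b), StringComparison.Ordinal);
}
```

With this, `PersianText.CompareNormalized` returns 0 for the two spellings from the question. For database searches, the same normalization would have to be applied both when storing text and when building the query.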

Flagelliform answered 8/6, 2020 at 9:57 Comment(3)

This could do with some further explanation, code-only answers aren't ideal – Craunch 8/6, 2020 at 12:49

sometimes simple way is the best way! – Flagelliform 8/6, 2020 at 19:56

@Persistence isn’t suggesting that your answer should be more complicated, but that it’d be easier for readers to understand why this approach might be preferable to other approaches if you offered some additional explanation. That’s especially useful here since there’s an accepted answer from seven years ago. Why, for instance, do you prefer not including CultureInfo.InvariantCulture (or, in your case, StringComparison.Ordinal)? – Terebinthine 9/6, 2020 at 1:3
