Why does comparing two equal Persian words not return 0?

There are two look-alike letters, 'ی' and 'ي'. The first became the main letter starting with Windows 7; back on Windows XP the second one was the main letter.
As a result, the inputs I receive are treated as different when one client is on Windows XP and the other is on Windows 7.
I have also tried using the Persian culture, without success.
Am I missing anything?
EDIT: I had to change the words to make the problem clearer; they now look similar.

foreach (CompareOptions i in Enum.GetValues(new CompareOptions().GetType()).OfType<CompareOptions>()) 
    Console.WriteLine( string.Compare("محسنين", "محسنین", new CultureInfo("fa-ir"), i) + "\t : " + i );

Output:

-1       : None
-1       : IgnoreCase
-1       : IgnoreNonSpace
-1       : IgnoreSymbols
-1       : IgnoreKanaType
-1       : IgnoreWidth
1        : OrdinalIgnoreCase
-1       : StringSort
130      : Ordinal
Thermion answered 20/2, 2013 at 15:30 Comment(8)
Well, that doesn't seem fair at all. - Ithaca
I am not Persian and don't actually understand the language, but ي does not look like ی to me! - Alain
@Aniket Just as a does not look like A, yet both are equal. - Thermion
I don't know about Persian, but in Arabic they are totally different. - Aerify
@Aniket: You're sharp-eyed. - Udale
@AbZy I know those aren't equal in Arabic. Even so, Microsoft shouldn't have changed the main letter in Windows 7. My problem is that inputs are treated as different on different Windows platforms. - Thermion
I'm certainly missing something; they look different to me. If I were doing something similar with the English alphabet, I'd expect to have to replace one with the other before the compare. - Hypaesthesia
Btw, it's always the last char that is not equal, not the first. - Udale

The two strings are not equal. The last letter differs.

About why IgnoreCase returns -1 but OrdinalIgnoreCase returns 1:

  • OrdinalIgnoreCase converts both strings to uppercase using the invariant culture's casing rules and then compares them by the numeric values of their characters.
  • IgnoreCase uses the specified culture to perform a case-insensitive comparison.

The difference is that IgnoreCase knows more about the letters of the specified language and may order them differently than the invariant culture does, leading to a different outcome.
This is a different manifestation of what became known as "The Turkish İ Problem".

You can verify it yourself by using the InvariantCulture instead of the Persian one:

foreach (CompareOptions i in Enum.GetValues(new CompareOptions().GetType()).OfType<CompareOptions>()) 
    Console.WriteLine( string.Compare("محسنی", "محسني", CultureInfo.InvariantCulture, i) + "\t : " + i );

This will output 1 for both IgnoreCase and OrdinalIgnoreCase.
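
The ordinal half of this explanation can be checked without involving any culture. The sketch below is illustrative only: the class OrdinalDemo and its helper OrdinalSign are hypothetical names that do not appear in the original code, and the two letters are written as Unicode escapes so that their code points are unambiguous. Culture-sensitive results are left out because they can vary with the platform's collation data.

```csharp
using System;

public static class OrdinalDemo
{
    // Arabic yeh (U+064A, decimal 1610) and Farsi yeh (U+06CC, decimal 1740).
    public const string ArabicYeh = "\u064A";
    public const string FarsiYeh = "\u06CC";

    // Sign of an ordinal comparison; it depends only on the numeric
    // values of the UTF-16 code units, never on a culture.
    public static int OrdinalSign(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));

    public static void Main()
    {
        Console.WriteLine(OrdinalSign(ArabicYeh, FarsiYeh)); // -1
        Console.WriteLine(OrdinalSign(FarsiYeh, ArabicYeh)); // 1
        Console.WriteLine(OrdinalSign(FarsiYeh, FarsiYeh));  // 0
    }
}
```

Because 1610 is smaller than 1740, the Arabic yeh sorts first under ordinal rules, whichever culture the machine is set to.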

Regarding your edited question:
The two strings still differ. The following code outputs the values of the single characters in the strings.

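// Requires: using System; using System.Linq;
var strings = new[] { "محسنين", "محسنین" };
foreach (var s in strings)
    Console.WriteLine(string.Join(Environment.NewLine, s.Select(c => (int)c)) + Environment.NewLine);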

The result will look like this:

1605
1581
1587
1606
1610 // <-- "yeh": ي
1606

1605
1581
1587
1606
1740 // <-- "farsi yeh": ی
1606

As you can see, there is one character that differs, resulting in a comparison that treats those two strings as not equal.
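
To locate such a difference without reading the dump by eye, a small helper can report the first position at which two strings diverge. This is a minimal sketch, not part of the original answer: StringDiff and FirstDifference are hypothetical names, and the two test strings are rebuilt from the code point values listed above.

```csharp
using System;

public static class StringDiff
{
    // Returns the index of the first UTF-16 code unit at which a and b
    // differ, or -1 if the two strings are identical.
    public static int FirstDifference(string a, string b)
    {
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            if (a[i] != b[i])
                return i;
        }
        // One string is a prefix of the other, or they are equal.
        return a.Length == b.Length ? -1 : shared;
    }

    public static void Main()
    {
        // 1605, 1581, 1587, 1606, 1610, 1606
        string first = "\u0645\u062D\u0633\u0646\u064A\u0646";
        // 1605, 1581, 1587, 1606, 1740, 1606
        string second = "\u0645\u062D\u0633\u0646\u06CC\u0646";

        int index = FirstDifference(first, second);
        Console.WriteLine(index); // 4
        Console.WriteLine("U+{0:X4} vs U+{1:X4}", (int)first[index], (int)second[index]); // U+064A vs U+06CC
    }
}
```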

Wisner answered 20/2, 2013 at 15:41 Comment(24)
@Mahdi: They are still different, and this difference is most likely the actual reason for the results you have been seeing all along. The first string contains the characters with the following values: 1605, 1581, 1587, 1606, 1610, 1606. The second string contains these values: 1605, 1581, 1587, 1606, 1740, 1606. As you can see, one character differs. - Wisner
@DanielHilgarth I just don't care that a byte differs. All I know is that both words are the same in Persian. If you are right, then Microsoft was wrong to change this between Windows versions. It's simple: either you, the .NET comparison, or Microsoft's strategy is wrong. - Thermion
@Mahdi: That's a pretty rude comment. I am trying to explain why you are getting the results you are getting. I didn't write the code that does this comparison, nor am I responsible for the strings you are trying to compare. - Wisner
Collators and cultural comparisons do not care about binary equality; you have the == operator for that. - Ashworth
@DanielHilgarth I'm sorry, I didn't mean it to sound that heated. Once again, sorry ;) - Thermion
@Mahdi: According to this Unicode overview, the differing characters are "yeh" and "farsi yeh". - Wisner
@DanielHilgarth If you are right, I am trying to find out why my Windows XP clients produce the second yeh but my Windows 7 clients produce the first yeh. This causes search queries against my database to fail. - Thermion
@Mahdi: Windows XP is very, very old. It is quite possible that its Unicode implementation or its support for Persian had an error and was using the incorrect "farsi yeh" instead of the correct "yeh". Windows 7 could have fixed that error. - Wisner
@Mahdi: According to this comment, "yeh" is not a Persian letter at all; only "farsi yeh" is. So maybe that's what changed between the two Windows versions. - Wisner
@DanielHilgarth Well, then I have to work around the Windows XP bug manually by replacing the chars. Thanks for providing this information, though. I'll mark this as the answer if no one else provides a better one. Thanks. - Thermion
@Mahdi: More on this issue: It shows that the two letters are really different. When seen in isolation, Yeh looks like this: ي and Farsi Yeh looks like this: ی. Note the part below the actual letter. - Wisner
@Mahdi: I am not sure replacing characters is a good idea. Where are you getting those strings from? I mean, I know they come from your database, but how do they actually get there? - Wisner
@DanielHilgarth I get input from a textbox in my ASP.NET web application and search for the entered string. The results are now split by Windows platform: a client on XP only sees content that was written on Windows XP, and the same goes for Windows 7. Do you have a better idea? - Thermion
@Mahdi: And the exact same keypresses on XP and Windows 7 produce those different results? Have you verified this yourself? - Wisner
@DanielHilgarth Yes, exactly the same key, which is D on the English layout. - Thermion
@Mahdi: Yes, that really looks like some kind of fix between the Windows versions... In that case I don't really have a better idea, sorry. - Wisner
@DanielHilgarth Thanks, it seems like a Windows XP bug. I will replace the chars to avoid it. - Thermion
@Mahdi: You might want to read this. It explains the differences between Yeh and Farsi Yeh. Yeh is actually not a Persian letter, only Farsi Yeh is. Have you made sure that both computers were using Persian as the keyboard layout? Maybe one was using Arabic instead? - Wisner
@Mahdi: According to this website, there was a problem with Windows 2000 which has been fixed with a Service Pack, so it is unlikely that the problem would still exist in Windows XP... - Wisner
@DanielHilgarth I'm a Persian programmer, and I can verify that pressing D on any Windows XP machine with the Persian keyboard layout writes the second yeh, even on XP SP3. I can also confirm that the same key press produces the first yeh from Windows 7 onward. However, I'm not sure about Windows Vista. - Thermion
@Mahdi: And you are 100% sure (triple check) you are using the same keyboard layout on all machines? I am asking, because the "first yeh" seems to be the non-Persian one (with the dots) and would be wrong. It would be strange for such a bug to be re-introduced into Windows 7. - Wisner
@DanielHilgarth Yes. This is not a new problem for me; it was a common problem for all Persians back in the old XP days, and this exchange of chars in Windows 7 caused a lot of problems. If you are interested, check the following Persian forums for more information: forum.iranphp.org/… barnamenevis.org/… - Thermion
@DanielHilgarth Also, it's not only yeh but ka too :) - Thermion
@Mahdi: You might want to take this issue to an official MS forum. BTW: I don't understand Persian at all, so I can't read your links. - Wisner

Here is my code for converting the Arabic characters “ي, ك” to the Persian “ی, ک”, written as an extension method:

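// Extension methods must be declared in a static class (the class name here is arbitrary).
public static class PersianTextExtensions
{
    // Each Persian letter in pn replaces the Arabic letter at the same index in ar.
    private static readonly string[] pn = { "ی", "ک" };
    private static readonly string[] ar = { "ي", "ك" };

    public static string ToFaText(this string strTxt)
    {
        string chash = strTxt;
        for (int i = 0; i < ar.Length; i++)
            chash = chash.Replace(ar[i], pn[i]);
        return chash;
    }
}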
Labored answered 24/10, 2017 at 7:39 Comment(0)
public string ToFaText(string strTxt)
{
    return strTxt.Replace("ك","ک").Replace("ي","ی");
}

Usage:

string str = "اولين برداشت";
string per = ToFaText(str);
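
For the search problem described in the question, the replacement only helps if it is applied to both sides before they are compared. The following is a rough sketch under that assumption, not a definitive implementation: PersianText, Normalize, and EqualsNormalized are hypothetical names, and the letters are written as Unicode escapes (Arabic yeh U+064A, Arabic kaf U+0643, Farsi yeh U+06CC, keheh U+06A9). The char-based Replace overload is used because it matches exact characters and does not depend on the current culture.

```csharp
using System;

public static class PersianText
{
    // Maps Arabic yeh (U+064A) to Farsi yeh (U+06CC) and
    // Arabic kaf (U+0643) to keheh (U+06A9).
    public static string Normalize(string text) =>
        text.Replace('\u064A', '\u06CC').Replace('\u0643', '\u06A9');

    // Normalizes both inputs, then compares them ordinally.
    public static bool EqualsNormalized(string a, string b) =>
        string.Equals(Normalize(a), Normalize(b), StringComparison.Ordinal);

    public static void Main()
    {
        string withArabicYeh = "\u0645\u062D\u0633\u0646\u064A\u0646";
        string withFarsiYeh = "\u0645\u062D\u0633\u0646\u06CC\u0646";

        Console.WriteLine(string.Equals(withArabicYeh, withFarsiYeh, StringComparison.Ordinal)); // False
        Console.WriteLine(EqualsNormalized(withArabicYeh, withFarsiYeh)); // True
    }
}
```

Normalizing text once when it is stored, and again on each search term, would keep stored and entered values in the same form regardless of which letter the client's keyboard produced.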
Flagelliform answered 8/6, 2020 at 9:57 Comment(3)
This could do with some further explanation; code-only answers aren't ideal. - Craunch
Sometimes the simple way is the best way! - Flagelliform
@Persistence isn’t suggesting that your answer should be more complicated, but that it’d be easier for readers to understand why this approach might be preferable to other approaches if you offered some additional explanation. That’s especially useful here since there’s an accepted answer from seven years ago. Why, for instance, do you prefer not including CultureInfo.InvariantCulture (or, in your case, StringComparison.Ordinal)? - Terebinthine
