String.Contains and String.LastIndexOf C# return different result?

I have this problem where String.Contains returns true and String.LastIndexOf returns -1. Could someone explain to me what happened? I am using .NET 4.5.

    static void Main(string[] args)
    {
        String wikiPageUrl = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";

        if (wikiPageUrl.Contains("wikipedia.org/wiki/"))
        {

            int i = wikiPageUrl.LastIndexOf("wikipedia.org/wiki/");

            Console.WriteLine(i);

        }
    }
Basion answered 2/9, 2014 at 0:36 Comment(7)
If that's a direct C&P then it doesn't compile. If it isn't a direct C&P: why not? - Phycomycete
It returns 10 for me, as expected (demo). Voting to close as non-reproducible. - Tlingit
The code doesn't compile. I think quotes you are using are Unicode characters. - Locative
You have a typo: int i = wikiPageUrl.LastIndexOf (( “wikipedia.org/wiki/” ); - Universalism
@Dima, what about the quotes and missing semi-colon? - Locative
The LastIndexOf(String) method always returns String.Length – 1, which represents the last index position in the current instance. - Carbarn
@HassanNisar they're missing as well evidently :) - Universalism
F
3

Try using StringComparison.Ordinal

This compares the strings by evaluating the numeric values of the corresponding chars in each string, so it should work with the special chars you have in that example string.

 string wikiPageUrl = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";
 int i = wikiPageUrl.LastIndexOf("http://it.wikipedia.org/wiki/", StringComparison.Ordinal);

// returns 0;
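
To see the question's check and this fix side by side, here is a minimal sketch. The class name `OrdinalDemo` and the helper `LastIndexOfOrdinal` are only illustrative. The culture-sensitive result depends on the runtime and its collation data (the question reports -1 on .NET 4.5), so only the ordinal results are annotated.

```csharp
using System;

static class OrdinalDemo
{
    // Hypothetical helper: index of the last ordinal match, or -1 if none.
    public static int LastIndexOfOrdinal(string text, string value)
    {
        return text.LastIndexOf(value, StringComparison.Ordinal);
    }

    static void Main()
    {
        string wikiPageUrl = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";
        const string marker = "wikipedia.org/wiki/";

        // Contains(string) performs an ordinal comparison.
        Console.WriteLine(wikiPageUrl.Contains(marker)); // True

        // No StringComparison argument: culture-sensitive comparison.
        // The result depends on the runtime; the question reports -1 on .NET 4.5.
        Console.WriteLine(wikiPageUrl.LastIndexOf(marker));

        // Ordinal comparison treats every char as distinct.
        Console.WriteLine(LastIndexOfOrdinal(wikiPageUrl, marker)); // 10
    }
}
```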

Furlani answered 2/9, 2014 at 1:10 Comment(0)
B
5

While @sa_ddam213's answer definitely fixes the problem, it might help to understand exactly what's going on with this particular string.

If you try the example with other "special characters," the problem isn't exhibited. For example, the following strings work as expected:

string url1 = @"http://it.wikipedia.org/wiki/»Abd_Allāh_al-Sallāl";
Console.WriteLine(url1.LastIndexOf("it.wikipedia.org/wiki/")); // 7

string url2 = @"http://it.wikipedia.org/wiki/~Abd_Allāh_al-Sallāl";
Console.WriteLine(url2.LastIndexOf("it.wikipedia.org/wiki/")); // 7

The character in question, "ʿ", is called a spacing modifier letter[1]. In a culture-sensitive comparison it doesn't stand on its own, but modifies the previous character in the string, in this case a "/".
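
One way to check what kind of character this is would be to ask .NET for its Unicode general category. The sketch below is only an illustration (the class name `CategoryDemo` and the helper `CategoryOf` are made up for it). It compares the character from the question with U+0301 COMBINING ACUTE ACCENT, a typical combining character. The general category describes the character itself; how a given culture's collation weighs it during comparison is a separate matter.

```csharp
using System;
using System.Globalization;

static class CategoryDemo
{
    // Hypothetical helper: the Unicode general category of a single char.
    public static UnicodeCategory CategoryOf(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c);
    }

    static void Main()
    {
        // U+02BF MODIFIER LETTER LEFT HALF RING, the character from the question.
        Console.WriteLine(CategoryOf('\u02BF')); // ModifierLetter

        // U+0301 COMBINING ACUTE ACCENT, shown for contrast.
        Console.WriteLine(CategoryOf('\u0301')); // NonSpacingMark
    }
}
```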

LastIndexOf, when called with no StringComparison argument, compares strings using the current culture.

When strings are compared in a culture-sensitive manner, the "/" and "ʿ" characters are not seen as two distinct characters--they're processed into one character, which does not match the parameter passed in to LastIndexOf.

When you pass in StringComparison.Ordinal to LastIndexOf, the characters are treated as distinct, due to the nature of Ordinal comparison.

Another way to make this work would be to use CompareInfo.LastIndexOf and supply the CompareOptions.IgnoreNonSpace option:

Console.WriteLine(
    CultureInfo.CurrentCulture.CompareInfo.LastIndexOf(
        wikiPageUrl, @"it.wikipedia.org/wiki/", CompareOptions.IgnoreNonSpace));
// 7

Here we're saying that we don't want combining characters included in our string comparison.

As a sidenote, this means that @Partha's answer and @Noctis' answer only work because the character is being applied to a character that doesn't appear in the search string that's passed to LastIndexOf.

Contrast this with the Contains method, which by default performs an Ordinal (case sensitive and culture insensitive) comparison. This explains why Contains returns true and LastIndexOf returns -1.
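
If the check and the lookup need to agree, one option is to give both the same explicit comparison mode. The following is a rough sketch under that assumption; `ContainsWith` is a hypothetical helper, not a framework method. It is built on `String.IndexOf(String, StringComparison)`. The culture-sensitive result is left unannotated because it depends on the runtime's collation data.

```csharp
using System;

static class ContainsDemo
{
    // Hypothetical helper: a Contains check with an explicit comparison mode.
    public static bool ContainsWith(string text, string value, StringComparison comparison)
    {
        return text.IndexOf(value, comparison) >= 0;
    }

    static void Main()
    {
        string wikiPageUrl = @"http://it.wikipedia.org/wiki/ʿAbd_Allāh_al-Sallāl";
        const string marker = "wikipedia.org/wiki/";

        // Same comparison mode as the built-in Contains(string).
        Console.WriteLine(ContainsWith(wikiPageUrl, marker, StringComparison.Ordinal)); // True

        // Same comparison mode as LastIndexOf(string) without arguments;
        // the result depends on the runtime and the current culture.
        Console.WriteLine(ContainsWith(wikiPageUrl, marker, StringComparison.CurrentCulture));
    }
}
```

Used this way, a true result from the check and a non-negative index from the lookup come from the same comparison rules.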

For a fantastic overview of how strings should be manipulated in the .NET framework, check out this article.


1: Is this different from a combining character, or is it a type of combining character? I would appreciate it if someone would clear that up for me.

Blanchard answered 2/9, 2014 at 2:45 Comment(0)
G
2

The thing is that C#'s LastIndexOf searches from the end of the string.

And wikipedia.org/wiki/ is followed by ', which it takes as an escape sequence. So either remove the ' after wiki/ or put an @ there too.

Either of the following will work:

string wikiPageUrl = @"http://it.wikipedia.org/wiki/Abd_Allāh_al-Sallāl";

string wikiPageUrl = @"http://it.wikipedia.org/wiki/@ʿAbd_Allāh_al-Sallāl";

int i = wikiPageUrl.LastIndexOf("wikipedia.org/wiki");

All three work.

If you want a generalized solution for this problem, replace ' with @' in your string before you perform any operations.

Gawk answered 2/9, 2014 at 1:4 Comment(0)
B
0

The ' character throws it off.

This should work when you escape the ' as \':

wikiPageUrl = @"http://it.wikipedia.org/wiki/\'Abd_Allāh_al-Sallāl";

if (wikiPageUrl.Contains("wikipedia.org/wiki/"))
{
    "contains".Dump(); // Dump() is a LINQPad extension method

    int i = wikiPageUrl.LastIndexOf("wikipedia.org/wiki/");
    Console.WriteLine(i);
}

Figure out what you want to do (remove the ', escape it, or dig deeper :) ).

Bookseller answered 2/9, 2014 at 0:59 Comment(1)
I am running the job on the entire Wikipedia, and the column wikiPageUrl comes from the database, which I do not want to preprocess. I have a workaround already, but I want to know the reason why Contains returns "true" and LastIndexOf returns "-1". - Basion
