indexOf Case Sensitive?

Is the indexOf(String) method case sensitive? If so, is there a case insensitive version of it?

Chinoiserie asked 14/7, 2009 at 15:35 Comment(1)
Not that I'm a big performance guy or anything (I actually consider performance tuning kind of evil), but .toUpperCase() copies your string each time you call it, so if you do this in a loop, try to move the .toUpperCase() call out of the loop if possible. – Parulis

The indexOf() methods are all case-sensitive. You can make them (roughly, in a broken way, but working for plenty of cases) case-insensitive by converting your strings to upper/lower case beforehand:

// needs: import java.util.Locale;
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
int index = s1.indexOf(s2); // index into the lower-cased s1
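To see why the comments below stress pinning the locale, here is a minimal sketch (the class and method names are hypothetical) that runs the same lower-case-then-search approach with an explicit Locale. It uses Locale.ROOT as the locale-neutral choice; Locale.US behaves the same for these inputs.

```java
import java.util.Locale;

class LocalePinnedIndexOf {
    // Hypothetical helper: lower-cases both strings with the given locale, then searches.
    // The returned index refers to the lower-cased copy of text.
    static int indexOfLower(String text, String search, Locale locale) {
        return text.toLowerCase(locale).indexOf(search.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Turkish rules lower-case 'I' to the dotless i (U+0131), so "TITLE"
        // no longer contains "title" after conversion.
        System.out.println(indexOfLower("TITLE", "title", turkish));     // -1
        // A fixed, locale-neutral locale gives the expected result.
        System.out.println(indexOfLower("TITLE", "title", Locale.ROOT)); // 0
    }
}
```

Pinning the locale only makes the result predictable across machines; as the comments point out, it does not make the comparison correct for text that actually contains characters such as the Turkish dotless i.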
Donall answered 14/7, 2009 at 15:39 Comment(7)
Beware of internationalization issues (e.g. the Turkish İ) when using toUpperCase. A more proper solution is to use str.toUpperCase(Locale.US).indexOf(...); – Bantustan
I'm quite sure that case-converting and then comparing is not entirely correct according to Unicode comparison rules. It works for some things (namely case folding, which is generally used only in syntax parsing contexts), but for natural language there can be special cases where two strings that should compare equal don't, under either both uppercase or both lowercase. I can't come up with any examples off the bat, however. – Postmeridian
Won't work. Some weird, international characters are converted to multiple characters when converted to lower-/upper-case. For example: "ß".toUpperCase().equals("SS") – Smaltite
ß is hardly a weird character and it's hardly international either, being used only in Germany and Austria. But yes, this is just as good as it gets, but it is not actually a case-insensitive comparison, as nielsm already pointed out three years ago. – Donall
Does not work for Turkish Unicode; that comes straight from somebody's email. – Apetalous
Due to the discrepancy between some characters' upper- and lower-case equivalents, one could perform both tests (one upper-case and one lower-case match), and if either passes, the match is successful. – Littell
This is a very obvious method and not an answer to the question, which is "Is there a case insensitive version of it [indexOf]?" – Alys

Is the indexOf(String) method case sensitive?

Yes, it is case sensitive:

@Test
public void indexOfIsCaseSensitive() {
    assertTrue("Hello World!".indexOf("Hello") != -1);
    assertTrue("Hello World!".indexOf("hello") == -1);
}

If so, is there a case insensitive version of it?

No, there isn't. You can convert both strings to lower case before calling indexOf:

@Test
public void caseInsensitiveIndexOf() {
    assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
    assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}
Lynnettelynnworth answered 14/7, 2009 at 15:38 Comment(2)
Oh please, please, please don't forget to use culture-invariant conversion with Locale.US; we had enough problems with Java applications running under a Turkish locale. – Heartsease
@Heartsease - forcing to US locale doesn't solve the problem, because it still doesn't work for strings that actually contain the characters that are problematic to start with (for instance "ı".toLowerCase(Locale.US).indexOf("I".toLowerCase(Locale.US)) should return 0 because the first string is a Turkish lower case "I", and therefore should compare as equal to the upper-case "I" in the second, but returns -1 because the latter is converted to "i" instead). – Gower

There is an ignore-case method in the StringUtils class of the Apache Commons Lang library:

indexOfIgnoreCase(CharSequence str, CharSequence searchStr)

Phonography answered 29/1, 2014 at 9:14 Comment(1)
This should be the accepted answer, as the current one does not work for certain non-ASCII strings that contain Unicode control characters. For example, this works for text written in Turkish. Behind the scenes Apache uses regionMatches, and that does work. – Apetalous

Yes, indexOf is case sensitive.

The best way to do case insensitivity I have found is:

String original = "Hello World"; // example input
String someStr = "WORLD";        // example search string
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());

That will do a case insensitive indexOf().
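The caveat raised in the comment below (the lower-cased string can be longer than the original) can be reproduced with a short sketch. It assumes Locale.ROOT, under which U+0130 (capital I with dot above) lower-cases to two chars, "i" followed by the combining dot U+0307; the class name is hypothetical.

```java
import java.util.Locale;

class LowerCaseIndexDrift {
    // The approach from this answer, with the locale pinned for reproducibility.
    static int lowerCaseIndexOf(String original, String search) {
        return original.toLowerCase(Locale.ROOT).indexOf(search.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // The string starts with U+0130, which lower-cases to "i" plus U+0307.
        String original = "\u0130stanbul Hello";
        System.out.println(original.length());                          // 14
        System.out.println(original.toLowerCase(Locale.ROOT).length()); // 15
        System.out.println(lowerCaseIndexOf(original, "hello"));        // 10
        System.out.println(original.indexOf("Hello"));                  // 9
    }
}
```

Using index 10 on original would start the match one character late, which is the mapping problem described in the comment.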

Indeciduous answered 14/7, 2009 at 15:38 Comment(1)
No. Don't ever do that. The reason is that original.toLowerCase().length() does not always equal original.length(), so the resulting idx cannot always be mapped back correctly to original. – Gurolinick

Here is my solution, which does not allocate any heap memory; therefore it should be significantly faster than most of the other implementations mentioned here.

public static int indexOfIgnoreCase(final String haystack,
                                    final String needle) {
    if (needle.isEmpty() || haystack.isEmpty()) {
        // Fallback to legacy behavior.
        return haystack.indexOf(needle);
    }

    for (int i = 0; i < haystack.length(); ++i) {
        // Early out, if possible.
        if (i + needle.length() > haystack.length()) {
            return -1;
        }

        // Attempt to match substring starting at position i of haystack.
        int j = 0;
        int ii = i;
        while (ii < haystack.length() && j < needle.length()) {
            char c = Character.toLowerCase(haystack.charAt(ii));
            char c2 = Character.toLowerCase(needle.charAt(j));
            if (c != c2) {
                break;
            }
            j++;
            ii++;
        }
        // Walked all the way to the end of the needle, return the start
        // position that this was found.
        if (j == needle.length()) {
            return i;
        }
    }

    return -1;
}

And here are the unit tests that verify correct behavior.

@Test
public void testIndexOfIgnoreCase() {
    assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));

    assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));

    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
    assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
    assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));  
}
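The missing test case from the comment below (Turkish dotless ı versus I) fails with the code above because it compares Character.toLowerCase of each char, and 'I' lower-cases to 'i', not 'ı'. A minimal variant, assuming you want the same per-char rule that String.regionMatches(true, ...) applies (upper-case both chars, then lower-case the upper-cased results), keeps the allocation-free loop. The class and helper names are hypothetical, and this is still a per-char comparison, so length-changing cases such as ß/SS remain out of reach.

```java
class FoldedIndexOfIgnoreCase {
    // Per-char rule modelled on String.regionMatches(true, ...):
    // equal if the upper-case forms match, or the lower-case forms of those match.
    static boolean charsEqualIgnoreCase(char a, char b) {
        if (a == b) {
            return true;
        }
        char ua = Character.toUpperCase(a);
        char ub = Character.toUpperCase(b);
        return ua == ub || Character.toLowerCase(ua) == Character.toLowerCase(ub);
    }

    static int indexOfIgnoreCase(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // same as haystack.indexOf("")
        }
        // Last start position at which the whole needle still fits.
        int last = haystack.length() - needle.length();
        for (int i = 0; i <= last; i++) {
            int j = 0;
            while (j < needle.length()
                    && charsEqualIgnoreCase(haystack.charAt(i + j), needle.charAt(j))) {
                j++;
            }
            if (j == needle.length()) {
                return i;
            }
        }
        return -1;
    }
}
```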
Alisha answered 22/4, 2015 at 21:51 Comment(5)
How does this answer the question? – Vernitavernoleninsk
The answer is "no, there are no case-insensitive versions of indexOf". However, I added the solution here because people are going to find this page looking for solutions. I made my solution available with test cases so that the next person coming through can use my code to solve the exact same problem. That's why Stack Overflow is useful, right? I have a decade of experience writing high-performance code, half of that at Google. I just gave a well-tested solution away for free to help the community. – Alisha
This is exactly what I was interested in. I found this to be about 10-15% faster than the Apache Commons version. If I could upvote it many more times I would. Thanks! – Kingfisher
Thanks Jeff, I'm glad it gave you a lot of value. There are others who are recommending that this post, which provides a solution, go toward the top. If someone else likes my code, then I humbly ask that you upvote this solution. – Alisha
Here's a missing test case: assertThat(StringUtils.indexOfIgnoreCase("ı" /* Turkish lower-case I, U+0131 */, "I"), is(0)); – Gower

Yes, it is case-sensitive. You can do a case-insensitive indexOf by converting your String and the String parameter both to upper-case before searching.

String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());

Note that toUpperCase may not work in some circumstances. For instance this:

String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());

idxU will be 20, which is wrong! idxL will be 19, which is correct. What's causing the problem is that toUpperCase() converts the "ß" character into TWO characters, "SS", and this throws the index off.

Consequently, always stick with toLowerCase().
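The index shift described above can be reproduced with the JDK alone. This is a minimal sketch (hypothetical class name); it pins Locale.ROOT so the default locale plays no part, and writes ß as the escape \u00DF to avoid source-encoding surprises.

```java
import java.util.Locale;

class SharpSIndexShift {
    static int viaUpperCase(String text, String search) {
        return text.toUpperCase(Locale.ROOT).indexOf(search.toUpperCase(Locale.ROOT));
    }

    static int viaLowerCase(String text, String search) {
        return text.toLowerCase(Locale.ROOT).indexOf(search.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String str = "Feldbergstra\u00DFe 23, Mainz";
        // Upper-casing turns the sharp s into "SS", so everything after it moves right by one.
        System.out.println(viaUpperCase(str, "mainz")); // 20
        System.out.println(viaLowerCase(str, "mainz")); // 19
        System.out.println(str.indexOf("Mainz"));       // 19, the true position
    }
}
```

Lower-casing avoids the shift for this input, but, as the comment below notes, it then cannot match a search for "STRASSE" at all.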

Strongbox answered 14/7, 2009 at 15:41 Comment(1)
Sticking to lower case doesn't help: if you change find to "STRASSE", it doesn't find it at all in the lower-case variant, but does correctly find it in the upper-case version. – Gower

What are you doing with the index value once returned?

If you are using it to manipulate your string, then could you not use a regular expression instead?

import static org.junit.Assert.assertEquals;    
import org.junit.Test;

public class StringIndexOfRegexpTest {

    @Test
    public void testNastyIndexOfBasedReplace() {
        final String source = "Hello World";
        final int index = source.toLowerCase().indexOf("hello".toLowerCase());
        final String target = "Hi".concat(source.substring(index
                + "hello".length(), source.length()));
        assertEquals("Hi World", target);
    }

    @Test
    public void testSimpleRegexpBasedReplace() {
        final String source = "Hello World";
        final String target = source.replaceFirst("(?i)hello", "Hi");
        assertEquals("Hi World", target);
    }
}
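If you really do need the index rather than a replacement, the same (?i) idea can return it. A minimal sketch (class and method names are hypothetical): Pattern.quote keeps regex metacharacters in the search string literal, and UNICODE_CASE extends CASE_INSENSITIVE beyond US-ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexIndexOf {
    static int indexOfIgnoreCase(String text, String search) {
        // Quote the search string so "a.b" means a literal dot, not "any char".
        Pattern p = Pattern.compile(Pattern.quote(search),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("Hello World", "WORLD")); // 6
        System.out.println(indexOfIgnoreCase("xxA.B", "a.b"));         // 2
    }
}
```

Because the match runs on the original string, m.start() is an index into that string, so the index-mapping problem of the toLowerCase() approach does not arise. Compiling a Pattern per call has a cost; if the same search string is reused, compiling it once is the usual choice.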
Kaduna answered 14/7, 2009 at 23:25 Comment(1)
Surprised by the lack of upvotes here. In a page dominated by incorrect answers, this is one of the only three that actually works correctly. – Gower

@Test
public void testIndexofCaseSensitive() {
    TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}
Ilianailine answered 14/7, 2009 at 15:39 Comment(5)
This doesn't even answer the full question... it doesn't even say whether the test passes. – Indeciduous
You're right, I didn't. I was kinda hoping that it would prompt the original questioner to run the test him/herself, and maybe get into the habit. – Ilianailine
Well, that is fine... but I would argue that it would be better to vote for a post that actually gives an answer than for a test. StackOverflow is trying to be a code Q&A repository, so full answers would be best. – Indeciduous
@jjnguy: I was always under the impression that people who posted tests, posted tests that pass. @Lynnettelynnworth kind of did a similar thing. (But @dfa's answer is more complete). – Commissar
But he also posted some words (a description)... those are usually helpful. – Indeciduous

Yes, I am fairly sure it is. One method of working around that using the standard library would be:

int index = str.toUpperCase().indexOf("FOO"); 
Trillion answered 14/7, 2009 at 15:39 Comment(0)

I've just looked at the source. It compares chars so it is case sensitive.
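A simplified sketch of that idea (an illustration with a hypothetical class name, not the actual JDK source, which is heavily optimized): a brute-force search that compares chars with ==.

```java
class NaiveIndexOf {
    // Illustration only: brute-force search comparing chars by value.
    static int indexOf(String text, String search) {
        for (int i = 0; i + search.length() <= text.length(); i++) {
            int j = 0;
            while (j < search.length() && text.charAt(i + j) == search.charAt(j)) {
                j++;
            }
            if (j == search.length()) {
                return i; // every char matched exactly
            }
        }
        return -1;
    }
}
```

Each char is compared by its numeric UTF-16 value, so 'w' (U+0077) and 'W' (U+0057) are simply different values.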

Maleeny answered 14/7, 2009 at 15:41 Comment(0)

Had the same problem. I tried a regular expression and the Apache StringUtils.indexOfIgnoreCase method, but both were pretty slow... So I wrote a short method myself:

public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
    if (chkstr != null && searchStr != null && i > -1) {
        int searchStrLength = searchStr.length();
        if (searchStrLength == 0) {
            // An empty search string matches at the start position, like String.indexOf.
            return Math.min(i, chkstr.length());
        }
        char[] searchCharLc = new char[searchStrLength];
        char[] searchCharUc = new char[searchStrLength];
        searchStr.toUpperCase().getChars(0, searchStrLength, searchCharUc, 0);
        searchStr.toLowerCase().getChars(0, searchStrLength, searchCharLc, 0);
        int j = 0;
        for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
            char charAt = chkstr.charAt(i);
            if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
                if (++j == searchStrLength) {
                    return i - j + 1;
                }
            } else { // faster than: else if (j != 0) {
                // Mismatch: restart one position after where this partial match began.
                i = i - j;
                j = 0;
            }
        }
    }
    return -1;
}

According to my tests it's much faster (at least if your search string is rather short). If you have any suggestions for improvement or find bugs, it would be nice to let me know, since I use this code in an application ;-)

Padang answered 2/12, 2014 at 15:16 Comment(4)
This is actually very clever, as the search string will be significantly shorter than the text to search in, and it only creates an upper- and lowercase version of the search string. Thank you for that! – Prehistory
This is significantly slower than the StringUtils version in my testing. However, Zach's answer is about 10-15% faster. – Kingfisher
This solution is about 10% faster than the one given by Zach Vorhies. Thank you for this solution. – Perimeter
This solution doesn't produce a correct answer in presence of strings that change length on conversion to upper case (e.g. if you search for "ß" it will find it in any string that contains a single capital "S") or for text that uses alternative capitalizations (e.g. indexOfIgnoreCase("İ","i") should return 0 because İ is the correct capitalization of i for Turkish text, but instead returns -1 because i is capitalized to the more common I). – Gower

Just to sum it up, there are 3 solutions:

  • using toLowerCase() or toUpperCase()
  • using Apache Commons StringUtils
  • using a regex

Now, what I was wondering is which one is the fastest. I'm guessing the first one, on average.
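A rough way to compare the three on your own data is sketched below. It is not a proper benchmark (single runs, no JIT warm-up; a harness such as JMH is the better tool), and it uses String.regionMatches as a JDK-only stand-in for StringUtils, which reportedly relies on regionMatches internally. Class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeWays {
    static int viaLowerCase(String text, String search) {
        return text.toLowerCase(Locale.ROOT).indexOf(search.toLowerCase(Locale.ROOT));
    }

    // Char-by-char comparison without copying either string.
    static int viaRegionMatches(String text, String search) {
        for (int i = 0; i <= text.length() - search.length(); i++) {
            if (text.regionMatches(true, i, search, 0, search.length())) {
                return i;
            }
        }
        return -1;
    }

    static int viaRegex(String text, String search) {
        Matcher m = Pattern.compile(Pattern.quote(search),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < 100_000; k++) {
            sb.append("lorem ipsum ");
        }
        String text = sb.append("NEEDLE").toString();
        String search = "needle";
        // Crude single-run timings; numbers vary with JIT warm-up, input and JVM.
        long t0 = System.nanoTime();
        int a = viaLowerCase(text, search);
        long t1 = System.nanoTime();
        int b = viaRegionMatches(text, search);
        long t2 = System.nanoTime();
        int c = viaRegex(text, search);
        long t3 = System.nanoTime();
        System.out.printf("lowerCase=%d (%d us), regionMatches=%d (%d us), regex=%d (%d us)%n",
                a, (t1 - t0) / 1000, b, (t2 - t1) / 1000, c, (t3 - t2) / 1000);
    }
}
```

Which one wins is likely to depend on string lengths, on how often the same search string is reused, and on the JVM, so measuring on realistic input is safer than guessing.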

Bellda answered 9/2, 2014 at 20:28 Comment(0)

The first question has already been answered many times. Yes, the String.indexOf() methods are all case-sensitive.

If you need a locale-sensitive indexOf(), you could use java.text.Collator. Depending on the strength value you set, you can get case-insensitive comparison, and also treat accented letters as the same as non-accented ones, etc. Here is an example of how to do this:

private int indexOf(String original, String search) {
    Collator collator = Collator.getInstance();
    collator.setStrength(Collator.PRIMARY);
    for (int i = 0; i <= original.length() - search.length(); i++) {
        if (collator.equals(search, original.substring(i, i + search.length()))) {
            return i;
        }
    }
    return -1;
}
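A variant of the method above that takes the locale explicitly, so the result does not depend on the JVM default, might look like this (the class name and the locale parameter are assumptions, not part of the answer). It tests collator.compare(...) == 0, which is what Collator.equals(String, String) does.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorIndexOf {
    static int indexOf(String original, String search, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // ignore case and accent differences
        int n = search.length();
        // Slide a window of exactly n chars over original.
        for (int i = 0; i + n <= original.length(); i++) {
            if (collator.compare(search, original.substring(i, i + n)) == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("Hello World", "WORLD", Locale.ENGLISH));     // 6
        System.out.println(indexOf("Un caf\u00e9 noir", "CAFE", Locale.FRENCH)); // 3
    }
}
```

Like the original, it compares windows exactly search.length() chars wide, so matches whose length differs from the pattern (STRASSE vs. Straße, for instance) cannot be found this way.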
Cyclone answered 14/1, 2015 at 1:31 Comment(1)
Surprised by the lack of upvotes here. In a page dominated by incorrect answers, this is one of the only three that actually works correctly. – Gower

I would like to lay claim to the ONE and only solution posted so far that actually works. :-)

There are three classes of problems that have to be dealt with:

  1. Non-transitive matching rules for lower and upper case. The Turkish I problem has been mentioned frequently in other replies. According to comments in the Android source for String.regionMatches, the Georgian comparison rules require an additional conversion to lower case when comparing for case-insensitive equality.

  2. Cases where upper- and lower-case forms have a different number of letters. Pretty much all of the solutions posted so far fail in these cases. Example: German STRASSE vs. Straße have case-insensitive equality, but have different lengths.

  3. Binding strengths of accented characters. Locale AND context affect whether accents match or not. In French, the uppercase form of 'é' is 'E', although there is a movement toward using uppercase accents. In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes. It does. In French locales, anyway.
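The first two problems can be observed with the JDK alone. A minimal sketch (hypothetical class and helper names) comparing full-string case mappings, which, unlike per-char comparisons, are allowed to change length:

```java
import java.util.Locale;

class CaseMappingPitfalls {
    static boolean equalAfterUpperCase(String a, String b, Locale locale) {
        return a.toUpperCase(locale).equals(b.toUpperCase(locale));
    }

    static boolean equalAfterLowerCase(String a, String b, Locale locale) {
        return a.toLowerCase(locale).equals(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String strasse = "Stra\u00DFe";
        // Problem 1: Turkish lower-cases 'I' to the dotless i (U+0131).
        System.out.println(equalAfterLowerCase("TITLE", "title", turkish));       // false
        System.out.println(equalAfterLowerCase("TITLE", "title", Locale.ROOT));   // true
        // Problem 2: the sharp s upper-cases to "SS", changing the length.
        System.out.println(equalAfterUpperCase("STRASSE", strasse, Locale.ROOT)); // true
        System.out.println(equalAfterLowerCase("STRASSE", strasse, Locale.ROOT)); // false
        // equalsIgnoreCase works char by char, so different lengths never match.
        System.out.println("STRASSE".equalsIgnoreCase(strasse));                  // false
    }
}
```

Even the upper-case comparison is not a general fix: it happens to handle STRASSE/Straße, but its outcome still depends on the locale, and it does nothing for accents (problem 3).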

I am currently using android.icu.text.StringSearch to correctly replace my previous implementations of case-insensitive indexOf operations.

Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch class.

Be careful to reference classes in the correct icu package (android.icu.text or com.ibm.icu.text) as Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).

    this.collator = (RuleBasedCollator)Collator.getInstance(locale);
    this.collator.setStrength(Collator.PRIMARY);

    ....

    StringSearch search = new StringSearch(
         pattern,
         new StringCharacterIterator(targetText),
         collator);
    int index = search.first();
    if (index != StringSearch.DONE)
    {
        // remember that the match length may NOT equal the pattern length.
        length = search.getMatchLength();
        .... 
    }

Test Cases (Locale, pattern, target text, expectedResult):

    testMatch(Locale.US,"AbCde","aBcDe",true);
    testMatch(Locale.US,"éèê","EEE",true);

    testMatch(Locale.GERMAN,"STRASSE","Straße",true);
    testMatch(Locale.FRENCH,"éèê","EEE",true);
    testMatch(Locale.FRENCH,"EEE","éèê",true);
    testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);

    testMatch(Locale.forLanguageTag("tr-TR"),"TITLE","tıtle",true);  // Turkish dotless I/i
    testMatch(Locale.forLanguageTag("tr-TR"),"TİTLE","title",true);  // Turkish dotted I/i
    testMatch(Locale.forLanguageTag("tr-TR"),"TITLE","title",false);  // Dotless-I != dotted i.

PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and non-accented characters according to dictionary rules, but I don't know which locale to use to test this premise. Donated test cases would be gratefully appreciated.

--

Copyright notice: because StackOverflow's CC BY-SA copyrights as applied to code fragments are unworkable for professional developers, these fragments are dual licensed under more appropriate licenses here: https://pastebin.com/1YhFWmnU

Bridegroom answered 27/2, 2020 at 21:53 Comment(3)
If you want to dual-license your code, please do so via some other platform, and include a link there. A massive blob of legalese appended to the end of each answer adds an egregious amount of clutter to Stack Overflow. – Xerophthalmia
Then perhaps you should find a more efficient way to address the problem of CC BY-SA applied to code fragments. – Bridegroom
It also seems inappropriate for you to remove license grants that I provided to code fragments to which I hold copyright. – Bridegroom

There is no built-in one, but it's not hard to write one:

import junit.framework.TestCase;

public class CaseInsensitiveIndexOfTest extends TestCase {
    public void testOne() throws Exception {
        assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
    }

    public static int caseInsensitiveIndexOf(String substring, String string) {
        return string.toLowerCase().indexOf(substring.toLowerCase());
    }
}
Inheritable answered 14/7, 2009 at 15:42 Comment(1)
As commented above, this fails to correctly identify that "ı" is a lower-case variant (just not the default one in most languages) of "I". Or alternatively, if run on a machine set to a locale where "ı" is the default, it will fail to notice that "i" is also a lower-case variant of "I". – Gower

Converting both strings to lower case is usually not a big deal, but it can be slow if some of the strings are long, and if you do this in a loop it would be really bad. For this reason, I would recommend StringUtils.indexOfIgnoreCase.

Freud answered 11/11, 2014 at 16:14 Comment(0)
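static string Search(string factMessage, string b)
{
    // C#, not Java: String.IndexOf accepts a StringComparison argument.
    int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
    string line = null;
    int i = index;
    if (i == -1)
    {
        return "not matched";
    }
    else
    {
        // Collect characters from the match up to the next space or the end of the string.
        while (i < factMessage.Length && factMessage[i] != ' ')
        {
            line = line + factMessage[i];
            i++;
        }

        return line;
    }
}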
Fiorin answered 26/11, 2018 at 13:36 Comment(1)
This looks like it might be C#. – Pirzada

Here's a version closely resembling Apache's StringUtils version:

public int indexOfIgnoreCase(String str, String searchStr) {
    return indexOfIgnoreCase(str, searchStr, 0);
}

public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
    // https://mcmap.net/q/127729/-string-contains-ignore-case-duplicate/14018511
    if(str == null || searchStr == null) return -1;
    if (searchStr.length() == 0) return fromIndex;  // empty string found; use same behavior as Apache StringUtils
    final int endLimit = str.length() - searchStr.length() + 1;
    for (int i = fromIndex; i < endLimit; i++) {
        if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
    }
    return -1;
}
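A quick, self-contained check of this approach, using a static copy of the three-argument method above (the class name is hypothetical). It also covers the Turkish test case suggested in an earlier comment: String.regionMatches(true, ...) compares upper-cased chars first, and 'ı' upper-cases to 'I', so the two match.

```java
class RegionMatchesIndexOf {
    static int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
        if (str == null || searchStr == null) return -1;
        if (searchStr.length() == 0) return fromIndex; // same choice as the method above
        final int endLimit = str.length() - searchStr.length() + 1;
        for (int i = fromIndex; i < endLimit; i++) {
            if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("Royal Blue", "BLUE", 0)); // 6
        System.out.println(indexOfIgnoreCase("abcABC", "abc", 1));      // 3
        System.out.println(indexOfIgnoreCase("\u0131", "I", 0));        // 0
        System.out.println(indexOfIgnoreCase(null, "x", 0));            // -1
    }
}
```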
Janus answered 12/8, 2019 at 5:9 Comment(0)

indexOf is case sensitive. This is because it uses the equals method to compare the elements in the list. The same thing goes for contains and remove.
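For the List case this answer describes, List.indexOf relies on equals(), so it is case-sensitive for strings. A hypothetical helper that uses String.equalsIgnoreCase instead could look like this (a sketch; it does per-char case folding only, so the caveats discussed elsewhere on this page still apply):

```java
import java.util.Arrays;
import java.util.List;

class ListIndexOfIgnoreCase {
    // Like List.indexOf, but compares with equalsIgnoreCase instead of equals.
    static int indexOfIgnoreCase(List<String> list, String target) {
        for (int i = 0; i < list.size(); i++) {
            String s = list.get(i);
            if (s != null && s.equalsIgnoreCase(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Apple", "Banana");
        System.out.println(fruits.indexOf("apple"));             // -1
        System.out.println(indexOfIgnoreCase(fruits, "apple")); // 0
    }
}
```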

Hills answered 14/7, 2009 at 15:42 Comment(3)
The original question is about String's indexOf method. – Maleeny
I didn't know that's what he was talking about. I didn't realize it until other people had said something. The principle is still the same, though. – Hills
No it isn't. The internals of String's indexOf method compare chars, not objects, so it doesn't use the equals method. – Maleeny
