Is the indexOf(String) method case sensitive? If so, is there a case insensitive version of it?
The indexOf()
methods are all case-sensitive. You can make them (roughly, in a broken way, but working for plenty of cases) case-insensitive by converting your strings to upper/lower case beforehand:
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
s1.indexOf(s2);
"ß".toUpperCase().equals("SS")
–
Smaltite Is the indexOf(String) method case sensitive?
Yes, it is case sensitive:
@Test
public void indexOfIsCaseSensitive() {
assertTrue("Hello World!".indexOf("Hello") != -1);
assertTrue("Hello World!".indexOf("hello") == -1);
}
If so, is there a case insensitive version of it?
No, there isn't. You can convert both strings to lower case before calling indexOf:
@Test
public void caseInsensitiveIndexOf() {
assertTrue("Hello World!".toLowerCase().indexOf("Hello".toLowerCase()) != -1);
assertTrue("Hello World!".toLowerCase().indexOf("hello".toLowerCase()) != -1);
}
"ı".toLowerCase(Locale.US).indexOf("I".toLowerCase(Locale.US))
should return 0 because the first string is a Turkish lower case "I"
, and therefore should compare as equal to the upper-case "I"
in the second, but returns -1 because the latter is converted to "i"
instead). –
Gower There is an ignore case method in StringUtils class of Apache Commons Lang library
indexOfIgnoreCase(CharSequence str, CharSequence searchStr)
Yes, indexOf
is case sensitive.
The best way to do case insensivity I have found is:
String original;
int idx = original.toLowerCase().indexOf(someStr.toLowerCase());
That will do a case insensitive indexOf()
.
original.toLowerCase().length()
not always equals to original.length()
. The result idx
is not able to map back correctly to original
. –
Gurolinick Here is my solution which does not allocate any heap memory, therefore it should be significantly faster than most of the other implementations mentioned here.
public static int indexOfIgnoreCase(final String haystack,
final String needle) {
if (needle.isEmpty() || haystack.isEmpty()) {
// Fallback to legacy behavior.
return haystack.indexOf(needle);
}
for (int i = 0; i < haystack.length(); ++i) {
// Early out, if possible.
if (i + needle.length() > haystack.length()) {
return -1;
}
// Attempt to match substring starting at position i of haystack.
int j = 0;
int ii = i;
while (ii < haystack.length() && j < needle.length()) {
char c = Character.toLowerCase(haystack.charAt(ii));
char c2 = Character.toLowerCase(needle.charAt(j));
if (c != c2) {
break;
}
j++;
ii++;
}
// Walked all the way to the end of the needle, return the start
// position that this was found.
if (j == needle.length()) {
return i;
}
}
return -1;
}
And here are the unit tests that verify correct behavior.
@Test
public void testIndexOfIgnoreCase() {
assertThat(StringUtils.indexOfIgnoreCase("A", "A"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "A"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("A", "a"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "a"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("a", "ba"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("ba", "a"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", " Royal Blue"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase(" Royal Blue", "Royal Blue"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "royal"), is(0));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "oyal"), is(1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "al"), is(3));
assertThat(StringUtils.indexOfIgnoreCase("", "royal"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", ""), is(0));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BLUE"), is(6));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "BIGLONGSTRING"), is(-1));
assertThat(StringUtils.indexOfIgnoreCase("Royal Blue", "Royal Blue LONGSTRING"), is(-1));
}
assertThat(StringUtils.indexOfIgnoreCase("ı" /* Turkish lower-case I, U+0131 */, "I"), is(0));
–
Gower Yes, it is case-sensitive. You can do a case-insensitive indexOf
by converting your String and the String parameter both to upper-case before searching.
String str = "Hello world";
String search = "hello";
str.toUpperCase().indexOf(search.toUpperCase());
Note that toUpperCase may not work in some circumstances. For instance this:
String str = "Feldbergstraße 23, Mainz";
String find = "mainz";
int idxU = str.toUpperCase().indexOf (find.toUpperCase ());
int idxL = str.toLowerCase().indexOf (find.toLowerCase ());
idxU will be 20, which is wrong! idxL will be 19, which is correct. What's causing the problem is tha toUpperCase() converts the "ß" character into TWO characters, "SS" and this throws the index off.
Consequently, always stick with toLowerCase()
find
to "STRASSE"
, it doesn't find it at all in the lower case variant, but does correctly find it in the upper case version. –
Gower What are you doing with the index value once returned?
If you are using it to manipulate your string, then could you not use a regular expression instead?
import static org.junit.Assert.assertEquals;
import org.junit.Test;
public class StringIndexOfRegexpTest {
@Test
public void testNastyIndexOfBasedReplace() {
final String source = "Hello World";
final int index = source.toLowerCase().indexOf("hello".toLowerCase());
final String target = "Hi".concat(source.substring(index
+ "hello".length(), source.length()));
assertEquals("Hi World", target);
}
@Test
public void testSimpleRegexpBasedReplace() {
final String source = "Hello World";
final String target = source.replaceFirst("(?i)hello", "Hi");
assertEquals("Hi World", target);
}
}
@Test
public void testIndexofCaseSensitive() {
TestCase.assertEquals(-1, "abcDef".indexOf("d") );
}
Yes, I am fairly sure it is. One method of working around that using the standard library would be:
int index = str.toUpperCase().indexOf("FOO");
I've just looked at the source. It compares chars so it is case sensitive.
Had the same problem. I tried regular expression and the apache StringUtils.indexOfIgnoreCase-Method, but both were pretty slow... So I wrote an short method myself...:
public static int indexOfIgnoreCase(final String chkstr, final String searchStr, int i) {
if (chkstr != null && searchStr != null && i > -1) {
int serchStrLength = searchStr.length();
char[] searchCharLc = new char[serchStrLength];
char[] searchCharUc = new char[serchStrLength];
searchStr.toUpperCase().getChars(0, serchStrLength, searchCharUc, 0);
searchStr.toLowerCase().getChars(0, serchStrLength, searchCharLc, 0);
int j = 0;
for (int checkStrLength = chkstr.length(); i < checkStrLength; i++) {
char charAt = chkstr.charAt(i);
if (charAt == searchCharLc[j] || charAt == searchCharUc[j]) {
if (++j == serchStrLength) {
return i - j + 1;
}
} else { // faster than: else if (j != 0) {
i = i - j;
j = 0;
}
}
}
return -1;
}
According to my tests its much faster... (at least if your searchString is rather short). if you have any suggestions for improvement or bugs it would be nice to let me know... (since I use this code in an application ;-)
indexOfIgnoreCase("İ","i")
should return 0 because İ
is the correct capitalization of i
for Turkish text, but instead returns -1 because i
is capitalized to the more common I
). –
Gower Just to sum it up, 3 solutions:
- using toLowerCase() or toUpperCase
- using StringUtils of apache
- using regex
Now, what I was wondering was which one is the fastest? I'm guessing on average the first one.
The first question has already been answered many times. Yes, the String.indexOf()
methods are all case-sensitive.
If you need a locale-sensitive indexOf()
you could use the Collator. Depending on the strength value you set you can get case insensitive comparison, and also treat accented letters as the same as the non-accented ones, etc.
Here is an example of how to do this:
private int indexOf(String original, String search) {
Collator collator = Collator.getInstance();
collator.setStrength(Collator.PRIMARY);
for (int i = 0; i <= original.length() - search.length(); i++) {
if (collator.equals(search, original.substring(i, i + search.length()))) {
return i;
}
}
return -1;
}
I would like to lay claim to the ONE and only solution posted so far that actually works. :-)
Three classes of problems that have to be dealt with.
Non-transitive matching rules for lower and uppercase. The Turkish I problem has been mentioned frequently in other replies. According to comments in Android source for String.regionMatches, the Georgian comparison rules requires additional conversion to lower-case when comparing for case-insensitive equality.
Cases where upper- and lower-case forms have a different number of letters. Pretty much all of the solutions posted so far fail, in these cases. Example: German STRASSE vs. Straße have case-insensitive equality, but have different lengths.
Binding strengths of accented characters. Locale AND context effect whether accents match or not. In French, the uppercase form of 'é' is 'E', although there is a movement toward using uppercase accents . In Canadian French, the upper-case form of 'é' is 'É', without exception. Users in both countries would expect "e" to match "é" when searching. Whether accented and unaccented characters match is locale-specific. Now consider: does "E" equal "É"? Yes. It does. In French locales, anyway.
I am currently using android.icu.text.StringSearch
to correctly implement previous implementations of case-insensitive indexOf operations.
Non-Android users can access the same functionality through the ICU4J package, using the com.ibm.icu.text.StringSearch
class.
Be careful to reference classes in the correct icu package (android.icu.text
or com.ibm.icu.text
) as Android and the JRE both have classes with the same name in other namespaces (e.g. Collator).
this.collator = (RuleBasedCollator)Collator.getInstance(locale);
this.collator.setStrength(Collator.PRIMARY);
....
StringSearch search = new StringSearch(
pattern,
new StringCharacterIterator(targetText),
collator);
int index = search.first();
if (index != SearchString.DONE)
{
// remember that the match length may NOT equal the pattern length.
length = search.getMatchLength();
....
}
Test Cases (Locale, pattern, target text, expectedResult):
testMatch(Locale.US,"AbCde","aBcDe",true);
testMatch(Locale.US,"éèê","EEE",true);
testMatch(Locale.GERMAN,"STRASSE","Straße",true);
testMatch(Locale.FRENCH,"éèê","EEE",true);
testMatch(Locale.FRENCH,"EEE","éèê",true);
testMatch(Locale.FRENCH,"éèê","ÉÈÊ",true);
testMatch(new Locale("tr-TR"),"TITLE","tıtle",true); // Turkish dotless I/i
testMatch(new Locale("tr-TR"),"TİTLE","title",true); // Turkish dotted I/i
testMatch(new Locale("tr-TR"),"TITLE","title",false); // Dotless-I != dotted i.
PS: As best as I can determine, the PRIMARY binding strength should do the right thing when locale-specific rules differentiate between accented and non-accented characters according to dictionary rules; but I don't which locale to use to test this premise. Donated test cases would be gratefully appreciated.
--
Copyright notice: because StackOverflow's CC-BY_SA copyrights as applied to code-fragments are unworkable for professional developers, these fragments are dual licensed under more appropriate licenses here: https://pastebin.com/1YhFWmnU
But it's not hard to write one:
public class CaseInsensitiveIndexOfTest extends TestCase {
public void testOne() throws Exception {
assertEquals(2, caseInsensitiveIndexOf("ABC", "xxabcdef"));
}
public static int caseInsensitiveIndexOf(String substring, String string) {
return string.toLowerCase().indexOf(substring.toLowerCase());
}
}
"ı"
is a lower-case variant (just not the default one in most langauges) of "I"
. Or alternatively, if run on a machine set to a locale where "ı"
is the default, it will fail to notice that "i"
is also a lower-case variant of "I"
. –
Gower Converting both strings to lower-case is usually not a big deal but it would be slow if some of the strings is long. And if you do this in a loop then it would be really bad. For this reason, I would recommend indexOfIgnoreCase
.
static string Search(string factMessage, string b)
{
int index = factMessage.IndexOf(b, StringComparison.CurrentCultureIgnoreCase);
string line = null;
int i = index;
if (i == -1)
{ return "not matched"; }
else
{
while (factMessage[i] != ' ')
{
line = line + factMessage[i];
i++;
}
return line;
}
}
Here's a version closely resembling Apache's StringUtils version:
public int indexOfIgnoreCase(String str, String searchStr) {
return indexOfIgnoreCase(str, searchStr, 0);
}
public int indexOfIgnoreCase(String str, String searchStr, int fromIndex) {
// https://mcmap.net/q/127729/-string-contains-ignore-case-duplicate/14018511
if(str == null || searchStr == null) return -1;
if (searchStr.length() == 0) return fromIndex; // empty string found; use same behavior as Apache StringUtils
final int endLimit = str.length() - searchStr.length() + 1;
for (int i = fromIndex; i < endLimit; i++) {
if (str.regionMatches(true, i, searchStr, 0, searchStr.length())) return i;
}
return -1;
}
indexOf is case sensitive. This is because it uses the equals method to compare the elements in the list. The same thing goes for contains and remove.
© 2022 - 2024 — McMap. All rights reserved.