Java: how String.contains(string) function works in java?

Asked 25/8, 2013 at 23:6 Answered 6/12, 2023 at 12:27

I know the brute force approach with the time complexity of n*m (m is length of first string and n is the length of the other one) for testing if an string contains another one, however, I'm wondering to know if there is better solution for it?

boolean contains(String input,String search)

Duckling answered 25/8, 2013 at 23:6 Comment(0)

You can look at the source:

public boolean contains(CharSequence s) {
    return indexOf(s.toString()) > -1;
}

Parfleche answered 25/8, 2013 at 23:9 Comment(1)

I was about to say you should provide the source for indexOf as well...but that's the OP's job isn't it ;) – Kinelski 25/8, 2013 at 23:11

I'm wondering to know if there is better solution for it?

There are lots of algorithms for simple string searching; see the Wikipedia String Search page. The page includes complexity characterizations ... and references.

The standard Java java.lang.String implementation uses naive search under the hood. Some of the algorithms on the Wikipedia page that have better complexity for the search phase, but require non-trivial setup. I expect that the Sun/Oracle engineers have done extensive empirical testing and found that naive search works best on average over a wide range of real-world applications.

Finally ...

I know the brute force approach with the time complexity of O(n*m)

In fact, that is the worst case complexity. The average complexity is O(n). Consider this:

boolean bruteForceMatch (String str1, String str2) {
    for (int i = 0; i < str.length(); i++) {
        boolean matches = true;
        for (int j = 0; j < str2.length(); j++) {
            if (i + j >= str.length || 
                str1.charAt(i + j) != str2.charAt(j)) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return true;
        }
    }
    return false;
}

The worst case happens with inputs like "AAA..." and "AAA...B". (The dots denote repetition.)

But in the average case (e.g. randomly generated input strings) you won't have an "almost match" for str2 at every position of str1. The inner loop will typically break in the iteration.

Gati answered 25/8, 2013 at 23:40 Comment(0)

Is there a better way? Depends on what you think is 'better'. An alternative is using Pattern. But still, what would be the difference in user experience? Would it be relevant enough?

If you really want to use the best option, try both options out for yourself with enough iterations.

Palaeontology answered 25/8, 2013 at 23:12 Comment(1)

Using Dynamic programming for example, to get smaller time complexity. – Duckling 25/8, 2013 at 23:13

here is my solution:

static boolean contain(String input,String search)
{
    int[] searchIn = new int[search.length()];
    searchIn[0] = 0;
            //searchIn keep track of repeated values on search sting
            //so if the search string is "abcabd" then the corresponding searchIn is
            //0 0 0 1 2 0
    int k = 0;
    for(int i=1;i<search.length();i++)
    {
        if(search.charAt(i)== search.charAt(k))
        {
            searchIn[i] = ++k; 
        }
        else 
        {
            k =0;
            searchIn[i] = k;
        }       
    }
    int i=0;
    int j=0;

    while(i<=input.length()-1 && j<=search.length()-1)
    {
        if(input.charAt(i) == search.charAt(j))
        {
            i++;
            j++;
        }
        else
        {
            j = searchIn[j-1];
            if(i==input.length()-1)
                i++;
        }
    }
    if(j==search.length())
        return true;
    else return false;
}

Duckling answered 25/8, 2013 at 23:31 Comment(5)

sweet ;) Have you any chance measured the performance difference? – Palaeontology 25/8, 2013 at 23:38

linear, should be O(m+n) – Duckling 25/8, 2013 at 23:40

It looks to me like it is O(m * n) on average. And my reading of the code does not convince me that it is correct. It would help if you included an explanation of what the code is doing, and how/why the searchIn table is giving you a speedup. FWIW, it looks to me like it attempts to optimize the case where there are repeated letters in the search string ... and I'm not convinced that the optimization is correct. – Gati 26/8, 2013 at 0:27

@StephenC I added some comments. I tested the code and it works. It should be OK. – Duckling 26/8, 2013 at 3:9

@sweet - That's not what I'd call an explanation, and "I've tested it so it is OK" doesn't inspire confidence either. Finally, you've presented no evidence for your assertion about complexity. Basically, you are presenting what appears to be a "new" algorithm, and this requires due diligence if you expect people to accept it as being correct. And if you are not prepared to do the work, I don't see the point of making your claim. (But I retract my O(m*n) on average ...) – Gati 26/8, 2013 at 3:53

For peeps who don't wish to sieve through multiple calls from one Class to another, from one method to another, String.contains(String str) at this moment works as follows

public boolean contains(CharSequence s) {
    return indexOf(s.toString()) >= 0;
}
public int indexOf(String str) {
    byte coder = coder();
    if (coder == str.coder()) {
        return isLatin1() ? StringLatin1.indexOf(value, str.value) :
            StringUTF16.indexOf(value, str.value);
    }
    if (coder == LATIN1) { // str.coder == UTF16
        return -1;
    }
    return StringUTF16.indexOfLatin1(value, str.value);
}
// Class StringLatin1 below
@IntrinsicCandidate
public static int indexOf(byte[] value, byte[] str) {
    if (str.length == 0) {
        return 0;
    }
    if (value.length == 0) {
        return -1;
    }
    return indexOf(value, value.length, str, str.length, 0);
}

@IntrinsicCandidate
public static int indexOf(byte[] value, int valueCount, byte[] str, int strCount, int fromIndex) {
    byte first = str[0];
    int max = (valueCount - strCount);
    for (int i = fromIndex; i <= max; i++) {
        // Look for first character.
        if (value[i] != first) {
            while (++i <= max && value[i] != first);
        }
        // Found first character, now look at the rest of value
        if (i <= max) {
            int j = i + 1;
            int end = j + strCount - 1;
            for (int k = 1; j < end && value[j] == str[k]; j++, k++);
            if (j == end) {
                // Found whole string.
                return i;
            }
        }
    }
    return -1;
}

Nonillion answered 6/12, 2023 at 12:27 Comment(0)

Recommended topics

Hot tags