Commons Lang StringUtils.replace performance vs String.replace
Asked Answered
W

5

51

When I compared performance of Apache's StringUtils.replace() vs String.replace() I was surprised to know that the former is about 4 times faster. I used Google's Caliper framework to measure performance. Here's my test

public class Performance extends SimpleBenchmark {
    String s = "111222111222";

    public int timeM1(int n) {
        int res = 0;
        for (int x = 0; x < n; x++) {
            res += s.replace("111", "333").length();
        }
        return res;
    }

    public int timeM2(int n) {
        int res = 0;
        for (int x = 0; x < n; x++) {
            res += StringUtils.replace(s, "111", "333", -1).length();
        }
        return res;
    }

    public static void main(String... args) {
        Runner.main(Performance.class, args);
    }
}

output

 0% Scenario{vm=java, trial=0, benchmark=M1} 9820,93 ns; ?=1053,91 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M2} 2594,67 ns; ?=58,12 ns @ 10 trials

benchmark   us linear runtime
       M1 9,82 ==============================
       M2 2,59 =======

Why is that? Both methods seem to do the same work, StringUtils.replace() is even more flexible.

Willms answered 26/4, 2013 at 4:51 Comment(5)
Have you looked at the source code?Reminiscence
One uses a regex, which is expensive; the other doesn't?Cullie
I wonder why java version still has not been tuned since introduction in java 5. That internal use of regexp is so obviously inefficient!Hydrophilous
yes but when the string is very large ex.10kb then the former one takes more time than later one. i.e I found the StringUtils.replace works slower than String.replace (note: i observed this with 10kb payload ).Capablanca
Fixed in bugs.openjdk.java.net/browse/JDK-8058779, starting with JDK 9.Rabato
L
44

From the source code of java.lang.String1:

public String replace(CharSequence target, CharSequence replacement) {
   return Pattern
            .compile(target.toString(), Pattern.LITERAL)
            .matcher(this )
            .replaceAll(
                    Matcher.quoteReplacement(replacement.toString()));
}

String.replace(CharSequence target, CharSequence replacement) is implemented with java.util.regex.Pattern, therefore, it is not surprising that it is slower that StringUtils.replace(String text, String searchString, String replacement)2, which is implemented with indexOf and StringBuffer.

public static String replace(String text, String searchString, String replacement) {
    return replace(text, searchString, replacement, -1);
}

public static String replace(String text, String searchString, String replacement, int max) {
    if (isEmpty(text) || isEmpty(searchString) || replacement == null || max == 0) {
        return text;
    }
    int start = 0;
    int end = text.indexOf(searchString, start);
    if (end == -1) {
        return text;
    }
    int replLength = searchString.length();
    int increase = replacement.length() - replLength;
    increase = (increase < 0 ? 0 : increase);
    increase *= (max < 0 ? 16 : (max > 64 ? 64 : max));
    StringBuffer buf = new StringBuffer(text.length() + increase);
    while (end != -1) {
        buf.append(text.substring(start, end)).append(replacement);
        start = end + replLength;
        if (--max == 0) {
            break;
        }
        end = text.indexOf(searchString, start);
    }
    buf.append(text.substring(start));
    return buf.toString();
}

Footnote

1 The version that I links to and copied source code from is JDK 7

2 The version that I links to and copied source code from is common-lang-2.5

Lifeguard answered 26/4, 2013 at 5:6 Comment(10)
Surprising that they use StringBuffer. It would almost certainly be even faster with StringBuilder in its place.Leuko
@pickypg: Probably for compatibility reason? StringBuffer is available from Java 1.0, while StringBuilder is only available from Java 1.5. I think they do it for compatibility reason in old releases and didn't change it for the new release (where they raised the requirement to 1.5 and later 1.6).Lifeguard
You appear to be right. The legacy version (2.6) supports Java 1.2. It wasn't until 3.1 they required Java 1.5. In the latest, they have changed over to StringBuilders. Thought they required 1.5 since version 2.x.Leuko
In many cases biased locking means the practical difference between StringBuilder and StringBuffer is close to nil, unless you are appending one character at a time or something.Salmanazar
@BeeOnRope: Right. From my testing when I commented on Stephen C's post (the test is not included here), there is very minor difference (that is not distinguishable from noise) between StringBuilder and StringBuffer's timing.Lifeguard
It's interesting that the HotSpot JVM and JDK guys found two completely different solutions to the mistake of introducing basic classes like Vector and StringBuffer being synchronized by default - the first to introduce new classes in the JDK without synchronization - which requires end-user effort to update, but then a few years later the "magic" of biased locking comes along and means those who didn't convert mostly don't pay a large penalty. Even the underlying atomic instructions have been getting much faster on x86 for the cases where biased locking doesn't kick in.Salmanazar
It might be worth noting that String.replace(char, char) does not use regex in its implementationJugglery
@RTF: Yes, but it is irrelevant to the issue here (string replacement).Lifeguard
Starting with Java 9, String.replace will be faster than before, see bugs.openjdk.java.net/browse/JDK-8058779.Rabato
@RolandIllig: Yes, the new implementation should be faster for most practical cases. However, I think the new implementation will have quadratic complexity on worst case, while the regex implementation won't have this problem.Lifeguard
P
64

In modern Java, this is not the case anymore. String.replace was improved in Java-9 moving from regular expression to StringBuilder, and was improved even more in Java-13 moving to direct allocation of the target byte[] array calculating its exact size in advance. Thanks to internal JDK features used, like the ability to allocate an uninitialized array, ability to access String coder and ability to use private String constructor which avoids copying, it's unlikely that current implementation can be beaten by a third-party implementation.

Here are my benchmarking results for your test using JDK 8, JDK 9 and JDK 13 (caliper:0.5-rc1; commons-lang3:3.9)

Java 8 (4x slower indeed):

 0% Scenario{vm=java, trial=0, benchmark=M1} 291.42 ns; σ=6.56 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M2} 70.34 ns; σ=0.15 ns @ 3 trials

benchmark    ns linear runtime
       M1 291.4 ==============================
       M2  70.3 =======

Java 9 (almost equal performance):

 0% Scenario{vm=java, trial=0, benchmark=M2} 99,15 ns; σ=8,34 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M1} 103,43 ns; σ=9,01 ns @ 10 trials

benchmark    ns linear runtime
       M2  99,1 ============================
       M1 103,4 ==============================

Java 13 (standard method is 38% faster):

 0% Scenario{vm=java, trial=0, benchmark=M2} 91,64 ns; σ=5,12 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M1} 57,38 ns; σ=2,51 ns @ 10 trials

benchmark   ns linear runtime
       M2 91,6 ==============================
       M1 57,4 ==================
Poland answered 2/10, 2019 at 10:40 Comment(1)
My first thought was, you gain for simple cases, but lose for more complex scenarios where the regex engine’s use of the Boyer–Moore algorithm would pay off, but then, I noticed your other answer stating that indexOf(String) is an intrinsic operation. Now, it would be interesting whether being called from String.replace counts as a use with a dynamic argument or whether the caller’s caller is taken into account here…Cumshaw
L
44

From the source code of java.lang.String1:

public String replace(CharSequence target, CharSequence replacement) {
   return Pattern
            .compile(target.toString(), Pattern.LITERAL)
            .matcher(this )
            .replaceAll(
                    Matcher.quoteReplacement(replacement.toString()));
}

String.replace(CharSequence target, CharSequence replacement) is implemented with java.util.regex.Pattern, therefore, it is not surprising that it is slower that StringUtils.replace(String text, String searchString, String replacement)2, which is implemented with indexOf and StringBuffer.

public static String replace(String text, String searchString, String replacement) {
    return replace(text, searchString, replacement, -1);
}

public static String replace(String text, String searchString, String replacement, int max) {
    if (isEmpty(text) || isEmpty(searchString) || replacement == null || max == 0) {
        return text;
    }
    int start = 0;
    int end = text.indexOf(searchString, start);
    if (end == -1) {
        return text;
    }
    int replLength = searchString.length();
    int increase = replacement.length() - replLength;
    increase = (increase < 0 ? 0 : increase);
    increase *= (max < 0 ? 16 : (max > 64 ? 64 : max));
    StringBuffer buf = new StringBuffer(text.length() + increase);
    while (end != -1) {
        buf.append(text.substring(start, end)).append(replacement);
        start = end + replLength;
        if (--max == 0) {
            break;
        }
        end = text.indexOf(searchString, start);
    }
    buf.append(text.substring(start));
    return buf.toString();
}

Footnote

1 The version that I links to and copied source code from is JDK 7

2 The version that I links to and copied source code from is common-lang-2.5

Lifeguard answered 26/4, 2013 at 5:6 Comment(10)
Surprising that they use StringBuffer. It would almost certainly be even faster with StringBuilder in its place.Leuko
@pickypg: Probably for compatibility reason? StringBuffer is available from Java 1.0, while StringBuilder is only available from Java 1.5. I think they do it for compatibility reason in old releases and didn't change it for the new release (where they raised the requirement to 1.5 and later 1.6).Lifeguard
You appear to be right. The legacy version (2.6) supports Java 1.2. It wasn't until 3.1 they required Java 1.5. In the latest, they have changed over to StringBuilders. Thought they required 1.5 since version 2.x.Leuko
In many cases biased locking means the practical difference between StringBuilder and StringBuffer is close to nil, unless you are appending one character at a time or something.Salmanazar
@BeeOnRope: Right. From my testing when I commented on Stephen C's post (the test is not included here), there is very minor difference (that is not distinguishable from noise) between StringBuilder and StringBuffer's timing.Lifeguard
It's interesting that the HotSpot JVM and JDK guys found two completely different solutions to the mistake of introducing basic classes like Vector and StringBuffer being synchronized by default - the first to introduce new classes in the JDK without synchronization - which requires end-user effort to update, but then a few years later the "magic" of biased locking comes along and means those who didn't convert mostly don't pay a large penalty. Even the underlying atomic instructions have been getting much faster on x86 for the cases where biased locking doesn't kick in.Salmanazar
It might be worth noting that String.replace(char, char) does not use regex in its implementationJugglery
@RTF: Yes, but it is irrelevant to the issue here (string replacement).Lifeguard
Starting with Java 9, String.replace will be faster than before, see bugs.openjdk.java.net/browse/JDK-8058779.Rabato
@RolandIllig: Yes, the new implementation should be faster for most practical cases. However, I think the new implementation will have quadratic complexity on worst case, while the regex implementation won't have this problem.Lifeguard
B
11

Try this one, you'll notice that it's extremely performant than Apache's one:

public static String replace (String source, String os, String ns) {
    if (source == null) {
        return null;
    }
    int i = 0;
    if ((i = source.indexOf(os, i)) >= 0) {
        char[] sourceArray = source.toCharArray();
        char[] nsArray = ns.toCharArray();
        int oLength = os.length();
        StringBuilder buf = new StringBuilder (sourceArray.length);
        buf.append (sourceArray, 0, i).append(nsArray);
        i += oLength;
        int j = i;
        // Replace all remaining instances of oldString with newString.
        while ((i = source.indexOf(os, i)) > 0) {
            buf.append (sourceArray, j, i - j).append(nsArray);
            i += oLength;
            j = i;
        }
        buf.append (sourceArray, j, sourceArray.length - j);
        source = buf.toString();
        buf.setLength (0);
    }
    return source;
}
Bucella answered 3/10, 2013 at 15:48 Comment(1)
It is slightly faster than Apache ones in short text replacement, but not necessarily faster than other implementations in long text replacement senario. Checkout github.com/greenlaw110/Benchmark4StringReplace in which Fast implementation is done using your function aboveDasha
C
6

on my test with JMH:https://github.com/qxo/Benchmark4StringReplace The beset is loukili's way:

java -jar target/benchmarks.jar StringReplaceBenchmark -wi 3 -i 6 -f 1 -tu ms Benchmark Mode Cnt Score Error Units StringReplaceBenchmark.test4String thrpt 6 1255.017 ± 230.012 ops/ms StringReplaceBenchmark.test4StringUtils thrpt 6 4068.229 ± 67.708 ops/ms StringReplaceBenchmark.test4fast thrpt 6 4821.035 ± 97.790 ops/ms StringReplaceBenchmark.test4lang3StringUtils thrpt 6 3186.007 ± 102.786 ops/ms

Crossfade answered 29/5, 2017 at 2:19 Comment(1)
good project. I've cloned your project, added OSGL implementation and more test scenarios. I've submitted a PR to youDasha
R
3

Why is that? Both methods seem to do the same work.

You would need to look at the source-code and do some serious investigation with a profiler to get a good (technical) answer to that.

However, one possible explanation is that StringUtils.replace and String.replace have been tuned for different use-cases. You are only looking at one case ... with a pretty small string, and a replacement string that is the same size as the substring being replaced.

Another possible explanation is that the Apache developers simply spent more time on tuning. (And lets not blame the Java developers for that. They have been working under severe staffing constraints for a long time. In the big scheme of things, there are many tasks more important than performance tuning String.replace.)


In fact, looking at the source code, it looks like the Java 7 version just uses the regular expression-based replace under the hood. By contrast, the Apache version is going to considerable lengths to avoid copying. Based on that evidence, I'd expect the performance difference between the two versions to be relatively smaller for large target strings. And I suspect the Java 7 version might even be better in some cases.

(Either non-technical explanation is plausible too, based on the evidence in the code.)

Reminiscence answered 26/4, 2013 at 5:0 Comment(3)
Based on that evidence, I'd expect the performance difference between the two versions to be relatively smaller for large target strings. There are several factors affecting the time of a replacement function: length of input, length of target string, length of replacement string, number of matches. Here, you only mention about the size of string, but how about the other parameters (esp. number of matches)?Lifeguard
You'd need to look at how Matcher.replaceAll works. Personally, I'd expect the number of matches to not be that significant. I'd be more concerned about whether the replacement process required expansion of a StringBuilder or not. And that depends on whether the replacement increases the string length ... or not. (But I haven't looked ... )Reminiscence
I haven't looked at the details, but from testing, StringUtils outperforms String.replace by 2-3 times for random string (I varied the number of matches, and also the length of the target string, the replacement string is allowed to be slightly shorter or longer than the target string).Lifeguard

© 2022 - 2024 — McMap. All rights reserved.