String replaceAll() vs. Matcher replaceAll() (Performance differences)
F

7

64

Are there known performance differences between String.replaceAll() and Matcher.replaceAll() (on a Matcher object created from a java.util.regex.Pattern)?

Also, what are the high-level API differences between the two? (Immutability, handling nulls, handling empty strings, etc.)

Feverish answered 23/9, 2009 at 16:0 Comment(0)
H
103

The documentation for String.replaceAll has the following to say about calling the method:

An invocation of this method of the form str.replaceAll(regex, repl) yields exactly the same result as the expression

Pattern.compile(regex).matcher(str).replaceAll(repl)

Therefore, invoking String.replaceAll and explicitly creating a Pattern and Matcher can be expected to perform the same.
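The equivalence quoted from the documentation can be checked directly. This is a minimal sketch; the class name, method names, and sample input are made up for illustration.

```java
import java.util.regex.Pattern;

class ReplaceAllEquivalence {
    // Convenience form: String.replaceAll compiles the regex on every call.
    static String viaString(String input, String regex, String repl) {
        return input.replaceAll(regex, repl);
    }

    // Explicit form, as spelled out in the String.replaceAll documentation.
    static String viaMatcher(String input, String regex, String repl) {
        return Pattern.compile(regex).matcher(input).replaceAll(repl);
    }

    public static void main(String[] args) {
        String input = "a1b22c333"; // sample data, chosen for illustration
        System.out.println(viaString(input, "\\d+", "#"));  // prints a#b#c#
        System.out.println(viaMatcher(input, "\\d+", "#")); // prints a#b#c#
    }
}
```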

Edit

As has been pointed out in the comments, the absence of a performance difference holds only for a single call to replaceAll, whether from String or Matcher. If one needs to call replaceAll many times with the same regex, it should be beneficial to hold onto a compiled Pattern, so that the relatively expensive pattern compilation does not have to be repeated on every call.
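Holding onto a compiled Pattern might look like the sketch below. The class name, constant name, and the whitespace regex are hypothetical, picked only to have something concrete to run.

```java
import java.util.regex.Pattern;

class WhitespaceNormalizer {
    // Compiled once when the class is initialized, then shared by every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String normalize(String input) {
        // A fresh Matcher is created per call; only the Pattern is reused.
        return WHITESPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a  b", "c\t\td", "e \n f"}) {
            System.out.println(normalize(s));
        }
    }
}
```

Each call still pays for creating a Matcher, but not for parsing and compiling the regex again.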

Hollington answered 23/9, 2009 at 16:6 Comment(10)
Except, as mentioned below, for the performance penalty of the pattern compilation. If you are using a constant regex, compile it and stick it in a static constant.Crossway
Your "Therefore" comment at the end only applies for 1 call, in which case performance metrics really aren't relevant. If there are repeated calls to replaceAll with the same regex then String.replaceAll is slower than caching a compiled pattern.Tweedy
Does anyone know if the regex String is static, are any javac compilers smart enough to figure out that the Pattern object can be static too and automatically build a static field into the generated bytecode? Sounds like a great way to boost performance on code while improving readability.Aeonian
I compared performance for String.replace() and the Matcher.replaceFirst() and the String version is significantly faster.Leeway
Actually, with repeated use, holding onto the Matcher is even better. Create it with Pattern.compile(...).matcher("ignored input"), then use it with theMatcher.reset(theString).replaceAll(...)Camp
@Camp you should make that an answer. I was reusing the pattern until I read what you wrote.Emulation
@Emulation i've since changed my mind on this. Reusing the matcher is indeed a tiny bit more efficient, but it is much more clunky, especially because it requires the unused/dummy string. Reusing the pattern is your best bet.Camp
@Camp that's really interesting, because it seems to make a significant difference in ms runtime when the replaceAll() is being called continually inside of an O(n^3) algorithm. I plan to stay with the matcher until I can look into the implementation code and see what the actual difference is.Emulation
@Camp FWIW holding on to the Matcher is NOT threadsafe. We found this out the hard way in production and caused some corrupt XML strings. See javamex.com/tutorials/regular_expressions/thread_safety.shtmlClostridium
@Clostridium I think the best here is to store Pattern to a constant and keep Matcher being created in local scopePauli
S
31

Source code of String.replaceAll():

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

It has to compile the pattern first - if you're going to run it many times with the same pattern on short strings, performance will be much better if you reuse one compiled Pattern.

Shy answered 23/9, 2009 at 16:8 Comment(0)
E
10

The main difference is that if you hold onto the Pattern used to produce the Matcher, you can avoid recompiling the regex every time you use it. Going through String, you don't get the ability to "cache" like this.

If you have a different regex every time, using the String class's replaceAll is fine. If you are applying the same regex to many strings, create one Pattern and reuse it.

Edgeworth answered 23/9, 2009 at 16:7 Comment(3)
Patching up your answer to repeat what I've already said is lame.Edgeworth
If that was aimed at me for some reason, I suspect I was already editing by the time you posted your answer...Threepiece
Actually, it was aimed at coobird.Edgeworth
T
6

Immutability / thread safety: compiled Patterns are immutable, Matchers are not. (see Is Java Regex Thread Safe?)
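One way to respect that distinction is to share the Pattern and create a Matcher inside each call, so no Matcher is ever visible to more than one thread. A sketch under that assumption follows; the class and method names are invented for the example, and a parallel stream stands in for whatever threads an application really uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedPatternExample {
    // Immutable, so it is safe to share between threads.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    static String mask(String input) {
        // The Matcher is a local variable, so it is confined to the calling thread.
        return DIGIT.matcher(input).replaceAll("*");
    }

    static List<String> maskAll(List<String> inputs) {
        // mask() may run on several worker threads at once here.
        return inputs.parallelStream()
                     .map(SharedPatternExample::mask)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(maskAll(Arrays.asList("a1", "b22", "c333"))); // prints [a*, b**, c***]
    }
}
```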

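Handling empty strings: replaceAll handles empty strings gracefully. On an empty input string, a pattern produces a replacement only if it can match the empty string; a pattern that needs at least one character simply finds nothing.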
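A few concrete cases make the empty-string behaviour easier to see. This is only an illustrative sketch; the class name, the helper method, and the patterns are made up.

```java
import java.util.regex.Pattern;

class EmptyStringCases {
    static String replace(String input, String regex, String repl) {
        return Pattern.compile(regex).matcher(input).replaceAll(repl);
    }

    public static void main(String[] args) {
        // "a+" needs at least one character, so nothing matches in "".
        System.out.println("[" + replace("", "a+", "x") + "]");  // prints []
        // "a*" can match the empty string, so it matches once at index 0.
        System.out.println("[" + replace("", "a*", "x") + "]");  // prints [x]
        // An empty regex matches at every position of the input.
        System.out.println(replace("abc", "", "-"));             // prints -a-b-c-
    }
}
```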

Making coffee, etc.: last I heard, neither String nor Pattern nor Matcher had any API features for that.

edit: as for handling NULLs, the documentation for String and Pattern doesn't explicitly say so, but I suspect they'd throw a NullPointerException since they expect a String.
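The suspicion about nulls is easy to test. The sketch below reflects what the OpenJDK implementation does with a null regex or a null input; the class name and the throwsNpe helper are hypothetical.

```java
import java.util.regex.Pattern;

class NullHandlingCheck {
    // Hypothetical helper: reports whether the action throws NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Null regex passed through the String convenience method.
        System.out.println(throwsNpe(() -> "abc".replaceAll(null, "x")));
        // Null regex passed straight to Pattern.compile.
        System.out.println(throwsNpe(() -> Pattern.compile(null)));
        // Null input handed to an otherwise valid Pattern.
        System.out.println(throwsNpe(() -> Pattern.compile("b").matcher(null)));
    }
}
```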

Tweedy answered 23/9, 2009 at 16:15 Comment(0)
T
5

The implementation of String.replaceAll tells you everything you need to know:

return Pattern.compile(regex).matcher(this).replaceAll(replacement);

(And the docs say the same thing.)

While I haven't checked for caching, I'd certainly expect that compiling a pattern once and keeping a static reference to that would be more efficient than calling Pattern.compile with the same pattern each time. If there's a cache it'll be a small efficiency saving - if there isn't it could be a large one.
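To get a rough feel for the difference, something like the following could be run. It is a crude timing sketch, not a rigorous benchmark (a harness such as JMH would be needed for trustworthy numbers), and the class name, regex, input, and iteration count are all arbitrary choices for illustration.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

class CompileCostSketch {
    private static final Pattern VOWELS = Pattern.compile("[aeiou]");

    // Compiles the regex on every call.
    static String recompileEachTime(String s) {
        return s.replaceAll("[aeiou]", "_");
    }

    // Reuses the Pattern compiled once above.
    static String reusePattern(String s) {
        return VOWELS.matcher(s).replaceAll("_");
    }

    static long timeNanos(UnaryOperator<String> op, String input, int iterations) {
        long sink = 0; // accumulate lengths so the work is not trivially discarded
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += op.apply(input).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) System.out.println(sink); // never true; keeps sink in use
        return elapsed;
    }

    public static void main(String[] args) {
        String input = "regular expression";
        int iterations = 200_000;
        // Warm-up pass so both variants have been executed before timing.
        timeNanos(CompileCostSketch::recompileEachTime, input, iterations);
        timeNanos(CompileCostSketch::reusePattern, input, iterations);
        System.out.println("String.replaceAll ns: "
                + timeNanos(CompileCostSketch::recompileEachTime, input, iterations));
        System.out.println("reused Pattern ns:    "
                + timeNanos(CompileCostSketch::reusePattern, input, iterations));
    }
}
```

The actual numbers depend on the JVM, the regex, and the input length, so treat any result as indicative only.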

Threepiece answered 23/9, 2009 at 16:5 Comment(0)
H
5

The difference is that String.replaceAll() compiles the regex each time it's called. There's no equivalent for .NET's static Regex.Replace() method, which automatically caches the compiled regex. Usually, replaceAll() is something you do only once, but if you're going to be calling it repeatedly with the same regex, especially in a loop, you should create a Pattern object and use the Matcher method.

You can create the Matcher ahead of time, too, and use its reset() method to retarget it for each use:

Matcher m = Pattern.compile(regex).matcher("");
for (String s : targets)
{
  System.out.println(m.reset(s).replaceAll(repl));
}

The performance benefit of reusing the Matcher, of course, is nowhere near as great as that of reusing the Pattern.

Hyperthyroidism answered 23/9, 2009 at 16:35 Comment(0)
C
0

The other answers sufficiently cover the performance part of the question, but there is another difference between Matcher::replaceAll and String::replaceAll that is also a reason to compile your own Pattern. When you compile a Pattern yourself, you can pass options such as flags that modify how the regex is applied. For example:

Pattern myPattern = Pattern.compile(myRegex, Pattern.CASE_INSENSITIVE);

The Matcher will apply all the flags you set when you call Matcher::replaceAll.

There are other flags you can set as well. Mostly I just wanted to point out that the Pattern and Matcher API has lots of options, and that's the primary reason to go beyond the simple String::replaceAll.
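A runnable version of the flag example might look like this. The class name, the regex, and the sample text are invented for illustration. For comparison it also shows the embedded flag expression (?i), which lets a plain regex string request case-insensitive matching, so flags alone do not strictly require a hand-built Pattern.

```java
import java.util.regex.Pattern;

class CaseInsensitiveReplace {
    // The flag is fixed once, at compile time, and applies to every Matcher.
    private static final Pattern HELLO =
            Pattern.compile("hello", Pattern.CASE_INSENSITIVE);

    static String withFlag(String input) {
        return HELLO.matcher(input).replaceAll("bye");
    }

    // Same effect through String.replaceAll, using the embedded (?i) flag.
    static String withInlineFlag(String input) {
        return input.replaceAll("(?i)hello", "bye");
    }

    public static void main(String[] args) {
        String text = "Hello, HELLO, hello";
        System.out.println(withFlag(text));        // prints bye, bye, bye
        System.out.println(withInlineFlag(text));  // prints bye, bye, bye
    }
}
```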

Caridadcarie answered 1/12, 2016 at 18:33 Comment(0)
