Java regex replaceAll multiline

3

57

I have a problem using replaceAll on a multiline string:

String regex = "\\s*/\\*.*\\*/";
String testWorks = " /** this should be replaced **/ just text";
String testIllegal = " /** this should be replaced \n **/ just text";

testWorks.replaceAll(regex, "x");   // returns "x just text"
testIllegal.replaceAll(regex, "x"); // returns the input unchanged (no match)

The above works for testWorks but not for testIllegal. Why is that, and how can I get around it? I need to replace something like a /* ... */ comment that spans multiple lines.

Griffon answered 11/11, 2010 at 12:10 Comment(2)
And what about this string: "String s = \"/*\"; /* comment */"? - Bruns
Well, the point is that the regex should match only at the beginning of the string. It now looks like this: (?s)^\\s*/\\*.*\\*/ I am not sure, though, whether to make it reluctant: (?s)^\\s*/\\*.*?\\*/ - Griffon
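A small sketch of how the anchored, reluctant pattern from the comment above behaves, both on the input from the question and on the tricky string with "/*" inside a string literal. The class and method names are made up for illustration; only the regex comes from the thread.

```java
class LeadingCommentOnly {
    // Anchored, reluctant pattern from the comment, written as a Java string literal.
    // Without Pattern.MULTILINE, ^ matches only at the very start of the input.
    static final String ANCHORED = "(?s)^\\s*/\\*.*?\\*/";

    // Hypothetical helper: replaces a comment only if it opens the input.
    static String replaceLeadingComment(String input) {
        return input.replaceAll(ANCHORED, "x");
    }

    public static void main(String[] args) {
        String leading = " /** this should be replaced \n **/ just text";
        String tricky = "String s = \"/*\"; /* comment */";

        System.out.println(replaceLeadingComment(leading)); // x just text
        System.out.println(replaceLeadingComment(tricky));  // unchanged, nothing at the start matches
    }
}
```

Because of the anchor, the "/*" inside the string literal is never considered, so that input comes back untouched.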
100

You need to use the Pattern.DOTALL flag so that the dot also matches newlines, e.g.:

Pattern.compile(regex, Pattern.DOTALL).matcher(testIllegal).replaceAll("x")

Alternatively, specify the flag in the pattern itself using (?s), e.g.:

String regex = "(?s)\\s*/\\*.*\\*/";
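For completeness, here is a minimal runnable sketch that puts both variants side by side on the string from the question. The class and method names are only illustrative assumptions; the regex and inputs are taken from the question.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Same pattern as in the question: optional whitespace, then /* ... */
    static final String REGEX = "\\s*/\\*.*\\*/";

    // Variant 1: pass the flag when compiling the pattern.
    static String withFlag(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).replaceAll("x");
    }

    // Variant 2: embed the flag in the pattern itself with (?s).
    static String withInlineFlag(String input) {
        return input.replaceAll("(?s)" + REGEX, "x");
    }

    public static void main(String[] args) {
        String testIllegal = " /** this should be replaced \n **/ just text";

        System.out.println(withFlag(testIllegal));       // x just text
        System.out.println(withInlineFlag(testIllegal)); // x just text
    }
}
```

Note that replaceAll returns a new string; the original String is never modified, so the result has to be assigned or used.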
Donley answered 11/11, 2010 at 12:17 Comment(2)
This is the best solution because it does not interact with the regex string itself; you just specify a flag. I did not know that, thanks! - Griffon
If the input contains several multi-line comments, this method will also remove the text between those comments. Use the method posted by Boris instead. - Cattycornered
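The effect described in the last comment can be reproduced with a short sketch. It compares the greedy .* with the reluctant .*? on an input containing two comments. The reluctant quantifier is one common way around the problem and is not necessarily the method that comment refers to; the names below are hypothetical.

```java
class GreedyVsReluctant {
    // Greedy: .* runs to the LAST "*/" in the input.
    static final String GREEDY = "(?s)\\s*/\\*.*\\*/";
    // Reluctant: .*? stops at the FIRST "*/" after each "/*".
    static final String RELUCTANT = "(?s)\\s*/\\*.*?\\*/";

    // Hypothetical helper: removes every match of the given regex.
    static String strip(String regex, String source) {
        return source.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String source = "/* first */ int a = 1;\n/* second */ int b = 2;";

        // Greedy swallows "int a = 1;" together with both comments.
        System.out.println(strip(GREEDY, source));    // " int b = 2;"
        // Reluctant removes each comment separately; the leading \s* also
        // consumes the newline in front of the second comment.
        System.out.println(strip(RELUCTANT, source)); // " int a = 1; int b = 2;"
    }
}
```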
16

Add Pattern.DOTALL to the Pattern.compile call, or (?s) to the pattern.

This would work:

String regex = "(?s)\\s*/\\*.*\\*/";

See Match multiline text using regular expression

Florist answered 11/11, 2010 at 12:17 Comment(1)
Unfortunately, this does not work in combination with String.replaceAll. :( - Hereford
7

By default, the metacharacter . matches any character except a line terminator such as a newline. That is why your regex does not work in the multi-line case.

To fix this, replace . with [\d\D], which matches any character, including a newline.
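A minimal sketch of that replacement, using the string from the question. The class and method names are assumptions for illustration only.

```java
class AnyCharClassDemo {
    // [\d\D] reads "a digit or a non-digit", i.e. any character at all,
    // so no DOTALL flag is needed for the match to cross a newline.
    static final String REGEX = "\\s*/\\*[\\d\\D]*\\*/";

    // Hypothetical helper: replaces the matched comment with "x".
    static String replaceComment(String input) {
        return input.replaceAll(REGEX, "x");
    }

    public static void main(String[] args) {
        String testIllegal = " /** this should be replaced \n **/ just text";
        System.out.println(replaceComment(testIllegal)); // x just text
    }
}
```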

Code In Action

Nanine answered 11/11, 2010 at 12:17 Comment(1)
Swapping in [\d\D] for . (which normally means [^\n], at least in Pattern.UNIX_LINES mode) strikes me as inappropriate: it is not obvious what it does, it is six characters instead of one, and there are other ways of doing this. - Florist
