regex: How to escape backslashes and special characters?

Is there a way to escape (or protect) special characters in a regular expression?

What I would like to do is to create a simple regex tester:

import java.util.regex.*;
class TestRegex { 
   public static void main( String ... args ) { 
       System.out.printf("%s ~= %s ? %s  %n" , args[0], args[1], Pattern.matches( args[0], args[1] ) );
   }
}

This works great for testing my patterns before plugging them into the program:

$ java TestRegex "\d" 1
\d ~= 1 ? true  
$ java TestRegex "\d" 12
\d ~= 12 ? false  
$ java TestRegex "\d+" 12
\d+ ~= 12 ? true  
$ java TestRegex "\d+" a12
\d+ ~= a12 ? false  
$ java TestRegex "\d+" ""
\d+ ~=  ? false  

The next thing I do is to use this pattern in my program, but each time I have to manually escape it:

Pattern p = Pattern.compile( /*copy pasted regex here */ );

In this example, that means replacing \d with \\d. After a while this becomes very irritating.

Q. How can I automatically escape these special characters?

Richel answered 11/1, 2011 at 3:10 Comment(0)

You just need to replace every single backslash with two backslashes. This is slightly complicated because the replaceAll method on String treats its first argument as a regular expression: the backslash has to be escaped once because it sits in a Java string literal (yielding \\), and then escaped again for the regular expression engine (yielding \\\\). The replacement string suffers a similar fate, because a backslash is also special in a replaceAll replacement, so it needs two such escape sequences, for a total of 8 backslashes:

System.out.printf("%s ~= %s ? %s  %n", 
    args[0].replaceAll("\\\\","\\\\\\\\"), args[1], ...
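
As a rough, self-contained sketch of the same idea (the class name EscapeForLiteral and the helper toJavaLiteral are made up for illustration and are not part of the original tester), String.replace could be used instead of replaceAll. It does plain literal replacement, so only the string-literal level of escaping is needed:

```java
public class EscapeForLiteral {

    // Hypothetical helper: doubles every backslash so the pattern can be
    // pasted between the quotes of a Java string literal.
    static String toJavaLiteral(String regex) {
        // String.replace treats both arguments literally (no regex),
        // so "\\" is one backslash and "\\\\" is two.
        return regex.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // Falls back to the pattern \d+ when no argument is given.
        String regex = args.length > 0 ? args[0] : "\\d+";

        // Literal replacement: prints \\d+ for the input \d+
        System.out.println(toJavaLiteral(regex));

        // The regex-based form with 4 and 8 backslashes gives the same result.
        System.out.println(regex.replaceAll("\\\\", "\\\\\\\\"));
    }
}
```

Both lines print \\d+ for the input \d+, which is the form that can be pasted into Pattern.compile("...").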
Breastwork answered 11/1, 2011 at 3:15 Comment(6)
I get: Exception in thread "main" java.util.regex.PatternSyntaxException: Unexpected internal error near index 1 pastebin.com/aEWSibXv - Richel
It's moments like these that I wish Java had a better syntax for literal Strings. - Omidyar
@Oscar: oops, you have to escape once for a String literal and another time because the first argument of replaceAll is itself a regular expression. Fixed now. - Breastwork
I was a couple of \ off. I was trying several combinations :) What about special characters like + { ( etc.? I don't think I need to escape those, do I? - Richel
@Oscar: nope, the only literals that would cause you problems are the backslashes. Ironically, the error I made is identical to the one you're trying to solve. - Breastwork
Which pretty much proves the point ;) Works like a charm, thanks: pastebin.com/y0mePwV5 - Richel
