Raw Strings in Java - for regex in particular. Multiline strings
Y

13

93

Is there any way to use raw strings in Java (without escape sequences)?

(I'm writing a fair amount of regex code and raw strings would make my code immensely more readable)

I understand that the language does not provide this directly, but is there any way to "simulate" them in any way whatsoever?

Yeorgi answered 10/8, 2009 at 19:12 Comment(7)
Oh, I want that so much. Multi-line strings, too. And maybe simple interpolation.Speller
Although you're not going to like this--I think it just encourages mixing your data with your code. The nicest thing about regexes is that they ARE data and can therefore be extracted into an indexed table of some sort, simplifying all the rest of your code. Changes in your information then don't require a recompile; just have your customer edit your regex source files. This is true of just about anything I'd consider multi-line strings for. It's always better external (if nothing else, think i18n!)Grane
ps. When I was young a smart programmer theorized that the only constants inline in your code should be 0 and 1, and those only used as loop termination/compare situations which are mostly no longer valid (we can use foreach instead of for(0..)) I thought he was nuts at the time, but the better I get, the smarter that theory sounds.Grane
Note (Jan. 2018), raw string literals might be coming for Java (JDK 10 or more): see In Java, is there a way to write a string literal without having to escape quotes?.Hypoploid
The situation has changed, and nowadays the answer that is marked as correct is wrong. The correct answer is the one given by Vlad, about text blocks. Please consider changing it, because the current choice confuses people.Iniquitous
@Myshkin Thanks for the heads up. I've updated the correct answer to Vlad's.Yeorgi
The newly updated correct answer is incorrect, since it won't actually work for regular expression non-escape sequences like \d. It tries to treat it as an escape sequence.Auten
D
15

Yes.

Text Blocks Come to Java

Java 13 delivers long-awaited multiline strings

Some history: Raw String Literals were withdrawn. This was intended to be a preview language feature in JDK 12, but it was withdrawn and did not appear in JDK 12. It was superseded by Text Blocks (JEP 355) in JDK 13.

You can use text blocks to define multiline String literals with ease. You don’t need to add the visual clutter that comes with regular String literals: concatenation operators and escape sequences. You can also control how the String values are formatted. For example, let’s look at the following HTML snippet:

String html = """
<HTML>
  <BODY>
    <H1>"Java 13 is here!"</H1>
  </BODY>
</HTML>""";

Notice the three quotation marks that delimit the beginning and ending of the block.
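
For the regex use case it is worth noting that a text block still processes escape sequences, so a regex backslash has to be doubled just as in an ordinary literal. What a text block does buy you is a readable multi-line layout, which combines well with Pattern.COMMENTS. A minimal sketch, assuming JDK 15 or later (where text blocks are a standard feature); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextBlockRegexDemo {
    // Escapes are still processed inside a text block, so \d is written as \\d.
    // With Pattern.COMMENTS, whitespace and "# ..." comments in the pattern are ignored.
    static final Pattern DATE = Pattern.compile("""
            (\\d{4})  # year
            -
            (\\d{2})  # month
            -
            (\\d{2})  # day
            """, Pattern.COMMENTS);

    // Returns the year part of an ISO-style date, or null if the input does not match.
    static String year(String date) {
        Matcher m = DATE.matcher(date);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(year("2020-03-14"));
    }
}
```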

Destined answered 14/3, 2020 at 1:27 Comment(5)
This accepted answer is incorrect. The question was about using raw strings for regular expressions in particular. Try the following (with appropriate newlines) String regex = """ hello\d """; and the compiler will complain that \d is an illegal escape character.Auten
Escape Sequences in Text Blocks You can add various escape sequences to text blocks just as you would add them to your String literals. For instance, you can include new lines in your text blocks by placing the values on multiple lines or by using escape sequences such as \n. As expected, invalid escape sequences or unescaped backslashes are not allowed. \d is an illegal escape characterDestined
@josh waxman. Initial question didn't contain Regex - it was added later by some editor. Author of question asked about raw strings - just review history of questionDestined
yes, I know about newlines and other escape characters. Regarding the initial question, that is not the case - please review the history. The initial author of the question DID specifically ask about regexes, stating in the q body, "(I'm writing a fair amount of regex code and raw strings would make my code immensely more readable)". The edit you reference was adding "for regex in particular" to the q title, rather than the q body, for the sake of clarity. The questioner wanted to not escape \d as \\d.Auten
Meanwhile, this is something I too wanted, so I finally figured it out and just now posted my approach as a separate answer...Auten
F
49

This is a workaround if you are using Eclipse. You can have long blocks of text correctly split across lines and special characters automatically escaped when you paste text into a string literal

"-paste here-";

if you enable that option under Window → Preferences → Java → Editor → Typing → "Escape text when pasting into a string literal".

Frowsty answered 20/10, 2010 at 12:39 Comment(2)
This is awesome. I wish I had known about this feature sooner!Caudle
Works in Netbeans too.Venose
P
41

No, there isn't.

Generally, you would put raw strings and regexes in a properties file, but those have some escape sequence requirements too.

Peppi answered 10/8, 2009 at 19:20 Comment(1)
See my answer for this question. There is a way for it, now. https://mcmap.net/q/224351/-raw-strings-in-java-for-regex-in-particular-multiline-stringsTitivate
T
31

I use Pattern.quote. And it solves the problem of the question. Thusly:

Pattern pattern = Pattern.compile(Pattern.quote("\r\n?|\n"));

The quote method returns a pattern string that matches the provided string argument literally; that returned string is the properly quoted string for our case.
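
To make the behaviour of Pattern.quote concrete, here is a small sketch (the class and method names are made up for illustration). It shows that quoting turns every character of the argument into a literal, so regex operators such as ? and | inside the quoted string no longer act as operators:

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // True if input is exactly the given text, with no regex interpretation of it.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "a.b" is wrapped in \Q...\E, so the dot is an ordinary character.
        System.out.println(Pattern.quote("a.b"));
        System.out.println(matchesLiterally("a.b", "a.b"));
        System.out.println(matchesLiterally("a.b", "axb"));
        // After quoting, ? and | are literal characters, so a lone "\n" does not match.
        System.out.println(matchesLiterally("\r\n?|\n", "\n"));
    }
}
```

This makes quoting a good fit for calls like foo("[") and a poor fit for patterns that rely on regex syntax such as \d or alternation.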

Titivate answered 20/4, 2013 at 9:46 Comment(6)
Note this won't work if the escaped characters aren't valid escape sequences for Java string literals but are valid for regexes, for example: "\.".Cabinda
That's clever, but....aaaaargh. What a hacky solution to what should be a non-problem in a modern language. I'm not even sure it's worthwhile based on ygormutti's observation.Pacesetter
@KyleStrand This is NOT a hacky solution. Pattern.quote would be required even if Java had raw string literals: characters like . and + don't require any special treatment in Java string literals, still they need to be escaped for regular expressions. Python supports raw string literals, but it still has re.escape.Granada
@AlexShesterov Escaped special characters in Regex are still part of the regex expression passed to the regular expression engine. That is to say, the regex engine receives a literal \* sequence. The lack of raw strings in Java conflates the concept of creating a regex pattern with special characters treated as literals and the concept of creating string data with special characters. These are separate concepts.Pacesetter
Anyway this solves my problem: now foo("\\[") can be foo("[") happily.Redfield
"Thusly"... wauw! :PHectocotylus
K
18

No (quite sadly).

Kopaz answered 10/8, 2009 at 19:19 Comment(1)
This is the first answer on SO that I've seen that has gotten so many upvotes by just exploiting the emotions of Java programmers xDHectocotylus
I
5

Have the raw text file in your class path and read it in with getResourceAsStream(....)
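
A minimal sketch of this approach, assuming Java 11 or later and a UTF-8 resource that holds a single regex; the class name and the resource name passed to load are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class RegexResource {
    // Reads the whole stream as UTF-8 text. No Java escape processing happens here,
    // so the file can contain \d, \w and so on exactly as the regex engine expects.
    static String readAll(InputStream in) throws IOException {
        // strip() drops the trailing newline most editors add; remove this call
        // if leading or trailing whitespace is significant in your regex.
        return new String(in.readAllBytes(), StandardCharsets.UTF_8).strip();
    }

    // Loads and compiles a regex stored as a classpath resource.
    static Pattern load(String resourceName) throws IOException {
        try (InputStream in = RegexResource.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("resource not found: " + resourceName);
            }
            return Pattern.compile(readAll(in));
        }
    }
}
```

With a file such as digits.regex (an assumed name) next to the class, RegexResource.load("digits.regex") would return the compiled pattern.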

Isomorphism answered 10/8, 2009 at 23:18 Comment(0)
E
4

( Properties files are common, but messy - I treat most regex as code, and keep it where I can refer to it, and you should too. As for the actual question: )

Yes, there are ways to get around the poor readability. You might try:

String s = "crazy escaped garbage"; //readable version//

though this requires care when updating. Eclipse has an option that lets you paste text in between quotes, and the escape sequences are applied for you. The tactic would be to edit the readable versions first, and then delete the garbage, and paste them in between the empty quotes "".


Idea time:

Hack your editor to convert them; release as a plugin. I checked around for plugins, but found none (try searching though). There's a one-to-one correspondence between escaped source strings and textbox text (discounting \n, \r\n). Perhaps highlighted text with two quotes on the ends could be used.

String s = "##########
#####";

where # is any character, which is highlighted - the break is treated as a newline. Text typed or pasted within the highlighted area is escaped in the 'real' source, and displayed as if it were not. (In the same way that Eclipse escapes pasted text, this would escape typed text, and also display it without the backslashes.) Delete one of the quotes to cause a syntax error if you want to edit normally. Hmm.

Exportation answered 11/8, 2009 at 1:19 Comment(0)
L
3

Note: As of today, this is not available. I'll probably edit this answer again whenever the feature is released.

There is an ongoing proposal to introduce raw strings in Java. They are actually very useful in the case of regex.

Example 1: A regular expression string that was coded as

  System.out.println("this".matches("\\w\\w\\w\\w"));

may be alternately coded as

System.out.println("this".matches(`\w\w\w\w`));

since backslashes are not interpreted as having special meaning.

Example 2: A multi-line String literal containing embedded foreign-language text (HTML here).

A multiple-line string that was coded as
    String html = "<html>\n" +
                "    <body>\n" +
                "         <p>Hello World.</p>\n" +
                "    </body>\n" +
                "</html>\n";

may be alternately coded as

 String html = `<html>
                       <body>
                           <p>Hello World.</p>
                       </body>
                   </html>
                  `;

which avoids the need for intermediate quotes, concatenation and explicit newlines.

Hopefully we can expect the release soon.

Lily answered 23/2, 2018 at 9:49 Comment(3)
It looks like this might make it into Java 12: dzone.com/articles/…Hulbig
@Hulbig Hopefully, Java 12 will be mainstream before human civilization dies out... or at the very least, before Python 2 dies out..... -_-Hectocotylus
Unfortunately, this did not get included in Java 12.Ironing
M
2

String#getBytes() returns a new byte array containing the String's characters (conceptually 16-bit UTF-16 code units) encoded in the platform's default charset; it does not expose the String's internal storage. What I'm saying is that I think this is as close to a "raw" string as you can ever get in Java.
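
As a small illustration of why the charset matters, the sketch below (class and method names are hypothetical) encodes the same one-character String with explicit charsets and compares the byte counts:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BytesDemo {
    // Number of bytes the String occupies once encoded in the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1));
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_16BE));
    }
}
```

Passing the charset explicitly keeps the result independent of the platform default.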

Mullite answered 10/8, 2009 at 19:27 Comment(3)
You should use getBytes() with the charsetName, the String may not have the same encoding as the platformHaversack
Any decent IDE has a property file editor which can handle all the nasty escaping, e.g. Eclipse.Deflation
Rich Seller: According to javadocs it should match the platform default charset, however I wouldn't be surprised if it didn't.Mullite
C
1

You could write your own, non-escaped property reader and put your strings in a resource file.
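
One possible shape for such a reader, as a rough sketch assuming Java 11 or later. The name=value line format, the # comment rule, and the class name are all assumptions made for illustration, not an existing library format:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

final class RawProperties {
    // Parses "name=value" lines without any escape processing, so a value such as
    // \d+ is stored exactly as typed. Blank lines and lines starting with # are skipped.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value line; ignored in this sketch
            }
            // The name is trimmed; the value is kept verbatim after the first '='.
            entries.put(line.substring(0, eq).trim(), line.substring(eq + 1));
        }
        return entries;
    }
}
```

The returned map can then be handed to Pattern.compile, for example Pattern.compile(entries.get("number")).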

Cinquefoil answered 10/8, 2009 at 22:20 Comment(0)
G
1

I personally consider regex strings data and not code, so I don't like them in my code--but I realize that's impractical and unpopular (Yes, I realize it, you don't have to yell at me).

Given that there is no native way to do this, I can come up with two possibilities (well, three but the third is, umm, unnatural).

So my personal preference would be to just parse a file into strings. You could name each entry in the file and load them all into a hash table for easy access from your code.

Second choice, create a file that will be pre-processed into a java interface; it could escape the regex as it does so. Personally I hate code generation, but if the java file is 100% never human edited, it's not too bad (the real evil is generated files that you are expected to edit!)

Third (tricky and probably a bad idea): You might be able to create a custom doclet that will extract strings from your comments into a text file or a header file at compile time, then use one of the other two methods above. This keeps your strings in the same file in which they are being used. This could be really hard to do correctly, and the penalties of failure are extreme, so I wouldn't even consider it unless I had an overwhelming need and some pretty impressive talent.

I only suggest this because comments are free-form and things within a "pre" tag are pretty safe from formatters and other system uglies. The doclet could extract this before printing the javadocs, and could even add some of the generated javadocs indicating your use of regex strings.

Before downvoting and telling me this is a stupid idea--I KNOW, I just thought I'd suggest it because it's interesting, but my preference as I stated above is a simple text file...

Grane answered 10/8, 2009 at 22:39 Comment(3)
Most regexs I have seen are definitely an integral part of the program that uses them and should not be seen as data. You do not want to externalise them any more or less than any other piece of logic in there, such as conditions in if statements.Speller
Actually, externalizing conditions is often good as well, that's a lot of what is behind closures. Aren't regexes usually tied to external data though? If so, you certainly want to be able to change them. I guess the point is that you SHOULD externalize everything you can, and the big advantage of regex is that you can.Grane
I'm with Thilo on this. Regexes usually define the kind of data specific code is looking for or for analyzing that data. If you externalize it, I have found it is easy for someone to change that without realizing the implications.Dakota
W
1

No. But there's an IntelliJ plug-in that makes this easier to deal with, called String Manipulation.

IntelliJ will also automatically escape a string pasted into it. (As @Dread points out, Eclipse has a plug-in to enable this.)

Walloping answered 16/7, 2014 at 4:50 Comment(0)
A
0

The question asks for something akin to raw strings specifically to support regular expressions, which typically have portions akin to escape characters. So, for instance, \d means digit in regex and one would need to write \\d in a Java string. Meanwhile, a literal backslash in regex would be written as \\, so in Java it would be written as \\\\, which makes for difficult-to-read code.
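
The two levels of escaping can be checked directly. In this short sketch (names are hypothetical), each Java literal is only two characters long once the compiler has processed it:

```java
import java.util.regex.Pattern;

final class EscapeLevels {
    // Java source "\\d" is the two-character regex \d (any digit).
    static final String DIGIT = "\\d";
    // Java source "\\\\" is the two-character regex \\ (one literal backslash).
    static final String BACKSLASH = "\\\\";

    static boolean isDigit(String s) {
        return Pattern.matches(DIGIT, s);
    }

    static boolean isBackslash(String s) {
        return Pattern.matches(BACKSLASH, s);
    }

    public static void main(String[] args) {
        System.out.println(DIGIT.length() + " " + BACKSLASH.length());
        System.out.println(isDigit("7") + " " + isBackslash("\\"));
    }
}
```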

The answer about proposed raw strings in Java was the most promising, but alas the proposal was not accepted. The answer about Pattern.quote() is good for certain strings, where there is overlap, but will not handle cases like \d and \w, which are not valid escape sequences in Java string literals in the first place. The answer about multiline strings will also not help with most of the complex regex strings which bothered the original questioner, who was looking for cleaner Java regex code.

My answer is therefore the following awkwardness. The backslash is known in Unicode as the Reverse Solidus. (The forward slash is the regular Solidus.) Unicode has several alternative characters that look like it, especially in certain code editors (such as IntelliJ IDEA). These include the Big Reverse Solidus, Small Reverse Solidus, and Set Minus. Thus, channeling the Pattern.quote() answer, we write the regex using an alternative such as Big Reverse Solidus and, when using it, replace each occurrence with a regular (escaped) backslash. The Big Reverse Solidus is unlikely to be needed for other aspects of your regex.

Thus, we can write:

Pattern pattern = Pattern.compile("∖d+".replace('∖', '\\'));

You might even write the string replacement into a static method similar to Pattern.quote() to get nicer looking code.
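
Such a helper might look like the sketch below. The class and method names are hypothetical, and the stand-in character is assumed to be Set Minus (U+2216); Big Reverse Solidus (U+29F9) would work the same way if you swap the constant:

```java
import java.util.regex.Pattern;

final class RawRegex {
    // Stand-in for the backslash. Assumption: SET MINUS (U+2216).
    private static final char STAND_IN = '\u2216';

    // Replaces every stand-in character with a real backslash.
    static String raw(String text) {
        return text.replace(STAND_IN, '\\');
    }

    // Convenience wrapper: convert the stand-ins, then compile the regex.
    static Pattern compile(String text) {
        return Pattern.compile(raw(text));
    }
}
```

The call site then shrinks to RawRegex.compile("∖d+"). Note that the source file must be saved and compiled in an encoding that preserves the stand-in character, such as UTF-8.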

Auten answered 20/2, 2022 at 14:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.