ANTLRv4: How to read double quote escaped double quotes in string?

In ANTLR v4, how can I match a string literal in which double quotes are escaped by doubling them, as in VBA?

For the text:

"some string with ""john doe"" in it"

the goal is to recognize the string: some string with "john doe" in it

And is it possible to rewrite the token text so that each doubled double quote becomes a single double quote ("" -> ")?

Ibbison answered 27/7, 2013 at 12:9 Comment(0)

Like this:

STRING
 : '"' (~[\r\n"] | '""')* '"'
 ;

where ~[\r\n"] | '""' means:

~[\r\n"]    # any char other than '\r', '\n' and double quotes
|           # OR
'""'        # two successive double quotes
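
To experiment with the pattern outside ANTLR, the same idea can be written as a Java regular expression. This is only a rough analogue of the lexer rule, not how ANTLR matches tokens internally; the class name VbaStringRegexDemo and the method firstLiteral are made up for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VbaStringRegexDemo {
    // Regex analogue of the lexer rule: '"' (~[\r\n"] | '""')* '"'
    //   [^\r\n"]  any char other than CR, LF and a double quote
    //   ""        two successive double quotes
    static final Pattern STRING = Pattern.compile("\"(?:[^\\r\\n\"]|\"\")*\"");

    // Returns the first string literal found in the input, or null if there is none.
    static String firstLiteral(String input) {
        Matcher m = STRING.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "x = \"some string with \"\"john doe\"\" in it\" & y";
        // prints the literal including its outer quotes and the doubled inner quotes
        System.out.println(firstLiteral(input));
    }
}
```

Note that the match still contains the outer quotes and the doubled inner quotes, just like the token text produced by the rule above.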

And is it possible to rewrite the token text so that each doubled double quote becomes a single double quote?

Not without embedding custom code. In Java that could look like:

STRING
 : '"' (~[\r\n"] | '""')* '"' 
   {
     String s = getText();
     s = s.substring(1, s.length() - 1); // strip the leading and trailing quotes
     s = s.replace("\"\"", "\""); // collapse each doubled quote ("") into a single quote (")
     setText(s);
   }
 ;
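
If you would rather keep the grammar free of target-language code, the same unescaping can be done after lexing, for example in a listener or visitor that receives the token text (e.g. from Token.getText()). A minimal sketch, assuming a hypothetical helper class VbaStringUnquoter that is not part of ANTLR:

```java
final class VbaStringUnquoter {
    // Converts the raw token text (including its surrounding quotes) into the string value.
    static String unquote(String tokenText) {
        int n = tokenText.length();
        if (n < 2 || tokenText.charAt(0) != '"' || tokenText.charAt(n - 1) != '"') {
            throw new IllegalArgumentException("not a quoted string: " + tokenText);
        }
        String inner = tokenText.substring(1, n - 1); // strip the leading and trailing quotes
        return inner.replace("\"\"", "\"");           // collapse each "" into a single "
    }

    public static void main(String[] args) {
        // prints: some string with "john doe" in it
        System.out.println(unquote("\"some string with \"\"john doe\"\" in it\""));
    }
}
```

The trade-off is that the token itself keeps its original text, which can be useful if you need to reproduce the source exactly, while the unescaped value is computed only where it is needed.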
Pinard answered 27/7, 2013 at 12:47 Comment(6)
Works like a charm. Side question: is it possible to rewrite it to turn doubled double quotes into single double quotes ("" -> ")? Thanks! - Ibbison
@Ibbison see my revised answer. - Pinard
For me, using ANTLR 4, the following rule worked with both double-quoted and single-quoted strings: STRING : '"' (~[\r\n"] | '""')* '"' | '\'' (~[\r\n\'] | '\'\'')* '\''; - Inshore
@Inshore your string throws "invalid escape sequence \' Exception in thread "main" java.lang.RuntimeException: set is empty" - Dendritic
@AlexeySh. I assume that's because I copy-pasted from a C# program and you pasted it into a Java program, where escaping a double quote is done differently (not through '\') - Inshore
Different ANTLR versions might react differently to ~[\r\n\']. ANTLR 4.8 produces an error. It should be ~[\r\n'] - Pinard
