How to write a regular expression to match a string literal where the escape is a doubling of the quote character?
Asked Answered
D

4

8

I am writing a parser using ply that needs to identify FORTRAN string literals. These are quoted with single quotes with the escape character being doubled single quotes. i.e.

'I don''t understand what you mean'

is a valid escaped FORTRAN string.

Ply takes input in regular expression. My attempt so far does not work and I don't understand why.

t_STRING_LITERAL = r"'[^('')]*'"

Any ideas?

Drastic answered 26/1, 2010 at 22:40 Comment(0)
P
20

A string literal is:

  1. An open single-quote, followed by:
  2. Any number of doubled-single-quotes and non-single-quotes, then
  3. A close single quote.

Thus, our regex is:

r"'(''|[^'])*'"
Phelps answered 26/1, 2010 at 22:45 Comment(1)
This doesn't seem to handle escape sequences.Alvinaalvine
I
4

You want something like this:

r"'([^']|'')*'"

This says that inside of the single quotes you can have either double quotes or a non-quote character.

The brackets define a character class, in which you list the characters that may or may not match. It doesn't allow anything more complicated than that, so trying to use parentheses and match a multiple-character sequence ('') doesn't work. Instead your [^('')] character class is equivalent to [^'()], i.e. it matches anything that's not a single quote or a left or right parenthesis.

Impervious answered 26/1, 2010 at 22:45 Comment(0)
P
1

It's usually easy to get something quick-and-dirty for parsing particular string literals that are giving you problems, but for a general solution you can get a very powerful and complete regex for string literals from the pyparsing module:

>>> import pyparsing
>>> pyparsing.quotedString.reString
'(?:"(?:[^"\\n\\r\\\\]|(?:"")|(?:\\\\x[0-9a-fA-F]+)|(?:\\\\.))*")|(?:\'(?:[^\'\\n\\r\\\\]|(?:\'\')|(?:\\\\x[0-9a-fA-F]+)|(?:\\\\.))*\')'

I'm not sure about significant differences between FORTRAN's string literals and Python's, but it's a handy reference if nothing else.

Pridemore answered 7/3, 2011 at 17:19 Comment(1)
This doesn't seem to handle f-strings.Chlordane
M
0
import re

ch ="'I don''t understand what you mean' and you' ?"

print re.search("'.*?'",ch).group()
print re.search("'.*?(?<!')'(?!')",ch).group()

result

'I don'
'I don''t understand what you mean'
Maiocco answered 7/3, 2011 at 18:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.