How to write patterns for use with re.VERBOSE when they contain meaningful whitespace?

Regexes containing meaningful spaces break when re.VERBOSE is added, apparently because re.VERBOSE 'helpfully' magics away the (meaningful) whitespace inside 'Issue Summary', as well as all the crappy non-meaningful whitespace (e.g. padding and newlines inside a (multiline) pattern). (My use of re.VERBOSE with multiline is non-negotiable - this is actually a massive simplification of a huge multiline regex where re.VERBOSE is necessary just to stay sane.)

import re
re.match(r'''Issue Summary.*''', 'Issue Summary: fails', re.U | re.VERBOSE)
# No match!
re.match(r'''Issue Summary.*''', 'Issue Summary: passes', re.U)
# <_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue Summary.*', 'Issue Summary: passes', re.U)
# <_sre.SRE_Match object at 0x10b98ff38>

Is there a saner alternative to write re.VERBOSE-friendly patterns containing meaningful spaces, short of replacing each instance in my pattern with '\s' or '.', which is not just ugly but counter-intuitive and a pain to automate?

re.match(r'Issue\sSummary.*', 'Issue Summary: fails', re.VERBOSE)
# <_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue.Summary.*', 'Issue Summary: fails', re.VERBOSE)
# <_sre.SRE_Match object at 0x10b98ff38>

(As an aside, this is a useful docbug catch on Python 2 and 3. I'll file it once I get consensus here on what the right solution is.)

Noonberg answered 17/11, 2017 at 0:28
Why are you using all the triple quotes? They aren't helping, and they're obscuring your strings. r'''abc''' is just r'' + 'abc' + '', or 'abc'. The r isn't even taking effect since it ends after the initial empty string. - Eustazio
@TomKarzes: as I stated clearly in the question, "This is actually a massive simplification of a huge multiline regex". The real regex is actually 14 lines long and growing. It has multiple nested sub-expressions. So like I said, multiline pattern and re.VERBOSE are non-negotiable. - Noonberg
Ok - it's just that by making them single-line, the meaning of the quotes has changed. - Eustazio
@TomKarzes: ahh, you're right. Raw multiline string: r'''this is wrong'''. The right syntax must use r with double-quotes: r"""this is right""". See "How to correctly write a raw multiline string in Python?". My misconception is due to other people having been spreading the same mistake for years. Related: "Python regex compile (with re.VERBOSE) not working". - Noonberg
@Noonberg No. The single and double quotes are completely interchangeable. And so are the single and double triple quotes, prefixed or not. See the language reference or this question. - Sweltering

If re.VERBOSE is used, then I think there's no choice other than to change the regular expression string. However, I would suggest one of the following:

r'abc\ def'

or:

r'abc[ ]def'

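Both r'\ ' and r'[ ]' match a single space character (not any whitespace, only an actual space). Both survive re.VERBOSE, which ignores whitespace in the pattern except inside a character class or after an unescaped backslash. Note that, without the r in front, the backslash character would need to be doubled, i.e. '\\ '.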
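Here is a minimal sketch of how this might look in a multiline verbose pattern. The names `pattern`, `auto`, `literal` and the group name `text` are made up for illustration and are not from the question's real regex. The second half assumes that the literal text is being spliced into the pattern programmatically; re.escape() backslash-escapes spaces, so it may help with the "pain to automate" part.

```python
import re

# Multiline verbose pattern: each meaningful space is written as "\ " or "[ ]",
# so re.VERBOSE keeps it. Padding, newlines and comments are still ignored.
pattern = re.compile(r'''
    Issue\ Summary   # escaped space survives verbose mode
    :[ ]             # a space inside a character class survives too
    (?P<text>.*)     # rest of the line
''', re.VERBOSE)

m = pattern.match('Issue Summary: fails')
print(m.group('text'))  # fails

# Assumption: the literal text comes from elsewhere and is inserted by code.
# re.escape() escapes the space, so the literal stays intact under re.VERBOSE.
literal = re.escape('Issue Summary')
auto = re.compile(literal + r''' : [ ] (?P<text>.*) ''', re.VERBOSE)
print(auto.match('Issue Summary: fails').group('text'))  # fails
```

Of the two spellings, '[ ]' is arguably easier to spot when scanning a long pattern, while '\ ' is what re.escape() produces.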

Eustazio answered 17/11, 2017 at 0:42
