This is closely related to Regular Expression to match outer brackets; however, I specifically want to know how, or whether, it's possible to do this with regex's recursive pattern. I've yet to find a Python example using this strategy, so I think this ought to be a useful question!
I've seen some claims that recursive patterns can be used to match balanced parentheses, but no examples using Python's regex package (Note: re does not support recursive patterns; you need to use regex).
One claim is that the syntax is b(?:m|(?R))*e, where:
- b is what begins the construct,
- m is what can occur in the middle of the construct, and
- e is what can occur at the end of the construct.
I want to extract matches for the outer braces in the following:
"{1, {2, 3}} {4, 5}"
["1, {2, 3}", "4, 5"] # desired
Note that it is easy to do the same for the inner braces:
re.findall(r"{([^{}]*)}", "{1, {2, 3}} {4, 5}")
['2, 3', '4, 5']
(In my example I was using finditer (over match objects), see here.)
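For illustration, the finditer variant of the inner-brace case might look like the following. This is a minimal sketch using only the standard-library re module; the variable names are placeholders, and the braces are escaped for clarity even though the unescaped pattern above behaves the same way.

```python
import re

text = "{1, {2, 3}} {4, 5}"

# Each match object exposes both the captured contents and its position.
# [^{}]* cannot cross a brace, so only the innermost groups can match.
inner = [(m.group(1), m.span()) for m in re.finditer(r"\{([^{}]*)\}", text)]

print(inner)  # [('2, 3', (4, 10)), ('4, 5', (12, 18))]
```

Working with match objects rather than findall keeps the spans available, which is what I would also want for the outer braces.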
So I had hoped that the following, or some variation, would work:
regex.findall(r"{(:[^{}]*|?R)}", "{1, {2, 3}} {4, 5}")
regex.findall(r"({(:[^{}]*|?R)})", "{1, {2, 3}} {4, 5}")
regex.findall(r"({(:.*|(?R))*})", "{1, {2, 3}} {4, 5}")
regex.findall(r"({(:.*)|(?R)*})", "{1, {2, 3}} {4, 5}")
regex.findall(r"({(:[^{}])|(?R)})", "{1, {2, 3}} {4, 5}")
but I'm scuppered by either [] or error: too much backtracking.
Is it possible to extract match objects for the outer parentheses using regex's recursion?
Obviously, I run the risk of being shot down with:
- don't parse html with regex
- do this with pyparsing
- write a proper lexer & parser e.g. using ply
I want to emphasise that this is about how to use the recursive pattern (which, if my understanding is correct, takes us outside of regular language parsing, so it may actually be possible!). If it can be done, this ought to be a cleaner solution.
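For what it's worth, here is how the b(?:m|(?R))*e template might be instantiated for braces, taking b as \{, m as [^{}] and e as \}. It is a minimal sketch, not a verified recipe: it assumes the third-party regex package is installed, and outer_brace_contents and OUTER_BRACES are hypothetical names chosen for illustration. It uses finditer and slices the enclosing braces off the whole match, so it does not depend on how capture groups behave across recursion.

```python
try:
    import regex  # third-party package (pip install regex); stdlib re has no (?R)
except ImportError:
    regex = None

# b = \{   m = [^{}]   e = \}
# (?R) recurses into the whole pattern, so a nested {...} counts as "middle".
OUTER_BRACES = r"\{(?:[^{}]|(?R))*\}"


def outer_brace_contents(text):
    """Return the text inside each outermost {...} group (hypothetical helper)."""
    if regex is None:
        raise ImportError("this sketch needs the third-party 'regex' package")
    # group(0) is the whole balanced match; drop its first and last character.
    return [m.group(0)[1:-1] for m in regex.finditer(OUTER_BRACES, text)]
```

With regex available, outer_brace_contents("{1, {2, 3}} {4, 5}") should give ['1, {2, 3}', '4, 5'], which is the desired result above. The two alternatives [^{}] and (?R) can never match at the same position (one needs a non-brace character, the other needs an opening brace), which is what keeps backtracking in check.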
(?R) and b(?:m|(?R))*e is a great trick that I had never seen so plainly spelled out :) – Nitrobacteria