Can a regular expression itself be parsed with a regular expression? [duplicate]
Asked Answered
C

1

8

I am reading the code of a regular expression parser, and start to wonder if the syntax of regular expression is itself regular, and can be expressed with another (quite complicated) regular expression?

rere = "" # the regular expression of regular language
match1 = re.match(rere, "[a-z]+@[a-z]+.com") # True
match2 = re.match(rere, ")az[") # False 

I don't see any recursive structure in regular expression syntax, so I think maybe this is doable?

If it is, what does the expression look like? If not, why?

Chingchinghai answered 3/9, 2015 at 6:50 Comment(6)
No. You need context-free grammar to parse regular expression. Nested parentheses can't be parsed with (theoretical) regular expression.Sarson
Yes, nested parentheses. I forgot about that. But if I don't support group inside group, would the answer be different?Chingchinghai
@NeoWang: Then what you have is weaker than regular expression. i.e. there are languages where regular expression/regular grammar can described, but not your grammar.Sarson
Actually, you can match nested parentheses with regex, but only in some regex flavors. Your example code is Python, and its regex engine does not support recursive behavior/balanced constructs. However, there is no magical regex that will "parse them all".Lafrance
@stribizhev: Those flavor are not strictly "regular" in the theoretical sense, but if the question specifically refers to real world "regex" engines, then I guess it's possible for some flavors.Sarson
This may be off topic... Why are most regex parsers hand written? Why not write its context free grammar and use a parser generator?Chingchinghai
C
5

You cannot parse nested parentheses with a regular expression because you would need infinite state to do so. So the answer is no. What you are looking for is called context-free grammars.

Cartoon answered 3/9, 2015 at 6:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.