Are regular expressions (regex) really regular?
Asked Answered
D

2

6

I understand how regular expressions got their name, and have read the related question (Why are regular expressions called "regular" expressions?), but am still wondering whether regular expressions are always regular.

For example, how can back-references be regular? Would that not require some memory and thus be impossible to match/generate by a finite state automaton?

Dematerialize answered 29/3, 2016 at 11:34 Comment(2)
The link in the answer to the question you reference states (in wikipedia), as opposed to many regular expressions engines provided by modern programming languages, which are augmented with features that allow recognition of languages that cannot be expressed by a classic regular expression. I interpret that as the evolution of regex moved it away from it's original idea of expressing regular languages.Elute
@ClasG: Thank you. It also provides the link for a paragraph that exactly answers my question: en.wikipedia.org/wiki/….Natica
E
4

The link in the answer to the question you reference states (in wikipedia), as opposed to many regular expressions engines provided by modern programming languages, which are augmented with features that allow recognition of languages that cannot be expressed by a classic regular expression.

So I would say that the evolution of regex moved it away from it's original idea of expressing regular languages.

From the Wikipedia article on regular expressions:

Many features found in virtually all modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory. The pattern for these strings is (.+)\1.

Elute answered 29/3, 2016 at 12:1 Comment(0)
S
3

Modern extensions including back-references make the regex systems not a candidate of regular languages, however IMO they can be lifted to context-free languages but not to Turing machines.

Regular grammars share a common property called pumping lemma. You can check the example here which proves 0n1n is not a regular grammar (that is quite similar to back-references). Here is how it can be shown that back-references don't satisfy pumping lemma property.

  • Pumping lemma in current context: to show that a regex system is regular grammar, there needs to be a finite length p such that all strings that match the regex and have length equals to or greater than p can be split up in three substrings xyz such that y is not an empty string and all the strings represented by xy*z (y pumped for [0, infinite) times) match with the regex.

  • If we can show no such p can satisfy the conditions for the regex then it is not in regular grammar.

  • For back-references we will need to have two of these pumping strings that too of equal length, one for the sub-pattern in captured group and one in the backreference. This is exactly what the push-down automata or context free languages are. There is also a pumping lemma for context free grammars that is based on splitting into uvwxy where v and x can be pumped equally n times. We can show that the regex with back-references system satisfies this lemma.

Situla answered 29/3, 2016 at 12:18 Comment(2)
"however IMO they can be lifted to context-free languages but not to Turing machines" << this is false. 1) you are confusing formal languages with abstract machines. 2) every context free languages are recursive and/or recursively enumerable, so there is a Turing machine that recognizes it. 3) backreferences are not even context-free: see en.wikipedia.org/wiki/…Fun
Thanks for the link. I would like to defend some of the points as a constructive exercise. Each point separately. 1) The context-free languages and push down automata abstract machine are exactly the same in terms of power, similarly are the regular grammars and finite automata. I don't see a problem in using them interchangeably. 2) True, but context-free grammars are not turing complete, this is what I meant. 3) I still think a single back reference might be addressed by pumping lemma for CFG however I understand that claim needs some backing. I should add the link that you provided. :)Situla

© 2022 - 2024 — McMap. All rights reserved.