I've seen a few comments here that mention that modern regular expressions go beyond what can be represented in a regular language. How is this so?
What features of modern regular expressions are not regular? Examples would be helpful.
I've seen a few comments here that mention that modern regular expressions go beyond what can be represented in a regular language. How is this so?
What features of modern regular expressions are not regular? Examples would be helpful.
The first thing that comes to mind is backreferences:
(\w*)\s\1
(matches a group of word characters, followed by a space character and then the same group previously matched) eg: hello hello
matches, hello world
doesn't.
This construct is not regular (ie: can't be generated by a regular grammar).
Another feature supported by Perl Compatible RegExp (PCRE) that is not regular are recursive patterns:
\((a*|(?R))*\)
This can be used to match any combination of balanced parentheses and "a"s (from wikipedia)
(.)x\1
defines a regular language: "axa", "bxb", etc. I believe it's only when combined with Kleene closures that backreferences make the language irregular. –
Cryptanalysis (.*)\1
will do. –
Aquilar .
matches a much larger range of characters than just \w*\s
–
Selfreliant A couple of examples:
/my (group)/.match("my group")[1]
will output "group". storing something in a group requires an external storage, which a finite automaton does not have.(?<MYGROUP>.)*
could perform multiple captures of "." in the same group.(?<MYGROUP>test)
pushes a stack, while (?<-MYGROUP>)
pops a stack.(?<FIRSTGROUP-LASTGROUP>)
which pops the LASTGROUP and pushes the capture since the LASTGROUP index on the FIRSTGROUP stack. This can actually be used to match infinitely nested constructions which is definitely beyond the power of a finite automaton.Probably other good examples exist :-) If you are further interessted in some of the implementation details of external stacks in combination with Regex's and balanced grouping and thus higher order automata than finite automata, I once wrote two short articles on this (http://www.codeproject.com/KB/recipes/Nested_RegEx_explained.aspx and http://www.codeproject.com/KB/recipes/RegEx_Balanced_Grouping.aspx).
Anyway - finitieness or not - I blieve that the power that this extra stuff brings to the regular languages is great :-)
Br. Morten
A deterministic or nondeterministic finite automaton recognizes just the regular languages, which are described by regular expressions. The definition of a regular expression is simple. Let S be an alphabet. Then the empty set, the empty string, and every element of S are regular expressions (over S). Let u and v be regular expressions. Then the union (u | v), concatenation (uv), and closure (u*) of u and v are regular expressions over S. This definition is easily extended to the regular languages. No other expression is a regular expression. As pointed out, some back-references are an example. The Wikipedia pages on regular languages and expressions are good references.
In essence, certain "regular expressions" are not regular because no automaton of a particular type can be constructed to recognize them. For example, the the language
{ a^ i b^ i : i <= 0 }
is not regular. This is because the accepting automaton would require infinitely many states, but an automaton accepting regular languages must have a finite number of states.
a^i b^i
is certainly nonregular (it's a DCFG), but can we actually express this using the "regular expressions" of programming languages? –
Aquilar /^( |a(?1)b)$/
–
Erigena © 2022 - 2024 — McMap. All rights reserved.