What is regularity?
Asked Answered
S

5

6

This is more of a computer science question than a programming one, but I figure that this is the best place out of all the related sites to ask this.

When I discovered Regular Expressions and looked up the term I assumed that this property of "regularity" refers to the fact that the expression's language has a definable structural pattern. However, in reading about the subject and the theory behind this I learned that there are kinds of languages that are not regular, and yet from the way they are defined it's clear that a pattern can be matched to them. One such language is (a^n)(b^n). Clearly this is a pattern, and yet this is not a regular language. So now I'm left wondering what is it about regular languages that makes them regular, and this language not?

Sporogonium answered 9/1, 2010 at 1:50 Comment(2)
the product of a fibre filled diet?Delp
You would know, Mitch Wheat.Subtreasury
F
4

The etymology of the name comes from Kleene's 1950s work describing regular sets using his mathematical notation created for the purpose. See this.

Fontenot answered 9/1, 2010 at 1:56 Comment(1)
@Barry Kelly: thanks for the typo fix. I had meant to go back and check the word.Fontenot
E
10

Intuitively explaining computer science is... tricky. I'll give it a shot, but keep in mind that some of this is going to be "close enough" but not theoretically rigorous.

A regular language is one that can be decided by a machine that is computational equivalent to a finite automata (DFA/NDFA). A finite automata can be thought of as a machine that operates purely in states, no storage. So you can see that anbn cannot be regular as it requires a machine that can count the number of a's and b's (and thus must have infinite* storage capacity) in order to compare them.

For comparison, (abc)n is regular, because the number of repetitions is irrelevant.

For a more rigorous (and correspondingly denser view) check the wikipedia article and linked pages.

*The infinite doesn't matter here, but I mention it for completeness. It might be easier to think of it as "luckily, always just enough" storage.

Equanimous answered 9/1, 2010 at 2:5 Comment(2)
+1 for the "states, no storage" comment, I forgot to mention that.Cervantez
I find its easiest to think like so: DFA/regular -> no storage, PDA/CFL -> infinite storage w/ restricted access, TM -> infinite storage w/ random accessEquanimous
F
4

The etymology of the name comes from Kleene's 1950s work describing regular sets using his mathematical notation created for the purpose. See this.

Fontenot answered 9/1, 2010 at 1:56 Comment(1)
@Barry Kelly: thanks for the typo fix. I had meant to go back and check the word.Fontenot
C
1

Perhaps the Wikipedia article on regular languages can explain it better than we can. However, I'll give it a shot.

From a theoretical standpoint, a regular language (set of strings) is one that can be generated using a finite state automaton. In programmer terms, this is equivalent to saying it can be generated using regular expressions. Thus, all finite languages (sets of strings) are regular, but there are some infinite languages, such as anbn (the language of all strings of n a's followed by n b's) that cannot be recognized using a FSA or regular expressions. There are more powerful computational devices (such as modern computers, which are modeled using Turing Machines) which can recognize those languages.

The reason regular expressions are used so much in programming for string searching is that they can recognize the large majority of strings that are important to us programmers, and at the same time can be implemented to search very quickly using finite state automata.

Cervantez answered 9/1, 2010 at 2:4 Comment(9)
Wrong. Programmers' regular expressions are usually not the way to define regular languages. RegExps are more generic (as they can recognize all the regular languages and many other languages).Timbered
What? Give me an example of a language which can be recognized by programmers' regexps but not theoretical regular expressions.Cervantez
Not all regexp are regex. Some languages implement truly regular regexp instead of a clone of Perl's regex.Asa
See the 4th bullet on en.wikipedia.org/wiki/Regular_language . If you want more details, you should take and study any good book on formal language theory.Timbered
@BlueRaja: Perl's /(.*)\1/ to match repeated words cannot be recognized by a truly regular language.Asa
Oh, a simple example: the language defined by (.*)\1, it is not regular, although I've just defined it through a regular expression. The problem is the back reference.Timbered
@slebetman: Are those non-regular expressions Perl-only? I've done regex in a number of languages, and I don't recognize that syntax.Cervantez
@BlueRaja: those are extensions to the concept of "Regular Expression" that are implemented by some programming languages (including Perl)Timbered
Not sure why the -1 - though there are (apparently) extensions to regex to make it more powerful, this answer is still technically correct..Cervantez
A
0

The word regular in regular expression refers to the Mathematical concept of regular, not the English concept. Just like how the word prime in mathematics bear little relation to prime beef.

It's inherited by CS (which is a branch of mathematics) to refer to a more specific concept: http://en.wikipedia.org/wiki/Regular_language

Asa answered 9/1, 2010 at 2:3 Comment(0)
P
0

regular expression are not really regular, the name is etymological.

Prophylactic answered 9/1, 2010 at 2:3 Comment(1)
Regexp IS regular but regex is not. Specifically, regex is what Perl calls its regexp-like syntax to distinguish it from traditional regexp. There are languages out there that still implement truly regular regexp: tcl and awk to name two.Asa

© 2022 - 2024 — McMap. All rights reserved.