Are .NET's regular expressions Turing complete?
Asked Answered
T

4

12

Regular expressions are often pointed to as the classical example of a language that is not Turning complete. For example "regular expressions" is given in as the answer to this SO question looking for languages that are not Turing complete.

In my, perhaps somewhat basic, understanding of the notion of Turning completeness, this means that regular expressions cannot be used check for patterns that are "balanced". Balanced meaning have an equal number of opening characters as closing characters. This is because to do this would require you to have some kind of state, to allow you to match the opening and closing characters.

However the .NET implementation of regular expressions introduces the notion of a balanced group. This construct is designed to let you backtrack and see if a previous group was matched. This means that a .NET regular expressions:

^(?<p>a)*(?<-p>b)*(?(p)(?!))$

Could match a pattern that:

ab
aabb
aaabbb
aaaabbbb
... etc. ...

Does this means .NET's regular expressions are Turing complete? Or are there other things that are missing that would be required for the language to be Turing complete?

Telfore answered 29/1, 2011 at 6:52 Comment(7)
Yes, the definition of Turing Completeness ;-)Krebs
Okay, I got a bit over excited about this one, what it actually means they can match languages that are non-regular, which according to Wikipedia is not that uncommon for regular expression syntax. "'Regular expressions' [...] are only marginally related to real regular expression": en.wikipedia.org/wiki/…Telfore
.NET is very powerful indeed. It even has variable length lookbehind.Philanthropic
Variable length lookbehind and lookahead add no power to regular expressions.Undercarriage
@Porges - Care to explain? I can't prove there's anything you can't do without variable-length lookarounds, but it certainly seems that way. For example, testing other parts of the input string without causing matches to overlap.Georgie
@JustinMorgan: talking about the power of a language usually means considering what kind of an automaton you need to match the language against a string, and a regex with lookahead/lookbehind doesn't require anything more than what a normal regex requires - a DFA or NFA. You can always rewrite a regex with lookarounds into one without lookarounds, but it might grow a lot -- regexes with lookaround are more expressive than normal regexes, in that they allow you to say a lot more with a lot less. (Match extraction is another thing - you can't get the same results as "(?<=a)b" without them.)Undercarriage
@Porges - Great info, thanks for going into more detail. I've often wondered whether there are really patterns that are theoretically possible to describe with lookarounds, but impossible without them.Georgie
P
5

In computation theory, a regular expression describes a regular language. The class of regular languages is precisely those languages that can be recognized by some finite state machine, or generated by a regular grammar. However, the example you have described (balanced phrases) is not a regular language and cannot be recognized by a finite state machine or generated by a regular grammar. In fact, this a textbook example of what is called a context-free language. These require a push-down automata for recognition. The class of context-free languages is a superset of regular languages but a proper subset of turing complete languages. The syntax (as opposed to semantics) of most programming languages is a context-free language. If you are interested in learning more about this topic, you can start with the Chomsky hierarchy

Parturient answered 31/1, 2011 at 4:53 Comment(0)
T
5

Regex's in .NET are not Turing complete because they always halt. This cannot be said of a general turing machine.

Transmittance answered 17/12, 2011 at 9:50 Comment(3)
Hmmmmm… really? I did encounter some, which were executing forever.Accroach
I believe a reasonably stupid engine could even go infinite with a*+.Oneal
@SargeBorsch that sounds like a regex that has exponential cost in the size of the input. There are some well-known and rather simple patterns to cause this. The .NET regex engine is not able to deal with that (some other engines are). The required optimization for that was not implemented. I once looked into the engine internals and it seems like a screw-up to me based on the approach taken and the code.Transmittance
A
3

You pretty much miss the definition of turing complete.

Turing completeness, named after Alan Turing, is significant in that every plausible design for a computing device so far advanced can be emulated by a universal Turing machine — an observation that has become known as the Church-Turing thesis. Thus, a machine that can act as a universal Turing machine can, in principle, perform any calculation that any other programmable computer is capable of. However, this has nothing to do with the effort required to write a program for the machine, the time it may take for the machine to perform the calculation, or any abilities the machine may possess that are unrelated to computation.

Now, you can not do certain things in regular expressions, so the langauge is not turing complete.

You really have to use the same definition like everyone else, you know. Limited understanding should trigger finding out the truth.

Amourpropre answered 29/1, 2011 at 7:17 Comment(4)
Right. Not even remotely "turing complete". You can't even calculate 1+1 using a regex.Worshipful
Thank you kindly for your answer, your right I was confusing the counter example of why regex's aren't Turing complete with the actual definition of turning completeness. In my defence asking questions is the way you learn stuff.Telfore
@Robert, right. The only stupid question is the one that gets you divorced or thown in the desert naked and with no water.Worshipful
@Worshipful Actually, you can do addition if you're using a unitary system. You'd just have to remove /(1+)-\1/. Here's an interesting article that describes how to detect a prime number using regular expressions: iluxonchik.github.io/…Sheryl
B
3

@Inuyasha: Actually you can do addition with a regular expression. Well at least check if the computation is done correctly. Only thing is you have to give the input to the regex in a strange order (you can't reverse a string (or check if it is reversed) with a regex).

The pattern is:

abc
def
---
ghi

=> cfi beh adg

Suppose you want to add 1011 an 0110 in binary:

01011
00110
-----
10001


=> 101 110 010 100 001

If you give this input in the order of lease significant bits to greatest, interspersing the first operand, second operand, and the output, you would get the string 101110010100001. This can be matched by

((000|011|101)|(110(010|100|111)*001))*

Which is a garden variety regex. You could extend this to decimal addition but the regex would get crazy complicated.

Bigwig answered 16/3, 2011 at 3:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.