Do all regular expressions halt?
Asked Answered
S

8

14

Is there any regular expression that will, for some input string, search for a match forever?

Schematism answered 6/8, 2009 at 20:28 Comment(4)
... and can you write a program that determines whether or not a regex will halt for a given input?Mclane
For bonus marks - using a regex !Osteoma
Sure, mmyers and mgb - just run this against the input concatenated to the regex: /.*/ - match means it halts, no match means it doesn't. :PGules
mmyers: the program is quite simple. I will present it in python: True. --- As for accepting a string, well, that's "run regex r on input i, return what it returns."Swarm
A
34

For a finite input, there is no formal regular expression that will not halt.

Any formal regular expression can be translated into a Deterministic Finite Automata. A DFA reads the input one character at a time, and, at the end of the input, you are either in an accepting state or in a non-accepting state. If the state is accepting, then the input matches the regular expression. Otherwise, it doesn't.

Now, most "regular expression" libraries support things which are not regular expressions, such as back references. As long as you keep away from those features, and have a finite input, you are guaranteed a halt. If you don't... depending on exactly what you are using, you might well not be guaranteed a halt. Perl allows arbitrary code to be inserted, for instance, and arbitrary, turing-machine equivalent code is not guaranteed to halt.

Now, if the input is infinite, then trivial regular expressions can be found which will never halt. For example, ".*".

Astronomical answered 6/8, 2009 at 20:48 Comment(2)
The only quibble: they are called deterministic finite automata, not definite. To contrast with (ironically, equivelant) non-deterministic finite automata.Swarm
@Agor: I hate it when I do that. I'm well aware of the correct name, but I always type the wrong name for some reasons. :-(Astronomical
G
7

Formal regex is actually a method of describing a deterministic finite automaton for parsing strings. The regex "matches" if the DFA winds up in an accepting state at the end of input. Since the DFA reads its input sequentially, it will always halt when it reaches the end of the input, and whether or not there is a match is merely a matter of examining which state of the DFA it halts at.

Substring matching is effectively the same, except instead of being forced to halt at the end of one read-through of the string, the DFA would instead be forced to halt after reading each possible substring once - still a finite case. (Yes, most regex engines implement this in a bit more optimized manner than just throwing every possible substring at a DFA - but conceptually it the limit is still there).

Thus the only possible case in which the DFA would not halt is if the input were infinite, which is generally considered beyond the scope of the halting problem.

Gules answered 6/8, 2009 at 20:46 Comment(0)
R
3

I imagine, it is not possible to find a regular expression that doesn't halt.

The size of your input is finite. The maximum size of any matched subgroup of the regular expression is, at max, the size of your input.

Unless the algorithm being used is pretty stupid (going over cases multiple times), the number of matched subgroups, will too, be finite.

So, it will halt.

Regent answered 6/8, 2009 at 20:38 Comment(0)
R
1

Not in the sense you are describing, you can have some very inefficient regular expressions that take up loads of resources and end up killing the regex engine, this is not the same as halting.

I don't think halting really applies here, as the other commenters of this post have so astutely pointed out. http://en.wikipedia.org/wiki/Halting_problem

Rojas answered 6/8, 2009 at 20:32 Comment(3)
There isn't a way to make a program that, for every possible program will tell you if it halts or not. But that doesn't mean you can't do that for a subset. Maybe regexes are one such subset, but I don't know.Tergiversate
Referring to the halting problem here is not very useful; the algorithm used for RE matching is a particular algorithm, the interesting thing about the halting problem is to solve it for all program-input pairs.Regent
yeah, that's what I was trying to say in my answer.Rojas
T
1

According to this question, every regular expression halts.

Tergiversate answered 6/8, 2009 at 20:38 Comment(0)
W
0

I can't imagine an input string that would be parsed forever, although an infinitely long string would be parsed forever. Given that a regular expression can describe a regular language, which is potentially an infinite set of words, then a regex can describe a language of infinite words, including words of infinite length. However, no input string can be infinitely long, so at some point it would have to halt.

For example, if a*b is accepted in the language, and you have an infinitely long string of 'a's, then yes, the regex would never halt. Practically, though, this is impossible.

Walrus answered 6/8, 2009 at 20:41 Comment(0)
L
0

Yes.

A regular expression can be represented by a finite state machine. Every time you receive an atomic input, it will cause any well-defined FSM to transition to a known state.

The exception is when you have infinite input, but this is not applicable to the halting problem because it deals with finite input. When you have a finite state machine and finite input, it is always possible to determine whether your machine will halt or not.

http://en.wikipedia.org/wiki/Finite_state_machine

Laciniate answered 6/8, 2009 at 21:0 Comment(0)
M
0

+1 for Daniel's answer: all finite inputs cause true regex's (i.e., without backreferences or other non-regex features) to halt, and regex's are equivalent to DFAs.

Bonus: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

http://swtch.com/~rsc/regexp/regexp1.html

Note that the two graphs at the top of the article have different scales on the y-axis: one is seconds, the other is microseconds!

Motorway answered 13/5, 2010 at 17:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.