Is there a way to negate a regular expression?

Asked 11/3, 2013 at 11:30 Answered 11/3, 2013 at 20:42

Given a regular expression R that describes a regular language (no fancy backreferences). Is there an algorithmic way to construct a regular expression R* that describes the language of all words except those described by R? It should be possible as Wikipedia says:

The regular languages are closed under the various operations, that is, if the languages K and L are regular, so is the result of the following operations: […] the complement ¬L

For example, given the alphabet {a,b,c}, the inverse of the language (abc*)+ is (a|(ac|b|c).*)?

As DPenner has already pointed out in the comments, the inverse of a regular expresion can be exponentially larger than the original expression. This makes inversing regular expressions unsuitable to implement negative partial expression syntax for searching purposes. Is there an algorithm that preserves the O(n*m) runtime characteristic (where n is the size of the regex and m is the length of the input) of regular expression matching and allows for negated subexpressions?

Windfall answered 11/3, 2013 at 11:30 Comment(33)

Construct the NFA from regex, construct the DFA from NFA (there should be an algo for this), turn the non-terminal states into terminal states and vice-versa, then derive the regex from DFA. – Eyelet 11/3, 2013 at 12:1

@Eyelet That sounds like an interesting idea; the problem is, that a NFA of length O(n) may correspond to a DFA of length O(2^n). – Windfall 11/3, 2013 at 14:11

I didn't go through proper study on this topic, but I can't see how an NFA would generate a bigger DFA. Can you show me one such case? – Eyelet 11/3, 2013 at 16:54

@Eyelet Each state of the DFA associated with an NFA is a reachable member of the powerset of the NFA. This makes an upper bound of 2^n states where n is the number of states of the NFA. It is possible to reach this by constructing a regular language where each subset of NFA states can be reached. There was a simple example somewhere but I just can't find it. – Windfall 11/3, 2013 at 17:43

See the relevant question on theoretical CS stack overflow. NFA->DFA->inversion->regex is worst-case optimal. – Submiss 11/3, 2013 at 19:6

Your Wikipedia quote doesn't match up with your description of the problem. The reverse of a language is the set of all strings from that language reversed. i.e. if L = {ab, abc, aba}, the reverse, L^R = {ba, cba, aba}. Regardless though, its still true that the negation (or more technically the complement) is still regular. – Conductive 11/3, 2013 at 19:6

@DPenner I am sorry. I may have misread that article. – Windfall 11/3, 2013 at 19:10

@FUZxxl That's alright, formal languages can be confusing! As you rightly point out though, the standard algorithm for negating a regular expression is O(2^n) in space, and would be impractical for languages with a large amount of states in its NFA. – Conductive 11/3, 2013 at 19:16

This is certainly an interesting question, but is this really a programming question? Wouldn't this be a better fit for ComputerScience? (Especially since you seem uninterested in working, if not mathematically pure, solutions?) – Guillen 11/3, 2013 at 19:41

@Cyborgx37 Let me quote the FAQ: We feel the best Stack Overflow questions have a bit of source code in them, but if your question generally covers … • a specific programming problem • a software algorithm – Windfall 11/3, 2013 at 19:43

It all boils down to a programming question. You can do a lot of funny things with plain (aka really regular) regular expresion that you can't do with "enhanced" regex, such as letting an untrusted user use them to search stuff, since they have a guaranteed linear runtime. All toolkits for plain regex I know don't support inverse matching, even though the could be converted to plain regex. It would be nice to know whether there is an algorithm that allows to implement a "match inverse" operator without breaking the linear time complexity. – Windfall 11/3, 2013 at 19:46

@Cyborgx37 “… programming related …”. ’nuff said. – Homophonous 11/3, 2013 at 19:48

@FUZxxl: For your second part, in practice, it will usually involve checking whether there is a match, then logically negate the result, or use negative look-ahead construct in the (ir)regular expression. – Eyelet 11/3, 2013 at 19:49

@Eyelet Look ahead is not part of plain regex. It might break linear complexity. – Windfall 11/3, 2013 at 19:50

@FUZxxl: Consider an extension of plain regex, where you can have plain regex, OR negative look-ahead assertion which can only contain plain regex inside (not even another assertion inside). I think the complexity will be the same. The complexity is only broken when you have a wild mix of plain regex and look-ahead. – Eyelet 11/3, 2013 at 19:56

@FUZxxl - Ok, I buy it. Ponder, though, that "you can't prove a negative". In most branches of science, it is most often the case that a question must be stated as a positive and then the answer given in terms of evidence to support it (or lack thereof). I believe you'll find the same is true with all but the most trivial regex. It is much easier to express what you are looking for, then negate the result. – Guillen 11/3, 2013 at 19:57

@Cyborgx37 As nhahtdh wrote in the very first comment, it is indeed easily (in the sense that the algorithm is simple) possible to compute the inverse of a regex. It's just not easy (in the sense that it has a high computational complexity). – Windfall 11/3, 2013 at 20:0

@Eyelet Wouldn't that requite backtracking? The key to a linear time implementation is to avoid backtracking as hell. – Windfall 11/3, 2013 at 20:2

The problem with my extension is that, while it is useful to negate some plain regex, it breaks the property where 2 regex can be freely concatenated, OR'ed together - which I think is the motivation for finding the plain regex which is the negation of another plain regex. – Eyelet 11/3, 2013 at 20:3

@FUZxxl: Assuming that you can match a string with linear complexity (or confirm that there is no match) with a plain regex, the negative look-ahead simply negate the result of matching. (If the assumption is wrong, then even plain regex will have trouble checking that there is no match). – Eyelet 11/3, 2013 at 20:5

@Eyelet I can't really imagine how this should work. Could ou post a more detailed sketch of the algorithm in an answer? – Windfall 11/3, 2013 at 20:14

@FUZxxl: It is not an algorithm or anything. Simply put, it is a form of checking whether there is a match, then logically negate the result (has match --> false, no match --> true). – Eyelet 11/3, 2013 at 20:19

@Eyelet This seems to fail when the negated expression is part of a larger expression. – Windfall 11/3, 2013 at 20:38

@FUZxxl: Of course it will fail (check my "formal" definition of the extension, and the caveat a few comments later) – Eyelet 11/3, 2013 at 20:47

Yes Complement of a regular language is regular and there is an algorithmic way possible to write a complement regular expression for a regular expression. Additionally as DPenner says about inefficiency. We can always good RE according to minimized DFA. So for an RE there we can evaluate RE that is capable to accept complement language. – Antepast 15/3, 2013 at 16:17

What do you mean: »We can always good RE according to minimized DFA« – Windfall 16/3, 2013 at 8:27

@FUZxxl no! that why I use word good instead of minimized. – Antepast 17/3, 2013 at 10:34

@GrijeshChauhan I still don't really get what you want to tell me. I am sorry. I have problems parsing your grammar. – Windfall 17/3, 2013 at 10:35

@FUZxxl OK, (1) Complement of Regular Language(RE) is regular (2) to find RE of complement language. do like RE-->NFA-->DFA-->minimized DFA --> COMPLEMENT DFA --> RE. (a) RE-->NFA is algorithmic LEX parser (b) NFA --> DFA is algorithm (c) DFA --> MINIMIZED DFA again algorithmic (both b and c school time assignments) (d) DFA --> COMPLEMENT DFA is algorithmic (by reading my answer I lined you can realized) (e) complement DFA-->RE is algorithm using ARDEN'S THEOREM. So What I mean to say for a given RE you can find RE for complement. – Antepast 17/3, 2013 at 10:46

@FUZxxl because for complement DFA we need to make complement DFA (not partial) That is reason we gets complicated RE for complement language. But We can do some reduction work by MINIMIZE complement DFA – Antepast 17/3, 2013 at 10:50

@Grijesh Thanks for summing up what all the others said before. The problem is that the DFA's size might be exponential to the size of the NFA, causing the complement regex to have a size exponential to the original regex. – Windfall 17/3, 2013 at 11:14

@FUZxxl Yes but you can always minimized NFA into DFA. Also What I means complete DFA always comes with more states. – Antepast 17/3, 2013 at 12:31

@GrijeshChauhan Tell me please how this reduces the length of the generated regular expression or the intermediate DFA. – Windfall 17/3, 2013 at 13:22

Unfortunately, the answer given by nhahdtdh in the comments is as good as we can do (so far). Whether a given regular expression generates all strings is PSPACE-complete. Since all problems in NP are in PSPACE-complete, an efficient solution to the universality problem would imply that P=NP.

If there were an efficient solution to your problem, would you be able to resolve the universality problem? Sure you would.

Use your efficient algorithm to generate a regular expression for the negation;
Determine whether the resulting regular expression generates the empty set.

Note that the problem "given a regular expression, does it generate the empty set" is fairly straightforward:

The regular expression {} generates the empty set.
(r + s) generates the empty set iff both r and s generate the empty set.
(rs) generates the empty set iff either r or s generates the empty set.
Nothing else generates the empty set.

Basically, it's pretty easy to tell whether a regular expression generates the empty set: just start evaluating the regular expression.

(Note that while the above procedure is efficient in terms of the output length, it might not be efficient in terms of the input length, if the output length is more than polynomially faster than the input length. However, if that were the case, we'd have the same result anyway, i.e., that your algorithm isn't really efficient, since it would take exponentially many steps to generate an exponentially longer output from a given input).

Leandroleaning answered 11/3, 2013 at 19:36 Comment(3)

Whether a given regular expression generates all strings - The problem description is quite vague here. I don't know what you are talking about. – Eyelet 11/3, 2013 at 19:39

Thank you for the answer. Could you, if you like, also investigate into the second part of my question? – Windfall 11/3, 2013 at 19:41

"Whether a given regular expression generates all strings is PSPACE-complete" I'm looking for a proof of that claim, do you have one? – Numbing 1/6, 2019 at 7:45

Wikipedia says: ... if there exists at least one regex that matches a particular set then there exist an infinite number of such expressions. We can deduct from this statement that there is an infinite number of expressions that describe the language of all words except those described by R.

Again, (as also @nhahtdh tried to explain) the simplest algorithm to address this question is to extend the scope of evaluation outside the context of the regular expression language itself. That is: match the strings you want to exclude (which represent a finite subset to work with) by using the original regular expression and then treat any failure to match as an actual match (out of an infinite set of other possibilities). So, if the result of the match is negative, your candidate strings are a subset of the valid solutions.

Persuasion answered 11/3, 2013 at 20:42 Comment(3)

then compare it with the finite set of candidates you provide Not really. Just try to match the string with the (plain) regex. If fail to match = a match. This only works if you want to particularly negate a whole plain regex. It cannot concatenate. – Eyelet 11/3, 2013 at 20:51

@nhahtdh, I tried to reformulate the answer. Anywho, I don't see a real application of the question. – Persuasion 11/3, 2013 at 21:7

Check this comment – Eyelet 11/3, 2013 at 21:9

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags