Regex - how to match everything except a particular pattern

Asked 4/3, 2009 at 18:37 Answered 15/1, 2015 at 16:11

177

How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.

Legerdemain answered 4/3, 2009 at 18:37 Comment(1)

PCRE would be best for this: see Regex Pattern to Match, Excluding when… / Except between. I removed findstr tag since all answers here are not valid for the tag. – Quatrain 4/3, 2020 at 9:22

199

You could use a look-ahead assertion:

(?!999)\d{3}

This example matches three digits other than 999.

But if you happen not to have a regular expression implementation with this feature (see Comparison of Regular Expression Flavors), you probably have to build a regular expression with the basic features on your own.

A compatible regular expression with basic syntax only would be:

[0-8]\d\d|\d[0-8]\d|\d\d[0-8]

This also matches any three-digit sequence other than 999.
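A quick way to convince yourself that the two patterns are equivalent is to run both over every three-digit string. This is only a sketch using Python's re module (my choice of language, not the OP's tool); the helper name accepted is made up for the illustration.

```python
import re

# Look-ahead version: three digits that are not "999".
lookahead = re.compile(r"(?!999)\d{3}")
# Basic-syntax version: at least one of the three digits is in 0-8.
basic = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")

def accepted(pattern):
    """Return the set of three-digit strings the pattern matches in full."""
    candidates = (f"{n:03d}" for n in range(1000))
    return {s for s in candidates if pattern.fullmatch(s)}

by_lookahead = accepted(lookahead)
by_basic = accepted(basic)

# Both sets should be identical and should exclude only "999".
print(by_lookahead == by_basic, "999" in by_basic, len(by_basic))
```

Strings such as 009 or 019 are accepted by the basic version too, since their first digit already falls in 0-8.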

Pronate answered 4/3, 2009 at 18:41 Comment(6)

Look-ahead is not standard regular expression syntax, it is a Perl extension, it will only work in Perl, PCRE (Perl-Compatible RegEx) or other non-standard implementations – Vigor 4/3, 2009 at 19:26

It may not be standard, but don't most modern languages support it? What language doesn't support look-aheads these days? – Hydromechanics 4/3, 2009 at 19:45

That’s true. But most regex flavors support this feature (see <regular-expressions.info/refflavors.html>). – Pronate 4/3, 2009 at 19:49

Turns out that the windows findstr function only supports pure DFA-style regex anyway, so I need to just do it all differently. You still get the answer, though. – Legerdemain 4/3, 2009 at 21:38

i think the last regex would also not match 009, 019... etc – Lacefield 26/9, 2013 at 10:3

Standard Lex for C does not use PCREs :-( – Dozier 2/2, 2015 at 21:54

Suppose you want to match lines that contain a word A but do not contain a word B. For example, given this text:

1. I have a two pets - dog and a cat
2. I have a pet - dog

To find the lines that have a dog for a pet and don't have a cat, you can use this regular expression:

^(?=.*?\bdog\b)((?!cat).)*$

It will match only the second line:

2. I have a pet - dog
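Here is the same check as a small sketch in Python (an assumption on my part; any flavor with look-ahead should behave similarly). The variable names are only for illustration.

```python
import re

# Require the word "dog" somewhere, then forbid "cat" at every position.
pattern = re.compile(r"^(?=.*?\bdog\b)((?!cat).)*$")

lines = [
    "1. I have a two pets - dog and a cat",
    "2. I have a pet - dog",
]

# Keep only the lines the pattern accepts.
matching = [line for line in lines if pattern.search(line)]
print(matching)
```

Note that (?!cat) has no word boundaries, so a line containing "category" would be rejected as well; use (?!\bcat\b) if that matters.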

Vestiary answered 21/2, 2013 at 11:26 Comment(2)

He failed to mention it in the question, but the OP is actually using the DOS findstr command. It affords only a tiny subset of the capabilities you expect to find in a regex tool; lookahead is not among them. (I just added the findstr tag myself.) – Nomography 21/2, 2013 at 13:42

Hm, yes, I've now found that in one of his comments on the posts. I only saw Regex in the title. Anyway, if somebody finds this post when searching for the same thing for regular expressions, like I did, maybe it could be helpful to someone :) Thanks for the comments – Vestiary 21/2, 2013 at 13:59

Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.
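As a rough sketch of what that could look like for an (A and ~B) test, here it is in Python, assuming A and B are independent patterns. The function name matches_a_but_not_b and the sample patterns are hypothetical.

```python
import re

def matches_a_but_not_b(text, pattern_a, pattern_b):
    """True when pattern_a is found in text and pattern_b is not."""
    found_a = re.search(pattern_a, text) is not None
    found_b = re.search(pattern_b, text) is not None
    # The negation lives in the host language, not in the regex.
    return found_a and not found_b

lines = ["I have a dog and a cat", "I have a dog", "I have a cat"]
kept = [line for line in lines
        if matches_a_but_not_b(line, r"\bdog\b", r"\bcat\b")]
print(kept)
```

This only covers the case where the two tests are independent; it does not help when B has to be checked against the part of the string that A left unmatched.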

Leannleanna answered 4/3, 2009 at 18:48 Comment(3)

Then I just end up with (~A or B) instead of (A and ~B). It doesn't solve my problem. – Legerdemain 4/3, 2009 at 21:6

Pseudo-code: String toTest; if (toTest.matches(A) AND !toTest.matches(B)) { ... } – Leannleanna 4/3, 2009 at 21:54

I should have been more clear - the pieces are not fully independent. If A matches part of the string, then we care if ~B matches the rest of it (but not necessarily the whole thing). This was for the windows command-line findstr function, which i found is restricted to true regexs, so moot point. – Legerdemain 4/3, 2009 at 22:7

notnot, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

I'm faced with a situation where I have to match an (A and ~B) pattern.

The basic regex for this is frighteningly simple: B|(A)

You just ignore the overall matches and examine the Group 1 captures, which will contain A.

An example (with all the disclaimers about parsing html in regex): A is digits, B is digits within an <a> tag

The regex: <a.*?<\/a>|(\d+)

Demo (look at Group 1 in the lower right pane)
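If you want to try the trick outside the demo, a minimal sketch in Python could look like this. The sample HTML string is invented for the illustration, and the usual caveats about regex and HTML still apply.

```python
import re

# B (a whole <a>...</a> element) comes first and is not captured;
# A (a run of digits) is captured in group 1.
pattern = re.compile(r"<a.*?</a>|(\d+)")

html = 'Call 555 or see <a href="/p/42">item 42</a>, then wait 7 days.'

# Ignore the overall matches and keep only the Group 1 captures.
digits_outside_links = [
    m.group(1) for m in pattern.finditer(html) if m.group(1) is not None
]
print(digits_outside_links)
```

The two occurrences of 42 sit inside the link, so the left alternative swallows them and Group 1 stays empty for that match.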

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

Composer answered 13/5, 2014 at 21:51 Comment(1)

This sounds too good to be true! Unfortunately, this solution is not universal and it fails in Emacs, even after replacing \d with [[:digit:]]. The first reference mentions it is specific to Perl and PHP: "There is a variation using syntax specific to Perl and PHP that accomplishes the same." – Outer 24/10, 2018 at 12:43

The complement of a regular language is also a regular language, but to construct it you have to build the DFA for the regular language, and make any valid state change into an error. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, as suggested before.
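To make the DFA idea concrete, here is a toy sketch in Python. It uses a hand-written language (strings over a and b that contain "ab"), not the /(ac|bd)/ example from the linked page, and it complements the automaton the textbook way, by swapping accepting and non-accepting states of a complete DFA. All names are made up for the illustration.

```python
import re
from itertools import product

ALPHABET = "ab"
# Complete DFA for "contains 'ab'": every state handles every symbol.
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,  # state 0: nothing useful seen yet
    (1, "a"): 1, (1, "b"): 2,  # state 1: the last symbol was 'a'
    (2, "a"): 2, (2, "b"): 2,  # state 2: 'ab' has been seen
}
ALL_STATES = {0, 1, 2}
ACCEPTING = {2}
# Complement: same transitions, accepting set flipped.
COMPLEMENT_ACCEPTING = ALL_STATES - ACCEPTING

def run(accepting, text):
    """Run the DFA from state 0 and report whether it ends in `accepting`."""
    state = 0
    for symbol in text:
        state = TRANSITIONS[(state, symbol)]
    return state in accepting

def contains_ab(text):
    return run(ACCEPTING, text)

def lacks_ab(text):
    return run(COMPLEMENT_ACCEPTING, text)

# Every string over the alphabet up to length 6, for an exhaustive check.
WORDS = ["".join(p) for n in range(7) for p in product(ALPHABET, repeat=n)]
print(all(lacks_ab(w) == ("ab" not in w) for w in WORDS))
```

For this tiny language the complement happens to have a short regular expression, b*a*, but in general reading a regex back off the complemented DFA is the hard part.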

Vigor answered 4/3, 2009 at 19:11 Comment(4)

If I were dealing with actual regex's then this would all be moot. Regex now seems to refer to the nebulous CSG-ish (?) space of pattern matching that most languages support. Since I need to match (A and ~B), there's no way to remove the negation and still do it all in one step. – Legerdemain 4/3, 2009 at 21:48

Lookahead, as described above, would have done it if findstr did anything beyond true DFA regexs. The whole thing is sort of odd and I don't know why I have to do this command-line (batch now) style. It's just another example of my hands being tied. – Legerdemain 4/3, 2009 at 21:53

@notnot: You are using findstr from Windows? Then you just need /v, like this: findstr A inputfile | findstr /v B > outputfile.txt (the first command matches all lines with A, the second matches all lines that don't have B). – Vigor 4/3, 2009 at 22:4

Thanks! That's actually exactly what I needed. I didn't ask the question that way, though, so I'm still giving the answer to Gumbo for the more generalized answer. – Legerdemain 5/3, 2009 at 17:16

pattern - re

str.split(/re/g)

will return everything except the pattern.

Test here
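The snippet above is JavaScript; a rough Python analogue, assuming re.split and re.sub behave the way you need for your pattern, might look like this. The sample text and the \d+ pattern are placeholders.

```python
import re

text = "abc123def45gh"
unwanted = re.compile(r"\d+")  # stand-in for the pattern to exclude

# Splitting on the pattern leaves only the pieces between matches.
pieces = unwanted.split(text)
joined = "".join(pieces)

# Replacing the pattern with an empty string gives the same text in one step.
stripped = unwanted.sub("", text)
print(pieces, joined, stripped)
```

If the pattern contains capturing groups, Python's split also returns the captured text, so keep the groups non-capturing here.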

Stour answered 5/3, 2009 at 2:26 Comment(2)

You probably want to mention that you need to join them again. – Daphne 26/3, 2012 at 14:7

A similar approach is using replace: str.replace(/re/g, ''); then there's no need to rejoin them. Also, if you throw in a trailing \s?, as in str.replace(/re\s?/g, ''), you get rid of any duplicate spaces left behind when something is replaced in the middle of a string – Lallygag 22/1, 2014 at 6:28

(B)|(A)

then use what group 2 captures...

Besetting answered 5/3, 2009 at 2:29 Comment(1)

He needs to capture not B; his aim is not to just ignore all the B patterns. – Leela 16/7, 2013 at 6:30

My answer here might solve your problem as well:

https://mcmap.net/q/25873/-regex-replace-everything-except-a-particular-pattern

Instead of Replace, you would use Match.
Instead of group $1, you would read group $2.
Group $2 was made non-capturing there, which you would avoid.

Example:

Regex.Match("50% of 50% is 25%", @"(\d+\%)|(.+?)");

The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2.
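For anyone who wants to see what group 2 actually collects, here is a sketch of the same idea in Python rather than .NET (a translation on my part, so treat it as an approximation of the call above). Because .+? is lazy, each group 2 capture is a single character, so the sketch iterates over all matches and joins them.

```python
import re

# Group 1: the pattern to avoid (percentages). Group 2: everything else.
pattern = re.compile(r"(\d+%)|(.+?)")
text = "50% of 50% is 25%"

avoided = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
# Each group 2 capture is one character; join them to rebuild the remainder.
rest = "".join(m.group(2) for m in pattern.finditer(text) if m.group(2) is not None)
print(avoided, repr(rest))
```

A single match call would return only the first match, which here is the leading 50% in group 1, so iterating over all matches is what makes group 2 useful.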

Deceit answered 15/1, 2015 at 16:11 Comment(0)
