Regex - how to match everything except a particular pattern
Asked Answered
L

8

177

How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.

Legerdemain answered 4/3, 2009 at 18:37 Comment(1)
PCRE would be best for this: see Regex Pattern to Match, Excluding when… / Except between. I removed findstr tag since all answers here are not valid for the tag.Quatrain
P
199

You could use a look-ahead assertion:

(?!999)\d{3}

This example matches three digits other than 999.


But if you happen not to have a regular expression implementation with this feature (see Comparison of Regular Expression Flavors), you probably have to build a regular expression with the basic features on your own.

A compatible regular expression with basic syntax only would be:

[0-8]\d\d|\d[0-8]\d|\d\d[0-8]

This does also match any three digits sequence that is not 999.

Pronate answered 4/3, 2009 at 18:41 Comment(6)
Look-ahead is not standard regular expression syntax, it is a Perl extension, it will only work in Perl, PCRE (Perl-Compatible RegEx) or other non-standard implementationsVigor
It may not be standard, but don't most modern languages support it? What language doesn't support look-aheads these days?Hydromechanics
That’s true. But most regex flavors support this feature (see <regular-expressions.info/refflavors.html>).Pronate
Turns out that the windows findstr function only supports pure DFA-style regex anyway, so I need to just do it all differently. You still get the answer, though.Legerdemain
i think the last regex would also not match 009, 019... etcLacefield
Standard Lex for C does not use PCREs :-(Dozier
V
32

If you want to match a word A in a string and not to match a word B. For example: If you have a text:

1. I have a two pets - dog and a cat
2. I have a pet - dog

If you want to search for lines of text that HAVE a dog for a pet and DOESN'T have cat you can use this regular expression:

^(?=.*?\bdog\b)((?!cat).)*$

It will find only second line:

2. I have a pet - dog
Vestiary answered 21/2, 2013 at 11:26 Comment(2)
He failed mention it in the question, but the OP is actually using the DOS findstr command. It affords only a tiny subset of the capabilities you expect to find in a regex tool; lookahead is not among them. (I just added the findstr tag myself.)Nomography
hm, yes, I found now in one of his comments on the posts. I saw Regex in the title. Anyways, if somebody finds this post when searching for the same for regular expression, like I did, maybe it could be helpful to someone :) thanks for commentsVestiary
L
15

Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.

Leannleanna answered 4/3, 2009 at 18:48 Comment(3)
Then I just end up with (~A or B) instead of (A and ~B). It doesn't solve my problem.Legerdemain
Pseudo-code: String toTest; if (toTest.matches(A) AND !toTest.matches(B)) { ... }Leannleanna
I should have been more clear - the pieces are not fully independent. If A matches part of the string, then we care if ~B matches the rest of it (but not necessarily the whole thing). This was for the windows command-line findstr function, which i found is restricted to true regexs, so moot point.Legerdemain
C
8

notnot, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

I'm faced with a situation where I have to match an (A and ~B) pattern.

The basic regex for this is frighteningly simple: B|(A)

You just ignore the overall matches and examine the Group 1 captures, which will contain A.

An example (with all the disclaimers about parsing html in regex): A is digits, B is digits within <a tag

The regex: <a.*?<\/a>|(\d+)

Demo (look at Group 1 in the lower right pane)

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

Composer answered 13/5, 2014 at 21:51 Comment(1)
This sounds too good to be true! Unfortunately, this solution is not universal and it fails in Emacs, even after replacing \d with [[:digit:]]. The first reference mentions it is specific to Perl and PHP: "There is a variation using syntax specific to Perl and PHP that accomplishes the same."Outer
V
4

The complement of a regular language is also a regular language, but to construct it you have to build the DFA for the regular language, and make any valid state change into an error. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, like suggested before.

Vigor answered 4/3, 2009 at 19:11 Comment(4)
If I were dealing with actual regex's then this would all be moot. Regex now seems to refer to the nebulous CSG-ish (?) space of pattern matching that most langauges support. Since I need to match (A and ~B), there's no way to remove the negation and still do it all in one step.Legerdemain
Lookahead, as described above, would have done it if findstr did anything beyond true DFA regexs. The whole thing is sort of odd and I don't know why I have to do this command-line (batch now) style. It's just another example of my hands being tied.Legerdemain
@notnot: You are using findstr from Windows? Then you just need /v. Like: findstr A inputfile | findstr /v B > outputfile.txt The first matches all lines with A, the second matches all lines that doesn't have B.Vigor
Thanks! That's actually exactly what I needed. I didn't ask the question that way, though, so I still giving the answer to Gumbo for the more generalized answer.Legerdemain
S
2

pattern - re

str.split(/re/g) 

will return everything except the pattern.

Test here

Stour answered 5/3, 2009 at 2:26 Comment(2)
You probably want to mention that you need to join then again.Daphne
A similar approach is using replace str.replace(/re/g, ''), then there's no need to rejoin them. also if you throw in a nice trailing \s? like str.replace(/\re\s?/g, '') then you get rid of any duplicate spaces you would have from something being replaced in the middle of a stringLallygag
B
0
(B)|(A)

then use what group 2 captures...

Besetting answered 5/3, 2009 at 2:29 Comment(1)
He needs to capture not B, he aim is not to just ignore all the B patterns.Leela
D
0

My answer here might solve your problem as well:

https://mcmap.net/q/25873/-regex-replace-everything-except-a-particular-pattern

  • Instead of Replace, you would use Match.
  • Instead of group $1, you would read group $2.
  • Group $2 was made non-capturing there, which you would avoid.

Example:

Regex.Match("50% of 50% is 25%", "(\d+\%)|(.+?)");

The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2.

Deceit answered 15/1, 2015 at 16:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.