I'm attempting something which I feel should be fairly obvious to me but it's not. I'm trying to match a string which does NOT contain a specific sequence of characters. I've tried using [^ab]
, [^(ab)]
, etc. to match strings containing no 'a's or 'b's, or only 'a's or only 'b's or 'ba' but not match on 'ab'. The examples I gave won't match 'ab' it's true but they also won't match 'a' alone and I need them to. Is there some simple way to do this?
Use negative lookahead (cf. Regexr.com explanation):
^(?!.*ab).*$
UPDATE: In the comments below, I stated that this approach is slower than the one given in Peter's answer. I've run some tests since then, and found that it's really slightly faster. However, the reason to prefer this technique over the other is not speed, but simplicity.
The other technique, described here as a tempered greedy token, is suitable for more complex problems, like matching delimited text where the delimiters consist of multiple characters (like HTML, as Luke commented below). For the problem described in the question, it's overkill.
For anyone who's interested, I tested with a large chunk of Lorem Ipsum text, counting the number of lines that don't contain the word "quo". These are the regexes I used:
(?m)^(?!.*\bquo\b).+$
(?m)^(?:(?!\bquo\b).)+$
Whether I search for matches in the whole text, or break it up into lines and match them individually, the anchored lookahead consistently outperforms the floating one.
<td> .. </td>
elements that contained windows
somewhere between the start and end tags and not match the TD elements that didn't. I used <td(?:(?!</td>).)+</td>
to find the whole TD element where <td(?!.*</td>).*</td>
wouldn't work. Final regex was <td(?:(?!</td>).)+windows.*?</td>
. For a good example of "breaking the knowledge into smaller chunks" see below where the explanation of the regex characters used is included in the answer. –
Cade AAA
and ends with BBB
, that does not contain any other instances of AAA
or BBB
, but does contain CCC
(where AAA
, BBB
and CCC
are arbitrary multi-character sequences). This question is about matching a whole string that does not contain AAA
. Peter's approach works here too, but this approach is just as valid, and a little more intuitive. –
Bonus windows
part, i first needed to find one complete <td>...</td>
tag, which involved, as you say, finding the middle section "that does not contain any instances of aaa
or bbb
" (this part pretty closely matches the question). I was just adding my experience to try to help anyone else who arrives at this page. –
Cade <code> // <comment>
) except I wanted to ignore matches where the comment was at the start of the line; I only wanted to move the comment up to its own line. Thanks to you I got it complete in short order. –
Derangement (?:(?:(?!.*ab).*).)*
is going to be best –
Celloidin .*
needed in ^(?!.*ab).*$
? In my use case it seems to work without it as well (I use it as (?!\d+).+
) –
Rhinoscopy ^(?:[^a]*a[^b])*[^a]*$
. Lookahead is expensive. Note this is hard to generalize. Read as "match not 'a' until an 'a' is seen, then match not 'b'; do logic repeatedly then finally match any string of not 'a' until the end" –
Ajay Using a character class such as [^ab]
will match a single character that is not within the set of characters. (With the ^
being the negating part).
To match a string which does not contain the multi-character sequence ab
, you want to use a negative lookahead:
^(?:(?!ab).)+$
And the above expression disected in regex comment mode is:
(?x) # enable regex comment mode
^ # match start of line/string
(?: # begin non-capturing group
(?! # begin negative lookahead
ab # literal text sequence ab
) # end negative lookahead
. # any single character
) # end non-capturing group
+ # repeat previous match one or more times
$ # match end of line/string
^((?!ab).+)$
. –
Svetlanasvoboda .
from the "any single character" is only for the same line. If you need to do this to multi-line regex, you may need to replace it to (.|\n)
–
Heliotropin Use negative lookahead (cf. Regexr.com explanation):
^(?!.*ab).*$
UPDATE: In the comments below, I stated that this approach is slower than the one given in Peter's answer. I've run some tests since then, and found that it's really slightly faster. However, the reason to prefer this technique over the other is not speed, but simplicity.
The other technique, described here as a tempered greedy token, is suitable for more complex problems, like matching delimited text where the delimiters consist of multiple characters (like HTML, as Luke commented below). For the problem described in the question, it's overkill.
For anyone who's interested, I tested with a large chunk of Lorem Ipsum text, counting the number of lines that don't contain the word "quo". These are the regexes I used:
(?m)^(?!.*\bquo\b).+$
(?m)^(?:(?!\bquo\b).)+$
Whether I search for matches in the whole text, or break it up into lines and match them individually, the anchored lookahead consistently outperforms the floating one.
<td> .. </td>
elements that contained windows
somewhere between the start and end tags and not match the TD elements that didn't. I used <td(?:(?!</td>).)+</td>
to find the whole TD element where <td(?!.*</td>).*</td>
wouldn't work. Final regex was <td(?:(?!</td>).)+windows.*?</td>
. For a good example of "breaking the knowledge into smaller chunks" see below where the explanation of the regex characters used is included in the answer. –
Cade AAA
and ends with BBB
, that does not contain any other instances of AAA
or BBB
, but does contain CCC
(where AAA
, BBB
and CCC
are arbitrary multi-character sequences). This question is about matching a whole string that does not contain AAA
. Peter's approach works here too, but this approach is just as valid, and a little more intuitive. –
Bonus windows
part, i first needed to find one complete <td>...</td>
tag, which involved, as you say, finding the middle section "that does not contain any instances of aaa
or bbb
" (this part pretty closely matches the question). I was just adding my experience to try to help anyone else who arrives at this page. –
Cade <code> // <comment>
) except I wanted to ignore matches where the comment was at the start of the line; I only wanted to move the comment up to its own line. Thanks to you I got it complete in short order. –
Derangement (?:(?:(?!.*ab).*).)*
is going to be best –
Celloidin .*
needed in ^(?!.*ab).*$
? In my use case it seems to work without it as well (I use it as (?!\d+).+
) –
Rhinoscopy ^(?:[^a]*a[^b])*[^a]*$
. Lookahead is expensive. Note this is hard to generalize. Read as "match not 'a' until an 'a' is seen, then match not 'b'; do logic repeatedly then finally match any string of not 'a' until the end" –
Ajay Yes its called negative lookahead. It goes like this - (?!regex here)
. So abc(?!def)
will match abc not followed by def. So it'll match abce, abc, abck, etc.
Similarly there is positive lookahead - (?=regex here)
. So abc(?=def)
will match abc followed by def.
There are also negative and positive lookbehind - (?<!regex here)
and (?<=regex here)
respectively
One point to note is that the negative lookahead is zero-width. That is, it does not count as having taken any space.
So it may look like a(?=b)c
will match "abc" but it won't. It will match 'a', then the positive lookahead with 'b' but it won't move forward into the string. Then it will try to match the 'c' with 'b' which won't work. Similarly ^a(?=b)b$
will match 'ab' and not 'abb' because the lookarounds are zero-width (in most regex implementations).
More information on this page
?!
did ignore that group, but it kept matching the rest of the string. Is there a way to use that for making the whole line excluded if it has a specific pattern? –
Hyperopia abc(?!def) will match abc not followed by def. So it'll match abce, abc, abck, etc. what if I want neither def nor xyz will it be abc(?!(def)(xyz)) ???
I had the same question and found a solution:
abc(?:(?!def))(?:(?!xyz))
These non-counting groups are combined by "AND", so it this should do the trick. Hope it helps.
abc(?:(?!def)(?!xyz))
would do. They're in the con-capturing group already. No need to put another one inside it. They're also not "combined by "AND"". They're checked one at a time, just like ab
is first checked for a
, then for b
, but lookaheads just don't move the cursor along. –
Parsec Using a regex as you described is the simple way (as far as I am aware). If you want a range you could use [^a-f].
Simplest way is to pull the negation out of the regular expression entirely:
if (!userName.matches("^([Ss]ys)?admin$")) { ... }
Just search for "ab" in the string then negate the result:
!/ab/.test("bamboo"); // true
!/ab/.test("baobab"); // false
It seems easier and should be faster too.
The regex [^ab] will match for example 'ab ab ab ab' but not 'ab', because it will match on the string ' a' or 'b '.
What language/scenario do you have? Can you subtract results from the original set, and just match ab?
If you are using GNU grep, and are parsing input, use the '-v' flag to invert your results, returning all non-matches. Other regex tools also have a 'return nonmatch' function, too.
If I understand correctly, you want everything except for those items which contain 'ab' anywhere.
[^ab]
is a character class that matches everything except a's and b's. Obviously it will match the spaces. –
Parsec In this case I might just simply avoid regular expressions altogether and go with something like:
if (StringToTest.IndexOf("ab") < 0)
//do stuff
This is likely also going to be much faster (a quick test vs regexes above showed this method to take about 25% of the time of the regex method). In general, if I know the exact string I'm looking for, I've found regexes are overkill. Since you know you don't want "ab", it's a simple matter to test if the string contains that string, without using regex.
© 2022 - 2024 — McMap. All rights reserved.