How to negate specific word in regex? [duplicate]
R

12

844

I know that I can negate group of chars as in [^bar] but I need a regular expression where negation applies to the specific word - so in my example how do I negate an actual bar, and not "any chars in bar"?

Resh answered 6/8, 2009 at 17:20 Comment(2)
Related: regex for matching something if it is not preceded by something elseMcclurg
Something like thisLyckman
A
1022

A great way to do this is to use negative lookahead:

^(?!.*bar).*$

The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead is any regex pattern.
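A minimal Python sketch of the pattern above, assuming the standard `re` module and single-line input. The helper name `lacks_bar` and the sample strings are made up for illustration.

```python
import re

# The lookahead (?!.*bar) is checked once, at the start of the line:
# it fails if "bar" can be found anywhere after that point.
NO_BAR = re.compile(r"^(?!.*bar).*$")

def lacks_bar(line: str) -> bool:
    """Return True when the line does not contain the substring 'bar'."""
    return NO_BAR.match(line) is not None

samples = ["foo", "bar", "foobar", "barfoo", "ba r"]
results = {s: lacks_bar(s) for s in samples}
print(results)
```

Note that this rejects any line containing the substring, so foobar and barfoo are rejected along with bar itself.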

Asserted answered 6/8, 2009 at 17:38 Comment(18)
This says it all (I probably would have started with (?!bar) and built up). I don't see why other people are making it so complicated.Burlington
line start character at the beginning does a pretty good job.Jive
I don't think light weight regex parsers like SLRE support ! operator yet.Celisse
Nicely done - matches a line that has the specified string and the string is not preceded by anything and the string is followed by anything.This is by definition the absence of the string! because if present it will always be preceded by something even if its a line anchor ^Frederique
Is there a version of this that works in the Linux command line grep utility?Sofiasofie
@NeilTraft how about grep -v bar :)Copalite
If you are using grep then use -P option. -P enables perl regex. e.g. grep -P '(?!do not contain this string)'Rambouillet
this worked "just right" with the extra info provided by @sgrillon's answerCumulate
I want not allow to user to write "Password", "password" or any other exact word.Dunois
Unfortunately, this doesn't work with actual words. foo will match, bar won't, but foobar or barfoo won't either!Whilst
This is super useful for an idempotent ansible replaceNiigata
@Whilst That is correct and expected as those other three contain the "bar" so they shouldn't match. Foo is the only word of those three you gave that doesn't have the "bar"Motherland
this is exactly what i needed .. but I'm curious why doesn't ^(?!bar).*$ work? It's technically saying if it doesn't contain 'bar' right? why does it require the .* I have checked and it actually doesn't, can anyone explain or break it down for me.Motherland
this solution does not work in R.Pre
@carilynchin its because ^ also applies within the lookahead. So you are saying you want all strings that don't start with bar. This means you will match all strings without bar AND all strings which have bar EXCEPT those that START with bar. That's not desired by OP.Social
@Frederique - I didn't get your drift, but I suspect it's incorrect.Doublet
Instead, I read ^(?!.*bar).*$ as "Match any string--it must NOT start with "any characters followed by 'bar'" --but it can have any other set of characters". The "must NOT start with ..." bit is ^(?!.*bar). The "can have any other..." bit is the final '.*$'Doublet
Thinking about my text explanation above, I do not see a need for the final $ at the end - I think it can be dropped. So a slightly improved regex should be ^(?!.*bar).*. Can a regex guru validate this and update the answer please?Doublet
C
75

Unless performance is of utmost concern, it's often easier just to run your results through a second pass, skipping those that match the words you want to negate.

Regular expressions usually mean you're doing scripting or some sort of low-performance task anyway, so find a solution that is easy to read, easy to understand and easy to maintain.
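As a rough illustration of the second-pass idea in Python: keep what the first regex finds, then drop anything the second regex matches. The function name `second_pass` and the sample lines are hypothetical.

```python
import re

def second_pass(lines, wanted, unwanted):
    """Keep lines matching `wanted`, then skip those matching `unwanted`."""
    keep = re.compile(wanted)
    drop = re.compile(unwanted)
    return [ln for ln in lines if keep.search(ln) and not drop.search(ln)]

lines = ["foo 1", "foo bar 2", "baz 3", "foo 4"]
filtered = second_pass(lines, r"foo", r"bar")
print(filtered)
```

Each regex stays simple and readable, at the cost of a second scan over the candidates.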

Chronogram answered 6/8, 2009 at 17:33 Comment(3)
There are lots of situations where you don't control the workflow: you just get to write a single regexp which is a filter.Sackman
And if you want to replace all Texts which don't match a certain regex?Gorrono
It's an unusual idea, but it does work. Most of the answers are for PCRE, but I can't apply their solutions to re2Sammiesammons
W
70

Solution:

^(?!.*STRING1|.*STRING2|.*STRING3).*$

xxxxxx OK

xxxSTRING1xxx KO (which is the desired result)

xxxSTRING2xxx KO (which is the desired result)

xxxSTRING3xxx KO (which is the desired result)
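When the list of forbidden strings is only known at run time, the same pattern can be assembled programmatically. This is a sketch, not the answer's own code: `none_of` is a hypothetical helper, it factors the repeated `.*` out of the alternation (an equivalent form), and it uses `re.escape` so that metacharacters in the strings are taken literally.

```python
import re

def none_of(substrings):
    """Build ^(?!.*(?:A|B|C)).*$ for a non-empty list of literal substrings."""
    alternatives = "|".join(re.escape(s) for s in substrings)
    return re.compile(rf"^(?!.*(?:{alternatives})).*$")

pattern = none_of(["STRING1", "STRING2", "STRING3"])

for line in ["xxxxxx", "xxxSTRING1xxx", "xxxSTRING2xxx", "xxxSTRING3xxx"]:
    print(line, "OK" if pattern.match(line) else "KO")
```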

Wellgrounded answered 13/9, 2016 at 16:8 Comment(3)
thanks, this gave me the extra info i needed for multiple wordsCumulate
Am I the only one who hates "OK" and "KO" as indicators of passing a test? It's just one typo away from disaster...Easting
@AJPerez, Yes OK KO is result of testMaryrosemarys
D
57

You could either use a negative look-ahead or look-behind:

^(?!.*?bar).*
^(.(?<!bar))*?$

Or use just basics:

^(?:[^b]|b(?:b|ab)*(?:[^ab]|a[^br]))*(?:b(?:b|ab)*a?)?$

These all match anything that does not contain bar.
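One way to gain confidence in the two lookaround patterns is to compare them against a plain substring test on every short string over a small alphabet. The sketch below does that in Python. The alphabet, the length limit and the helper `agrees` are arbitrary choices for illustration, and the input is assumed to contain no newlines.

```python
import re
from itertools import product

LOOKAHEAD = re.compile(r"^(?!.*?bar).*")
LOOKBEHIND = re.compile(r"^(.(?<!bar))*?$")

def agrees(pattern, text):
    """True when the regex verdict equals the plain substring check."""
    return (pattern.match(text) is not None) == ("bar" not in text)

# Every string of length 0..5 over the letters b, a, r, x.
alphabet = "barx"
candidates = ["".join(p) for n in range(6) for p in product(alphabet, repeat=n)]

mismatches = [
    t for t in candidates
    if not (agrees(LOOKAHEAD, t) and agrees(LOOKBEHIND, t))
]
print(len(candidates), "strings checked,", len(mismatches), "mismatches")
```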

Duce answered 6/8, 2009 at 17:24 Comment(8)
What languages don't support (negative) look-behinds and/or (negative) look-aheads in regex?Generalissimo
I think the point being made is, looking at your pattern it's not at all clear that all you're doing is rejecting the word "bar".Chronogram
@Bryan: And, in fact, it doesn't reject the word "bar". It just rejects "b" when followed by "ar".Generalissimo
Good idea, but not supported everywhere. Afaik Javascript supports negative look-ahead, but not look-behind. I don't know details about other languages, but this can be helpful: en.wikipedia.org/wiki/Comparison_of_regular_expression_enginesBrayton
@Generalissimo bash doesn't support negative look-behind/look-ahead.Dwightdwindle
@Generalissimo look-aheads and look-behinds are not posixGreenway
Can you explain the second solution? (.(?<!bar))*? (?<!bar) is a negative lookbehind, isn't it? It follows the pattern (?<!a)b, that would mean: wherever you find a b, make sure there isn't an a before it. Only that in this case, b is empty for us; so it would mean: wherever you find anything, make sure there isn't a bar before it. But how does it work the (.<negative lookbehind>)*?? Why do you need the . and the last ? there? Many thanks!Laraelaraine
` ^(?!.*?bar).* ` Why did you use lazy here ? Why does just ^(?!bar).* not work ?Bridgettebridgewater
G
44

The following regex will do what you want (as long as negative lookbehinds and lookaheads are supported). The only problem is that it matches individual characters (i.e. each match is a single character rather than all the characters between two consecutive "bar"s), which can mean high overhead if you're working with very long strings.

b(?!ar)|(?<!b)a|a(?!r)|(?<!ba)r|[^bar]
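Because every match is a single character, the matches usually need to be stitched back together. A small Python sketch, assuming the standard `re` module (the helper name `strip_bar` is made up):

```python
import re

# Each alternative matches one character that is not part of a "bar".
NOT_IN_BAR = re.compile(r"b(?!ar)|(?<!b)a|a(?!r)|(?<!ba)r|[^bar]")

def strip_bar(text: str) -> str:
    """Join the single-character matches, i.e. everything outside 'bar'."""
    return "".join(NOT_IN_BAR.findall(text))

for sample in ["foobarbaz", "babar", "bar", "bbarar"]:
    print(repr(sample), "->", repr(strip_bar(sample)))
```

For these inputs the result is the same as `text.replace("bar", "")`, which is the simpler option whenever a plain string method is available.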
Generalissimo answered 6/8, 2009 at 17:20 Comment(4)
Instead of those multiple updates which force us to read the wrong answers before getting to your final answer, why not rewrite your answer to be complete, but without the somewhat confusing bad parts? If somebody really cares about the edit history they can use the built-in features of this site.Chronogram
Been two and a half years since I wrote this answer, but sure.Generalissimo
damn that hurts, try this (?:(?!bar).)*Chryso
@Mary, This won't work as expected. For example /(?:(?!bar).)*/g on foobar returns foo AND ar.Eaddy
C
36

I came across this forum thread while trying to identify a regex for the following English statement:

Given an input string, match everything unless this input string is exactly 'bar'; for example I want to match 'barrier' and 'disbar' as well as 'foo'.

Here's the regex I came up with

^(bar.+|(?!bar).*)$

My English translation of the regex is "match the string if it starts with 'bar' and has at least one other character, or if the string does not start with 'bar'".
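A quick Python check of that reading, using the words from the statement above. The helper name `is_not_exactly_bar` is hypothetical, and single-line input is assumed.

```python
import re

NOT_EXACTLY_BAR = re.compile(r"^(bar.+|(?!bar).*)$")

def is_not_exactly_bar(text: str) -> bool:
    """True for every string except the exact string 'bar'."""
    return NOT_EXACTLY_BAR.match(text) is not None

for word in ["barrier", "disbar", "foo", "bar"]:
    print(word, is_not_exactly_bar(word))
```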

Courtney answered 10/9, 2010 at 20:44 Comment(3)
@ReReqest - you will have much better chance to have this question answered if you post it as a separate question. In that you can provide link back to this question if you want. For the substance of question - it looks OK but I'm no regex guruResh
That was the one I was looking for. It really matches everything except bar.Aulos
^(?!bar$).* matches the same as this (everything except exactly bar) and avoids repetition.Padraig
E
23

The accepted answer is nice but is really a work-around for the lack of a simple sub-expression negation operator in regexes. This is why grep --invert-match exists. So in *nixes, you can accomplish the desired result using pipes and a second regex.

grep 'something I want' | grep --invert-match 'but not these ones'

Still a workaround, but maybe easier to remember.

Eclogue answered 4/1, 2016 at 0:4 Comment(3)
This is the right answer for someone using grep, which certainly qualifies as regex. I just wish this answer were more prominent (even included in the accepted answer) so that I hadn't spent time with the other answers first.Kurtzman
I cant see the invert match option in R. Is it restricted to unix grep?Pre
I use a GUI-based grep like TextCrawler. But if you are not using Windows OS, not sure what to use.Pullover
H
11

Extracted from this comment by bkDJ:

^(?!bar$).*

The nice property of this solution is that it's possible to clearly negate (exclude) multiple words:

^(?!bar$|foo$|banana$).*
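If the excluded words come from a list, the pattern can be generated. The Python sketch below is only illustrative: `not_exactly` is a hypothetical helper, it pulls the `$` out of the alternation (as suggested in the comments), and it escapes each word with `re.escape`.

```python
import re

def not_exactly(words):
    """Build ^(?!(?:w1|w2|...)$).* for a non-empty list of literal words."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(rf"^(?!(?:{alternatives})$).*")

pattern = not_exactly(["bar", "foo", "banana"])

for text in ["bar", "foo", "banana", "barfoo", "bananas"]:
    print(text, pattern.match(text) is not None)
```

Only the exact words are rejected. Longer strings that merely contain one of them, such as barfoo, still match.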
Hoppe answered 10/5, 2019 at 10:18 Comment(4)
why do you need trailing .*?Brandtr
Because the negative lookahead doesn't match any characters.Knecht
Seems to work by extracting the $, too: ^(?!(bar|foo|banana)$).* :-)Coraliecoraline
@SashaBond without .*, it doesn't work. You can check here.Lyckman
S
9

If it's truly a word, bar, that you don't want to match, then:

^(?!.*\bbar\b).*$

The above will match any string that does not contain bar on a word boundary, that is to say, bar delimited by non-word characters or by the start or end of the string. However, the period/dot (.) used in the above pattern will not match newline characters unless the correct regex flag is used:

(?s)^(?!.*\bbar\b).*$

Alternatively:

^(?![\s\S]*\bbar\b)[\s\S]*$

Instead of using any special flag, we are looking for any character that is either white space or non-white space. That should cover every character.
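In Python the "dot matches newline" behaviour can be requested with the `re.DOTALL` flag. The sketch below compares the pattern with and without it on multi-line input. The constant names and sample strings are made up for illustration.

```python
import re

PATTERN = r"^(?!.*\bbar\b).*$"
SINGLE_LINE_ONLY = re.compile(PATTERN)        # '.' stops at newlines
ANY_TEXT = re.compile(PATTERN, re.DOTALL)     # '.' also matches newlines

samples = ["foo\nbaz", "foo\nbar", "foo\nbarrier", "a bar b", "disbar"]
for text in samples:
    print(repr(text),
          SINGLE_LINE_ONLY.match(text) is not None,
          ANY_TEXT.match(text) is not None)
```

Without the flag, a multi-line string such as "foo\nbaz" fails to match even though it contains no bar, because the final `.*` cannot cross the newline to reach `$`.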

But what if we would like to match words that might contain bar, but just not the specific word bar?

(?!\bbar\b)\b[A-Za-z-]*bar[a-z-]*\b
  1. (?!\bbar\b) Assert that the next input is not bar on a word boundary.
  2. \b[A-Za-z-]*bar[a-z-]*\b Matches any word on a word boundary that contains bar.
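A short Python sketch of the two steps above, assuming the pattern is written with a plain character class `[A-Za-z-]`. The sample sentence is invented for illustration.

```python
import re

# Words that contain "bar" but are not the standalone word "bar".
CONTAINS_BAR = re.compile(r"(?!\bbar\b)\b[A-Za-z-]*bar[a-z-]*\b")

sentence = "bar barrier disbar foo rebar-free"
words = CONTAINS_BAR.findall(sentence)
print(words)
```

The standalone bar at the start is skipped by the lookahead, and foo is skipped because it does not contain bar at all.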

See Regex Demo

Shavonneshaw answered 17/2, 2020 at 13:40 Comment(0)
W
4

I wish to complement the accepted answer and contribute to the discussion with my late answer.

@ChrisVanOpstal shared this regex tutorial which is a great resource for learning regex.

However, it was really time consuming to read through.

I made a cheatsheet for mnemonic convenience.

This reference is based on the braces [], (), and {} leading each class, and I find it easy to recall.

Regex = {
 'single_character': ['[]', '.', {'negate':'^'}],
 'capturing_group' : ['()', '|', '\\', 'backreferences and named group'],
 'repetition'      : ['{}', '*', '+', '?', 'greedy vs. lazy'],
 'anchor'          : ['^', r'\b', '$'],
 'non_printable'   : [r'\n', r'\t', r'\r', r'\f', r'\v'],
 'shorthand'       : [r'\d', r'\w', r'\s'],
 }
Wigley answered 6/12, 2017 at 6:32 Comment(0)
G
1

Just thought of something else that could be done. It's very different from my first answer, as it doesn't use regular expressions, so I decided to make a second answer post.

Use your language of choice's split() method equivalent on the string with the word to negate as the argument for what to split on. An example using Python:

>>> text = 'barbarasdbarbar 1234egb ar bar32 sdfbaraadf'
>>> text.split('bar')
['', '', 'asd', '', ' 1234egb ar ', '32 sdf', 'aadf']

The nice thing about doing it this way, in Python at least (I don't remember if the functionality would be the same in, say, Visual Basic or Java), is that it lets you know indirectly when "bar" was repeated in the string due to the fact that the empty strings between "bar"s are included in the list of results (though the empty string at the beginning is due to there being a "bar" at the beginning of the string). If you don't want that, you can simply remove the empty strings from the list.

Generalissimo answered 7/8, 2009 at 19:58 Comment(1)
@Ajk_P yes but this kind of answers may help the OP think outside the box, they could've been fixated on regexes not realizing that it could be solved without them.Funds
C
0

I had a list of file names, and I wanted to exclude certain ones, with this sort of behavior (Ruby):

files = [
  'mydir/states.rb',      # don't match these
  'countries.rb',
  'mydir/states_bkp.rb',  # match these
  'mydir/city_states.rb' 
]
excluded = ['states', 'countries']

# set my_rgx here

result = WankyAPI.filter(files, my_rgx)  # I didn't write WankyAPI...
assert result == ['mydir/city_states.rb', 'mydir/states_bkp.rb']

Here's my solution:

excluded_rgx = excluded.map{|e| e+'\.'}.join('|')
my_rgx = /(^|\/)((?!#{excluded_rgx})[^\.\/]*)\.rb$/

My assumptions for this application:

  • The string to be excluded is at the beginning of the input, or immediately following a slash.
  • The permitted strings end with .rb.
  • Permitted filenames don't have a . character before the .rb.
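For readers who don't use Ruby, here is a rough Python port of the same regex under the same assumptions. `build_filter` is a hypothetical helper, and a plain list comprehension stands in for the third-party filtering API.

```python
import re

def build_filter(excluded):
    """Match *.rb paths whose base name is not one of the excluded names."""
    excluded_rgx = "|".join(re.escape(name) + r"\." for name in excluded)
    return re.compile(rf"(^|/)((?!{excluded_rgx})[^./]*)\.rb$")

files = [
    "mydir/states.rb",      # don't match these
    "countries.rb",
    "mydir/states_bkp.rb",  # match these
    "mydir/city_states.rb",
]

my_rgx = build_filter(["states", "countries"])
result = [f for f in files if my_rgx.search(f)]
print(result)
```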
Contrarious answered 6/11, 2015 at 11:42 Comment(0)
