I know that I can negate a group of characters, as in [^bar], but I need a regular expression where the negation applies to a specific word. In my example, how do I negate the actual word bar, and not "any character in bar"?
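To see why [^bar] can't express this, here is a small Python check. The re module is just a convenient stand-in; a character class behaves the same way in other engines. A character class matches one character at a time, so the order of b, a and r plays no role:

```python
import re

# [^bar] matches a single character that is not 'b', 'a' or 'r'.
in_foobar = re.findall(r"[^bar]", "foobar")
in_rab = re.findall(r"[^bar]", "rab")

print(in_foobar)  # ['f', 'o', 'o']
print(in_rab)     # [] even though "rab" does not contain the word bar
```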
A great way to do this is to use negative lookahead:
^(?!.*bar).*$
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point: (?!...). Inside the lookahead goes any regex pattern.
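Below is a minimal Python sketch of the pattern in use; the helper name lacks_bar is made up for illustration. re.match anchors at the start of the string. Before anything is consumed, the lookahead scans the rest of the line for bar:

```python
import re

# ^(?!.*bar) fails immediately if "bar" occurs anywhere on the line;
# otherwise .*$ consumes the whole line.
NO_BAR = re.compile(r"^(?!.*bar).*$")

def lacks_bar(line: str) -> bool:
    """Hypothetical helper: True if the line does not contain 'bar'."""
    return NO_BAR.match(line) is not None

results = {s: lacks_bar(s) for s in ["foo", "bar", "foobar", "barfoo", "baz"]}
print(results)
# {'foo': True, 'bar': False, 'foobar': False, 'barfoo': False, 'baz': True}
```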
And with the grep utility?
grep -v bar :)
With this pattern, foo will match and bar won't, but foobar and barfoo won't match either!
Why wouldn't ^(?!bar).*$ work? Isn't it technically saying "if it doesn't contain bar"? Why does it require the .*? I have checked, and it seems it actually doesn't. Can anyone explain or break it down for me?
The ^ also applies within the lookahead: since the lookahead comes right after ^, it is only checked at the start of the string. So with ^(?!bar) you are saying you want all strings that don't start with bar. This means you will match all strings without bar AND all strings that contain bar EXCEPT those that START with bar. That's not what the OP wants.
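The difference is easy to see side by side. This Python sketch runs both patterns on a few sample strings. Only the version with .* inside the lookahead rejects foobar:

```python
import re

only_start = re.compile(r"^(?!bar).*$")  # lookahead checked at position 0 only
anywhere = re.compile(r"^(?!.*bar).*$")  # .* lets the lookahead look past position 0

rows = [(s, bool(only_start.match(s)), bool(anywhere.match(s)))
        for s in ["foo", "barfoo", "foobar"]]
for row in rows:
    print(row)
# ('foo', True, True)
# ('barfoo', False, False)
# ('foobar', True, False)
```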
I read ^(?!.*bar).*$ as: "Match any string; it must NOT start with 'any characters followed by bar', but it can have any other set of characters." The "must NOT start with ..." bit is ^(?!.*bar). The "can have any other ..." bit is the final .*$.
As for the $ at the end, I think it can be dropped, so a slightly improved regex would be ^(?!.*bar).* . Can a regex guru validate this and update the answer, please?
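A quick Python experiment suggests an answer, at least for re.match. On single-line input, dropping the $ gives the same yes/no result, because the greedy .* already runs to the end of the line. The two versions only differ on multi-line strings, since . does not match a newline by default. Neither version is a reliable test for those:

```python
import re

with_dollar = re.compile(r"^(?!.*bar).*$")
without_dollar = re.compile(r"^(?!.*bar).*")

# Single-line input: same verdict either way.
same = all(bool(with_dollar.match(s)) == bool(without_dollar.match(s))
           for s in ["foo", "foobar", "barfoo", ""])

# Multi-line input: the lookahead's .* stops at "\n", so "bar" on line 2 is not seen.
text = "foo\nbar"
m1 = with_dollar.match(text)     # None: $ cannot match before the inner "\n"
m2 = without_dollar.match(text)  # matches just "foo"
print(same, m1, m2.group())
```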
Unless performance is of utmost concern, it's often easier just to run your results through a second pass, skipping those that match the words you want to negate.
Regular expressions usually mean you're doing scripting or some sort of low-performance task anyway, so find a solution that is easy to read, easy to understand, and easy to maintain.
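Here is a minimal Python sketch of the two-pass idea. The first-pass pattern (lines ending in a number) and the sample lines are invented for illustration. The second pass is a plain substring test rather than a regex:

```python
import re

lines = ["foo 1", "foobar 2", "baz 3", "bar 4", "qux"]

# Pass 1: whatever you are actually looking for (hypothetical: lines ending in digits).
candidates = [ln for ln in lines if re.search(r"\d+$", ln)]
# Pass 2: skip anything containing the word to negate.
result = [ln for ln in candidates if "bar" not in ln]
print(result)  # ['foo 1', 'baz 3']
```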
Solution:
^(?!.*STRING1|.*STRING2|.*STRING3).*$
xxxxxx OK
xxxSTRING1xxx KO (as desired)
xxxSTRING2xxx KO (as desired)
xxxSTRING3xxx KO (as desired)
OK / KO is the result of the test (OK = the line matches, KO = it doesn't).
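The same OK/KO table can be reproduced in Python. The alternation can also be written with the .* factored out, as ^(?!.*(?:STRING1|STRING2|STRING3)).*$. That form should behave identically, and the sketch checks both:

```python
import re

original = re.compile(r"^(?!.*STRING1|.*STRING2|.*STRING3).*$")
factored = re.compile(r"^(?!.*(?:STRING1|STRING2|STRING3)).*$")

samples = ["xxxxxx", "xxxSTRING1xxx", "xxxSTRING2xxx", "xxxSTRING3xxx"]
verdicts = ["OK" if original.match(s) else "KO" for s in samples]
assert verdicts == ["OK" if factored.match(s) else "KO" for s in samples]
print(list(zip(samples, verdicts)))
```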
You could use either a negative lookahead or a negative lookbehind:
^(?!.*?bar).*
^(.(?<!bar))*?$
Or use just the basics:
^(?:[^b]|b(?:a?b)*(?:[^ab]|a[^br]))*(?:b(?:a?b)*a?)?$
These all match anything that does not contain bar.
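Here is a small Python check of the two lookaround variants against a few tricky inputs, such as bbar and a string with a space inside ba r. Python's re supports fixed-width lookbehind such as (?<!bar):

```python
import re

patterns = [r"^(?!.*?bar).*", r"^(.(?<!bar))*?$"]
expected = {"foo": True, "bar": False, "foobar": False,
            "barfoo": False, "bbar": False, "ba r": True}

for p in patterns:
    for s, want in expected.items():
        got = re.match(p, s) is not None
        assert got == want, f"{p!r} on {s!r}: got {got}"
print("both patterns agree with the expected results")
```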
In (.(?<!bar))*?, (?<!bar) is a negative lookbehind, isn't it? It follows the pattern (?<!a)b, which would mean: wherever you find a b, make sure there isn't an a before it. Only in this case, b is empty for us, so it would mean: wherever you find anything, make sure there isn't a bar before it. But how does (.<negative lookbehind>)*? work? Why do you need the . and the final ? there? Many thanks!
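One way to see what ^(.(?<!bar))*$ does is to write the same loop by hand; the function below is a hypothetical illustration, not part of any library. The . is what moves through the string one character at a time. After each character, the lookbehind checks that the text consumed so far does not end in bar. Without the . there would be nothing to repeat, since a lookbehind alone consumes no characters. The trailing ? only makes the repetition lazy, and because $ forces the match to reach the end of the string anyway, it does not change which strings match:

```python
import re

def no_bar_stepwise(s: str) -> bool:
    """Hand-written equivalent of ^(.(?<!bar))*$ for strings without newlines."""
    for i in range(len(s)):
        # '.' consumes s[i]; (?<!bar) then inspects the text ending just after it.
        if s[: i + 1].endswith("bar"):
            return False
    return True

LOOKBEHIND = re.compile(r"^(.(?<!bar))*?$")
for s in ["foo", "foobar", "barfoo", "fooba", "bbar", ""]:
    assert no_bar_stepwise(s) == (LOOKBEHIND.match(s) is not None), s
print("hand-written loop agrees with the regex")
```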
Why doesn't ^(?!bar).* work?
The following regex will do what you want (as long as negative lookbehinds and lookaheads are supported) and matches things properly. The only problem is that it matches individual characters: each match is a single character, rather than all the characters between two consecutive "bar"s. This can mean a lot of overhead if you're working with very long strings.
b(?!ar)|(?<!b)a|a(?!r)|(?<!ba)r|[^bar]
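A quick Python run of that pattern shows the one-character-per-match behaviour; Python's re supports the fixed-width lookbehinds it needs. Joining the matches gives the input with every occurrence of bar removed:

```python
import re

# Each match is a single character that is not part of an occurrence of "bar".
CHAR_OUTSIDE_BAR = re.compile(r"b(?!ar)|(?<!b)a|a(?!r)|(?<!ba)r|[^bar]")

chars = CHAR_OUTSIDE_BAR.findall("foobar")
joined = "".join(CHAR_OUTSIDE_BAR.findall("abarb bbar"))
print(chars)   # ['f', 'o', 'o']
print(joined)  # ab b
```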
Note that /(?:(?!bar).)*/g on foobar returns foo AND ar.
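Python's re.findall shows the same thing. Without anchors, the pattern picks out the stretches between occurrences of bar, plus some empty matches. Anchoring it to the whole string, as in ^(?:(?!bar).)*$ or via re.fullmatch, turns it back into a "does not contain bar" test:

```python
import re

TEMPERED = r"(?:(?!bar).)*"

pieces = [p for p in re.findall(TEMPERED, "foobar") if p]  # drop empty matches
print(pieces)  # ['foo', 'ar']

# Anchored to the whole string, it becomes a "does not contain bar" test.
print(re.fullmatch(TEMPERED, "foobar"))       # None
print(re.fullmatch(TEMPERED, "foo").group())  # foo
```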
I came across this forum thread while trying to identify a regex for the following English statement:
Given an input string, match everything unless the input string is exactly 'bar'; for example, I want to match 'barrier' and 'disbar' as well as 'foo'.
Here's the regex I came up with:
^(bar.+|(?!bar).*)$
My English translation of the regex is: "match the string if it starts with 'bar' and has at least one other character, or if the string does not start with 'bar'."
^(?!bar$).* matches the same strings as this (everything except exactly bar) and avoids the repetition.
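Here is a Python sketch comparing the two patterns on the examples from the answer, plus the empty string. Both reject only the exact string bar:

```python
import re

long_form = re.compile(r"^(bar.+|(?!bar).*)$")
short_form = re.compile(r"^(?!bar$).*")

samples = ["bar", "barrier", "disbar", "foo", ""]
long_hits = [s for s in samples if long_form.match(s)]
short_hits = [s for s in samples if short_form.match(s)]
print(long_hits)   # ['barrier', 'disbar', 'foo', '']
print(short_hits)  # ['barrier', 'disbar', 'foo', '']
```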
The accepted answer is nice, but it is really a workaround for the lack of a simple sub-expression negation operator in regexes. This is why grep --invert-match exists. So on *nixes, you can accomplish the desired result using pipes and a second regex:
grep 'something I want' | grep --invert-match 'but not these ones'
Still a workaround, but maybe easier to remember.
Is there an invert match option in R, or is it restricted to Unix grep?
Extracted from this comment by bkDJ:
^(?!bar$).*
The nice property of this solution is that it's possible to clearly negate (exclude) multiple words:
^(?!bar$|foo$|banana$).*
Why is the .* needed at the end?
You can factor out the $, too: ^(?!(bar|foo|banana)$).* :-)
Without the trailing .*, it doesn't work. You can check here.
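Here is a Python sketch of the factored-out version. It is written with a non-capturing group, (?:...), which should not change what matches. The last line illustrates the point about .*: the lookahead alone consumes nothing. A whole-string check such as re.fullmatch therefore fails on every non-empty input:

```python
import re

NOT_EXCLUDED = re.compile(r"^(?!(?:bar|foo|banana)$).*")

verdicts = {s: NOT_EXCLUDED.fullmatch(s) is not None
            for s in ["bar", "foo", "banana", "foobar", "bananas", "baz"]}
print(verdicts)
# {'bar': False, 'foo': False, 'banana': False, 'foobar': True, 'bananas': True, 'baz': True}

# Without the trailing .*, only the empty string can fully match.
print(re.fullmatch(r"^(?!(?:bar|foo|banana)$)", "baz"))  # None
```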
If it's truly a word, bar, that you don't want to match, then:
^(?!.*\bbar\b).*$
The above will match any string that does not contain bar as a whole word. A whole word here means bar delimited by non-word characters or by the start or end of the string. However, the period/dot (.) used in the above pattern will not match newline characters unless the correct regex flag is used:
(?s)^(?!.*\bbar\b).*$
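Alternatively:
^(?![\s\S]*\bbar\b)[\s\S]*$
Instead of using any special flag, we match any character that is either whitespace or non-whitespace, which covers every character, including newlines. The [\s\S] class is needed inside the lookahead too; a plain .* there would stop at the first newline and miss a bar on a later line.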
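Here is a Python sketch of both options on two-line strings; re.DOTALL is Python's spelling of the (?s) flag:

```python
import re

with_flag = re.compile(r"^(?!.*\bbar\b).*$", re.DOTALL)
# [\s\S] inside the lookahead too, so it can see past the first newline.
with_class = re.compile(r"^(?![\s\S]*\bbar\b)[\s\S]*$")

for rx in (with_flag, with_class):
    assert rx.match("foo\nbaz")             # no word "bar" on either line
    assert rx.match("foo\nbar") is None     # the word "bar" on the second line
    assert rx.match("foobar\nbarn")         # "bar" only inside other words
print("both versions agree")
```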
But what if we would like to match words that might contain bar, but just not the specific word bar?
(?!\bbar\b)\b[A-Za-z-]*bar[a-z-]*\b
(?!\bbar\b) asserts that the next input is not bar on a word boundary.
\b[A-Za-z-]*bar[a-z-]*\b matches any word, on a word boundary, that contains bar.
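Here is a Python sketch using findall on a sample sentence, which is invented for illustration:

```python
import re

# Whole words containing "bar", except the word "bar" itself.
CONTAINS_BAR = re.compile(r"(?!\bbar\b)\b[A-Za-z-]*bar[a-z-]*\b")

words = CONTAINS_BAR.findall("bar barrier disbar foo crowbar")
print(words)  # ['barrier', 'disbar', 'crowbar']
```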
I wish to complement the accepted answer and contribute to the discussion with my late answer.
@ChrisVanOpstal shared this regex tutorial, which is a great resource for learning regex.
However, it was really time-consuming to read through.
I made a cheatsheet for mnemonic convenience.
This reference is based on the brackets [], (), and {} leading each class, and I find it easy to recall.
# Raw strings (r'...') keep '\b', '\d', '\n', ... as regex tokens
# instead of turning them into Python escape characters.
Regex = {
    'single_character': ['[]', '.', {'negate': '^'}],
    'capturing_group':  ['()', '|', '\\', 'backreferences and named groups'],
    'repetition':       ['{}', '*', '+', '?', 'greedy vs. lazy'],
    'anchor':           ['^', r'\b', '$'],
    'non_printable':    [r'\n', r'\t', r'\r', r'\f', r'\v'],
    'shorthand':        [r'\d', r'\w', r'\s'],
}
Just thought of something else that could be done. It's very different from my first answer, as it doesn't use regular expressions, so I decided to make a second answer post.
Use your language's equivalent of the split() method on the string, with the word to negate as the argument to split on. An example using Python:
>>> text = 'barbarasdbarbar 1234egb ar bar32 sdfbaraadf'
>>> text.split('bar')
['', '', 'asd', '', ' 1234egb ar ', '32 sdf', 'aadf']
Doing it this way has a nice side effect, in Python at least; I don't remember whether the behavior is the same in, say, Visual Basic or Java. It indirectly tells you when "bar" was repeated in the string, because the empty strings between consecutive "bar"s are included in the result. The empty string at the beginning is there because the string starts with "bar". If you don't want that, you can simply remove the empty strings from the list.
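Continuing the Python example: the number of occurrences is one less than the number of pieces, and a list comprehension removes the empty strings:

```python
text = 'barbarasdbarbar 1234egb ar bar32 sdfbaraadf'
parts = text.split('bar')

occurrences = len(parts) - 1         # str.split finds non-overlapping occurrences
non_empty = [p for p in parts if p]  # drop the empty strings between adjacent "bar"s
print(occurrences)  # 6
print(non_empty)    # ['asd', ' 1234egb ar ', '32 sdf', 'aadf']
```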
I had a list of file names, and I wanted to exclude certain ones, with this sort of behavior (Ruby):
files = [
'mydir/states.rb', # don't match these
'countries.rb',
'mydir/states_bkp.rb', # match these
'mydir/city_states.rb'
]
excluded = ['states', 'countries']
# set my_rgx here
result = WankyAPI.filter(files, my_rgx) # I didn't write WankyAPI...
assert result == ['mydir/city_states.rb', 'mydir/states_bkp.rb']
Here's my solution:
excluded_rgx = excluded.map{|e| e+'\.'}.join('|')
my_rgx = /(^|\/)((?!#{excluded_rgx})[^\.\/]*)\.rb$/
My assumptions for this application:
- The string to be excluded is at the beginning of the input, or immediately following a slash.
- The permitted strings end with .rb.
- Permitted filenames don't have a . character before the .rb.
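For comparison, here is a rough Python translation of the same approach; it is a sketch, not the original Ruby code. re.escape is an added assumption, to keep excluded names that contain regex metacharacters literal. A list comprehension stands in for the filtering API:

```python
import re

files = [
    'mydir/states.rb',       # should be excluded
    'countries.rb',          # should be excluded
    'mydir/states_bkp.rb',   # should be kept
    'mydir/city_states.rb',  # should be kept
]
excluded = ['states', 'countries']

# A negative lookahead right after the start or a slash rejects
# base names that are exactly "<excluded>.rb".
excluded_rgx = '|'.join(re.escape(e) + r'\.' for e in excluded)
my_rgx = re.compile(r'(^|/)((?!' + excluded_rgx + r')[^./]*)\.rb$')

result = [f for f in files if my_rgx.search(f)]
print(result)  # ['mydir/states_bkp.rb', 'mydir/city_states.rb']
```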