A regex to match a substring that isn't followed by a certain other substring

5

142

I need a regex that will match blahfooblah but not blahfoobarblah

I want it to match only foo and everything around foo, as long as it isn't followed by bar.

I tried foo.*(?<!bar), which is fairly close, but it still matches blahfoobarblah. The negative lookbehind would need to cover bar followed by anything, not just a trailing bar.

The specific language I'm using is Clojure which uses Java regexes under the hood.

EDIT: More specifically, I also need it to pass blahfooblahfoobarblah but not blahfoobarblahblah.

Daze answered 13/4, 2010 at 15:48 Comment(1)
Did you try using foo.*(?<!bar.*)? - Equivalency
T
194

Try:

/(?!.*bar)(?=.*foo)^(\w+)$/

Tests:

blahfooblah            # pass
blahfooblahbarfail     # fail
somethingfoo           # pass
shouldbarfooshouldfail # fail
barfoofail             # fail
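The test list above can be checked mechanically. Here is a minimal sketch in Python, used as a stand-in for Clojure/Java on the assumption that both engines treat these lookaheads the same way for plain ASCII input. The surrounding slashes are only delimiters, so they are dropped. The names PATTERN, passes and cases are made up for this illustration.

```python
import re

# Pattern from the answer; the /.../ delimiters are not part of the regex.
PATTERN = re.compile(r'(?!.*bar)(?=.*foo)^(\w+)$')

def passes(text):
    """True when text is a single run of word characters with foo and no bar."""
    return PATTERN.search(text) is not None

# Expected results copied from the test list above.
cases = {
    "blahfooblah": True,
    "blahfooblahbarfail": False,
    "somethingfoo": True,
    "shouldbarfooshouldfail": False,
    "barfoofail": False,
}

for text, expected in cases.items():
    print(f"{text:24} {'pass' if passes(text) else 'fail'}")
    assert passes(text) == expected
```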

Regular expression explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    bar                      'bar'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    foo                      'foo'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string

Other regex

If you only want to exclude bar when it is directly after foo, you can use

/(?!.*foobar)(?=.*foo)^(\w+)$/
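To see how the two patterns differ, here is a small Python sketch (again a stand-in for Java regexes, with illustrative names ANY_BAR and ONLY_FOOBAR). A bar that does not sit directly after foo is rejected by the first pattern but accepted by the second.

```python
import re

ANY_BAR = re.compile(r'(?!.*bar)(?=.*foo)^(\w+)$')         # rejects bar anywhere
ONLY_FOOBAR = re.compile(r'(?!.*foobar)(?=.*foo)^(\w+)$')  # rejects only foobar

def check(pattern, text):
    """True when the pattern accepts the whole string."""
    return pattern.search(text) is not None

for text in ["blahfooblah", "shouldbarfooshouldfail", "blahfoobarblah"]:
    print(text, check(ANY_BAR, text), check(ONLY_FOOBAR, text))
# blahfooblah True True
# shouldbarfooshouldfail False True
# blahfoobarblah False False
```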

Edit

You updated your question to make it more specific.

/(?=.*foo(?!bar))^(\w+)$/

New tests

fooshouldbarpass               # pass
butnotfoobarfail               # fail
fooshouldpassevenwithfoobar    # pass
nofuuhere                      # fail

New explanation

(?=.*foo(?!bar)) ensures that the string contains at least one foo that is not directly followed by bar
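A quick way to confirm this against the new tests and the two strings from the question's edit, sketched in Python as a stand-in for Java regexes (EDITED and passes are illustrative names):

```python
import re

# Edited pattern from the answer, without the /.../ delimiters.
EDITED = re.compile(r'(?=.*foo(?!bar))^(\w+)$')

def passes(text):
    """True when some foo in text is not immediately followed by bar."""
    return EDITED.search(text) is not None

cases = {
    "fooshouldbarpass": True,
    "butnotfoobarfail": False,
    "fooshouldpassevenwithfoobar": True,
    "nofuuhere": False,
    # The two strings from the question's edit:
    "blahfooblahfoobarblah": True,
    "blahfoobarblahblah": False,
}

for text, expected in cases.items():
    assert passes(text) == expected
    print(f"{text:30} {'pass' if expected else 'fail'}")
```

Because of ^(\w+)$, the string must consist only of word characters, so input containing characters such as / or spaces will not match as a whole.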

Toponymy answered 13/4, 2010 at 15:51 Comment(7)
This is very close, and a very good answer. I knew I wouldn't be specific enough. :( I need this: "blahfoomeowwoof/foobar/" to pass because of the lonely "foo", but not this: blahfoobarmeowwoof, if this is possible. - Daze
As a side question, how would one go about matching something like "bot" but not "botters"? - Daze
Yes. I can use what I have now, but it would be easier if I could just match bot but not botters. I'm very sorry. I'm inexperienced with regexes, and I'm afraid I'm slowly figuring out what I want myself. :p - Daze
@Rayne, this is the same question. In your above example, you wanted to match foo but not foobar. To match bot but not botters, you would use /(?=.*bot(?!ters))^(\w+)$/. - Quillet
Well, I was generally aiming towards whole words. Like I said, I'm confused about what I really want and what is really possible. Doing it like this will work. Thank you for your time. :) - Daze
I'm using this code for a similar regex, (?=.*did you(?!say)), that looks for instances of the string "did you" so long as it isn't followed by "say". It currently matches "did you think" but not "did you" in isolation. Any thoughts on what the problem is here? - Cartierbresson
What is the forward slash / at the beginning and end of the regex for? - Fitted
O
64

To match a foo followed by something that doesn't start with bar, try

foo(?!bar)

Your version with negative lookbehind is effectively "match a foo followed by something that doesn't end in bar". The .* matches all of barblah, and the (?<!bar) looks back at lah and checks that it doesn't match bar, which it doesn't, so the whole pattern matches.
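The difference is easy to see by printing what each pattern actually matches. A short Python sketch (a stand-in for Java regexes; the variable names are illustrative):

```python
import re

lookbehind = re.compile(r'foo.*(?<!bar)')  # the pattern from the question
lookahead = re.compile(r'foo(?!bar)')      # the pattern suggested here

# .* runs to the end of the string, and the text just before the end is "lah",
# so the lookbehind is satisfied and the unwanted string matches.
greedy = lookbehind.search("blahfoobarblah").group()
print(greedy)  # foobarblah

# The lookahead checks the text right after foo instead.
print(lookahead.search("blahfoobarblah"))                # None
print(lookahead.search("blahfooblah").group())           # foo
print(lookahead.search("blahfooblahfoobarblah").span())  # (4, 7)
```

Note that foo(?!bar) matches only the three characters foo; wrap it in .* or \w* on both sides if the surrounding text should be part of the match.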

Orison answered 13/4, 2010 at 16:0 Comment(1)
So I tried this for a regex that's designed to match the string "did you" so long as it isn't followed by "say". It works when differentiating between "did you say" and "did you think", for example, but just "did you" by itself doesn't get captured, and it should. Any suggestions? - Cartierbresson
S
2

Use a negative look ahead instead:

\s*(?!\w*(bar)\w*)\w*(foo)\w*\s*

This worked for me, hope it helps. Good luck!
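The answer does not say how the pattern is applied, so the sketch below is only an assumption: it tries whole-string matching (re.fullmatch in Python, roughly what re-matches does in Clojure) and then an unanchored search (like re-find). The name PATTERN is illustrative.

```python
import re

PATTERN = re.compile(r'\s*(?!\w*(bar)\w*)\w*(foo)\w*\s*')

# Whole-string matching: any bar in the word rejects it,
# including the string the question's edit wants to pass.
print(PATTERN.fullmatch("blahfooblah") is not None)            # True
print(PATTERN.fullmatch("blahfoobarblah") is not None)         # False
print(PATTERN.fullmatch("blahfooblahfoobarblah") is not None)  # False

# Unanchored search can start in the middle of a word, past the b of bar.
partial = PATTERN.search("barfoo").group()
print(partial)  # arfoo
```

So with whole-string matching this behaves like "contains foo and no bar at all", and with an unanchored search it may need a leading \b or ^ to avoid partial-word matches.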

Savarin answered 13/4, 2010 at 15:59 Comment(1)
Simple yet effective regex, which also works for excluding repeating strings ("foofoo"). Perfect! - Mammillary
T
1

You wrote a comment suggesting you would like this to match individual words within a string rather than the whole string.

Rather than mashing all of this in a comment, I'm posting it as a new answer.

New Regex

/(?=\w*foo(?!bar))(\w+)/

Sample text

foowithbar fooevenwithfoobar notfoobar foohere notfoobarhere butfooisokherebar notfoobarhere andnofuu needsfoo

Matches

foowithbar fooevenwithfoobar foohere butfooisokherebar needsfoo
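The match list can be reproduced with a find-all call. A minimal Python sketch, standing in for something like Clojure's re-seq (WORDS, sample and matches are illustrative names):

```python
import re

# Word-level pattern from the answer, without the /.../ delimiters.
WORDS = re.compile(r'(?=\w*foo(?!bar))(\w+)')

sample = ("foowithbar fooevenwithfoobar notfoobar foohere notfoobarhere "
          "butfooisokherebar notfoobarhere andnofuu needsfoo")

# findall returns the text captured by the single group for each match.
matches = WORDS.findall(sample)
print(" ".join(matches))
# foowithbar fooevenwithfoobar foohere butfooisokherebar needsfoo
```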

Toponymy answered 13/4, 2010 at 17:23 Comment(0)
N
0

Your specific match request can be matched by:

\w+foo(?!bar)\w+

This will match blahfooblahfoobarblah but not blahfoobarblahblah.

The problem with your regex foo.*(?<!bar) is the .* after foo. It is greedy, so it consumes as many characters as it can, including the characters after bar.
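A short check of the claim above, sketched in Python as a stand-in for Java regexes (PATTERN and matched are illustrative names). It also shows one limitation: both \w+ parts need at least one character, so foo at the very start or end of the input is not matched.

```python
import re

PATTERN = re.compile(r'\w+foo(?!bar)\w+')

def matched(text):
    """Return the matched text, or None when nothing matches."""
    m = PATTERN.search(text)
    return m.group() if m else None

print(matched("blahfooblahfoobarblah"))  # blahfooblahfoobarblah
print(matched("blahfoobarblahblah"))     # None

# \w+ requires at least one word character on each side of foo.
print(matched("blahfoo"))                # None
print(matched("foo"))                    # None
```

Replacing each \w+ with \w* would lift that restriction, if bare foo should also count.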

Novara answered 13/4, 2010 at 16:21 Comment(0)
