How to find a word NOT preceded by another specific word?

Asked 2/12, 2009 at 20:40 Answered 7/5, 2013 at 23:20

Which regular expression can I use to find all occurrences of bar that are not preceded by foo? Whitespace between the two does not change this: foo followed by any amount of whitespace and then bar must also be rejected.

So the regex should match the following strings:

foo is bar
hello bar

But not these:

foobar
foo     bar

I've tried using the following:

(?<!foo)bar

and it gets the job done for eliminating foobar, but I also need to take care of the whitespace, and of course

(?<!foo)\s*bar

matches all the strings.

Thanks!

Penley asked 2/12, 2009 at 20:40

"matches all the strings." - pedant mode: (?<!foo)\s*bar doesn't match 'foobar' – Qualm 2/12, 2009 at 20:54

You're right, thanks for pointing that out! I ended up using the following: preg_match('/(foo)?\s*bar/', $haystack, $matches); which finds the bar (whether preceded by foo or not), and then a quick check of $matches identifies whether a foo was there or not. – Penley 3/12, 2009 at 0:34

The thing you are looking for is specifically called a zero-width negative look-behind assertion. Perl notably doesn't support variable-width look-behind (positive or negative), so things like \s* inside one of them won't work. Try using multiple match operators instead. – Deadman 30/12, 2009 at 5:25
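Because Perl's look-behind must be fixed-width, \s* cannot go inside it. One workaround, sketched here as an alternative to multiple match operators and assuming the goal is only to test whether a string contains a qualifying bar, is to split the condition into two fixed-width look-behinds. (?<!foo) forbids the match from starting right after foo, and (?<!\s) forbids it from starting in the middle of a whitespace run. The optional \s* therefore has to absorb the entire run in front of bar, so the first look-behind sees whatever precedes that run. The variable name $bar_not_after_foo is just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two fixed-width look-behinds instead of one variable-width one:
#   (?<!foo)  the match may not start directly after "foo"
#   (?<!\s)   the match may not start inside a whitespace run, so \s*
#             must consume the whole run between the preceding text and "bar"
my $bar_not_after_foo = qr/(?<!foo)(?<!\s)\s*bar/;

my %expected = (
    'foo is bar'  => 1,
    'hello bar'   => 1,
    'foobar'      => 0,
    'foo     bar' => 0,
    'bar foobar'  => 1,    # the first "bar" is not preceded by "foo"
);

for my $s ( sort keys %expected ) {
    my $got = $s =~ $bar_not_after_foo ? 1 : 0;
    printf "%-4s: %s\n", ( $got == $expected{$s} ? 'Good' : 'Bad' ), $s;
}
```

Because the pattern succeeds as soon as any single bar qualifies, 'bar foobar' is accepted, whereas $s =~ /bar/ and not $s =~ /foo\s*bar/ rejects it. Which behaviour is wanted depends on how the requirement is read.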

Better to use other facilities of the programming language than to look too hard for a regex pattern.

You are looking for strings for which $s =~ /bar/ and not $s =~ /foo\s*bar/ is true.

The rest of the script below is just for testing.

#!/usr/bin/perl

use strict; use warnings;

my %strings = (
    'foo is bar'  => 1,
    'hello bar'   => 1,
    'foobar'      => 0,
    'foo     bar' => 0,
    'barbar'      => 1,
    'bar foo'     => 1,
    'foo foo'     => 0,
);

my @accept = grep { $strings{$_} } keys %strings;
my @reject = grep { not $strings{$_} } keys %strings;

for my $s ( @accept ) {
    if ( $s =~ /bar/ and not $s =~ /foo\s*bar/ ) {
        print "Good: $s\n";
    }
    else {
        print "Bad : $s\n";
    }
}

for my $s ( @reject ) {
    if ( $s =~ /bar/ and not $s =~ /foo\s*bar/ ) {
        print "Bad : $s\n";
    }
    else {
        print "Good: $s\n";
    }
}

Output:

E:\srv\unur> j
Good: bar foo
Good: hello bar
Good: foo is bar
Good: barbar
Good: foo foo
Good: foo     bar
Good: foobar

Revolution answered 2/12, 2009 at 20:57

Won't that match even if the string does not contain 'bar'? – Qualm 2/12, 2009 at 21:26

'bar foobar' also makes an interesting test case. I'm not sure what the expected output is here though. – Qualm 2/12, 2009 at 22:30

Personally, when I find a pattern hard to match using regex, I need to go learn more regex, or get a refresher. I think that making an inflexible lookup table when it is not needed is no way to grow as a programmer. – Accomplished 4/12, 2009 at 14:18

Personally, I think you should read the code before downvoting. The lookup table is there to list the test cases and make it easy to add new ones: the table has nothing to do with the logic. The logic consists entirely of $s =~ /bar/ and not $s =~ /foo\s*bar/. – Balmoral 4/12, 2009 at 14:30
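The 'bar foobar' case raised in these comments shows that the whole-string test answers a slightly different question: it rejects the string because one of its bars follows foo. If the requirement is instead to find each bar that is not preceded by foo (a reading the question leaves open), a minimal sketch is to walk the matches with //g and inspect the text to the left of each one. The helper name good_bar_offsets is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the offsets of every "bar" that is NOT preceded by "foo",
# with or without whitespace in between.
sub good_bar_offsets {
    my ($s) = @_;
    my @offsets;
    while ( $s =~ /bar/g ) {
        my $start  = $-[0];                   # where this "bar" begins
        my $before = substr $s, 0, $start;    # everything to its left
        # reject this occurrence if the text to its left ends in foo\s*
        push @offsets, $start unless $before =~ /foo\s*\z/;
    }
    return @offsets;
}

for my $s ( 'bar foobar', 'foobar', 'foo is bar', 'barbar' ) {
    my @ok = good_bar_offsets($s);
    print "$s: ", ( @ok ? "accepted bar at @ok" : "rejected" ), "\n";
}
```

For 'bar foobar' this reports only offset 0: the leading bar qualifies and the one inside foobar does not. Checking the left context with an ordinary anchored match sidesteps the fixed-width limit on look-behind entirely.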

Given a few test cases

my @match = (
  "foo is bar",
  "hello bar",
);

my @reject = (
  "foobar",
  "foo     bar",
);

you could, of course, do it by feeding the results of one pattern into another:

my @control = grep !/foo\s*bar/, grep /bar/ => @match, @reject;

We can also do it with a single pattern:

my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;

But don't take my word for it.

for (@match) {
  print +(/$pattern/ ? "PASS" : "FAIL"), ": $_\n";
}

for (@reject) {
  print +(/$pattern/ ? "FAIL" : "PASS"), ": $_\n";
}

Schist answered 2/12, 2009 at 20:51

Impressive that you got this to work. Most likely "foo" and "bar" are just placeholders for much longer strings. It looks like your regular expressions are going to get extremely long for any real world examples. +1 for the different approach though. – Qualm 2/12, 2009 at 23:28

Thanks, and the sad news is that a literal pattern is the best case. I wonder what the limit of this approach is. It'd be nice for such tasks to have a regular-expression switch that complements the accept status of each NFA state. – Schist 3/12, 2009 at 19:54

  (?!<foo)\s*bar

This will match the whitespace.

Seneschal answered 2/12, 2009 at 20:42

Uh no. First, it's (?<!..) and second, the \s* needs to be inside the lookbehind or it will always match unless there is no whitespace between foo and bar. Mark Byers got it right. – Lindbergh 2/12, 2009 at 21:43

Sure, sure. All I know is JA edited my answer; I feel blessed. – Seneschal 3/12, 2009 at 16:28
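The two problems in Lindbergh's comment can be seen directly by asking Perl where each pattern first matches. The sketch below uses a hypothetical helper, first_match_offset, that returns the offset of the first match or -1. It prints 3 and then 4: the misspelled assertion rejects nothing, and the correctly spelled look-behind lets the match begin after the first space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the offset where $re first matches $s, or -1 if it never does.
sub first_match_offset {
    my ( $s, $re ) = @_;
    return $s =~ $re ? $-[0] : -1;
}

# (?!<foo) is a negative look-AHEAD for the literal text "<foo". No position
# in "foobar" is followed by "<foo", so it rejects nothing: prints 3.
print first_match_offset( 'foobar', qr/(?!<foo)\s*bar/ ), "\n";

# (?<!foo) is the intended look-behind, but it only constrains where the
# match starts. The engine starts after the first space, where the
# look-behind sees "oo " rather than "foo", and \s* eats the rest: prints 4.
print first_match_offset( 'foo     bar', qr/(?<!foo)\s*bar/ ), "\n";
```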

php:

!preg_match('/foo\s*bar/', $string) && preg_match('/bar/', $string)

perl:

$string !~ /foo\s*bar/ && $string =~ /bar/

Dandelion answered 2/12, 2009 at 20:46

Ah, yes, because all of the strings technically can be found to have non-foo strings before bar... – Dandelion 2/12, 2009 at 20:56

What you really need is to just do a negative regex: $string !~ /foo\s*bar/. Updated with PHP and Perl versions. – Dandelion 2/12, 2009 at 21:2

Now it reports success even if the string doesn't contain bar. – Qualm 2/12, 2009 at 21:12

...in addition to the search for bar. Added in answer. – Dandelion 2/12, 2009 at 21:40

Taking the information from earlier answers, here it is wrapped as a Perl one-liner, with the regular expressions made case-insensitive.

Windows:

perl -lne "print $_ if $_ !~ m/foo\s*bar/i && $_ =~ m/bar/i;" c:\temp\xx.txt

Linux:

perl -lne 'print $_ if $_ !~ m/foo\s*bar/i && $_ =~ m/bar/i;' /tmp/xx.txt

With xx.txt containing:

foo is bar
hello bar
foobar
foo     bar
barbar
bar foo
barfoo
foo foo

The result of executing the one-liner at a command prompt:

foo is bar
hello bar
barbar
bar foo
barfoo

Amphictyony answered 7/5, 2013 at 23:20
