How to find a word NOT preceded by another specific word?

Which regular expression can I use to find every occurrence of the string bar that is not preceded by the string foo? A bar separated from a preceding foo only by whitespace should not match either.

So the regex should match the following strings

foo is bar
hello bar

But not these

foobar
foo     bar

I've tried using the following

(?!<foo)bar

and it gets the work done for eliminating foobar, but I need to take care of the whitespace, and of course

(?!<foo)\s*bar

matches all the strings.

Thanks!

Penley answered 2/12, 2009 at 20:40 Comment(3)
"matches all the strings." - pedant mode: (?!<foo)\s*bar doesn't match 'foobar' – Qualm
You're right, thanks for pointing that out! I ended up using the following: preg_match('/(foo)?\s*bar/', $haystack, $matches); which will find the bar (whether preceded by foo or not), and then a quick check on $matches will identify whether a foo was there or not. – Penley
The thing you are looking for is specifically called a zero-width negative look-behind assertion. Perl notably doesn't support variable-width look-behind (positive or negative), so things like \s* inside one of them won't work. Try using multiple match operators instead. – Deadman
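
As a footnote to the comments above, here is a minimal Python sketch (Python rather than the PHP and Perl used in this thread, purely for illustration) of how the candidate patterns behave on the four sample strings. The names `samples` and `matched` are made up for this example. As typed in the question, `(?!<foo)` parses as a negative look-ahead for the literal text `<foo`; the look-behind spelling is `(?<!foo)`. Python's `re` module, as far as I know, also rejects a variable-width look-behind, much like the Perl limitation described in the last comment.

```python
import re

samples = ["foo is bar", "hello bar", "foobar", "foo     bar"]

# As typed in the question: a negative look-AHEAD for the literal text "<foo".
typo = re.compile(r"(?!<foo)bar")
# The intended construct: a negative look-BEHIND for "foo".
lookbehind = re.compile(r"(?<!foo)bar")
# Same look-behind, but \s* sits outside it and may start after the "foo".
with_space = re.compile(r"(?<!foo)\s*bar")


def matched(pattern, strings):
    """Return the strings in which the pattern is found anywhere."""
    return [s for s in strings if pattern.search(s)]


typo_hits = matched(typo, samples)              # all four strings
lookbehind_hits = matched(lookbehind, samples)  # everything except "foobar"
with_space_hits = matched(with_space, samples)  # everything except "foobar"

# Putting \s* inside the look-behind makes it variable-width,
# which Python's re module refuses to compile.
try:
    re.compile(r"(?<!foo\s*)bar")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(typo_hits, lookbehind_hits, with_space_hits, variable_width_ok)
```

So the look-behind alone handles foobar, but neither placement of `\s*` rejects "foo     bar", which is why the answers below fall back on two matches or a hand-built pattern.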

Better to use other facilities of the programming language than to look too hard for a regex pattern.

You are looking for strings for which $s =~ /bar/ and not $s =~ /foo\s*bar/ is true.

The rest of the script below is just for testing.

#!/usr/bin/perl

use strict; use warnings;

my %strings = (
    'foo is bar'  => 1,
    'hello bar'   => 1,
    'foobar'      => 0,
    'foo     bar' => 0,
    'barbar'      => 1,
    'bar foo'     => 1,
    'foo foo'     => 0,
);

my @accept = grep { $strings{$_} } keys %strings;
my @reject = grep { not $strings{$_} } keys %strings;

for my $s ( @accept ) {
    if ( $s =~ /bar/ and not $s =~ /foo\s*bar/ ) {
        print "Good: $s\n";
    }
    else {
        print "Bad : $s\n";
    }
}

for my $s ( @reject ) {
    if ( $s =~ /bar/ and not $s =~ /foo\s*bar/ ) {
        print "Bad : $s\n";
    }
    else {
        print "Good: $s\n";
    }
}

Output:

E:\srv\unur> j
Good: bar foo
Good: hello bar
Good: foo is bar
Good: barbar
Good: foo foo
Good: foo     bar
Good: foobar
Revolution answered 2/12, 2009 at 20:57 Comment(4)
Won't that match even if the string does not contain 'bar'? – Qualm
'bar foobar' also makes an interesting test case. I'm not sure what the expected output is here though. – Qualm
Personally, when I find a pattern hard to match using regex, I need to go learn more regex, or get a refresher. I think that making an inflexible lookup table when it is not needed is no way to grow as a programmer. – Accomplished
Personally, I think you should read the code before downvoting. The lookup table is there to list the test cases and make it easy to add test cases: the table has nothing to do with the logic. The logic consists entirely of $s =~ /bar/ and not $s =~ /foo\s*bar/. – Balmoral
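
The 'bar foobar' case raised in the comments depends on whether the test applies to the whole string or to each occurrence of bar. One possible per-occurrence reading, sketched in Python under that assumption (the function name `bar_starts_without_foo` is hypothetical, not something from the answer): find every bar, then check the text before it for a trailing foo plus optional whitespace. This emulates a variable-width look-behind with a second match.

```python
import re


def bar_starts_without_foo(text):
    """Return the start offsets of each 'bar' that is not preceded by
    'foo' followed by optional whitespace."""
    starts = []
    for m in re.finditer(r"bar", text):
        # Emulate a variable-width look-behind: look at everything
        # before this 'bar' and see whether it ends in foo\s*.
        if not re.search(r"foo\s*\Z", text[:m.start()]):
            starts.append(m.start())
    return starts


for s in ["bar foobar", "foo is bar", "barbar", "foo     bar", "foo foo"]:
    print(repr(s), bar_starts_without_foo(s))
```

Under this reading 'bar foobar' yields one acceptable bar (the first), whereas the whole-string test in the answer rejects the string outright.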

Given a few test cases

my @match = (
  "foo is bar",
  "hello bar",
);

my @reject = (
  "foobar",
  "foo     bar",
);

you could of course do it by feeding the results of one grep to another:

my @control = grep !/foo\s*bar/, grep /bar/ => @match, @reject;

We can also do it with a single pattern:

my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;

But don't take my word for it.

for (@match) {
  print +(/$pattern/ ? "PASS" : "FAIL"), ": $_\n";
}

for (@reject) {
  print +(/$pattern/ ? "FAIL" : "PASS"), ": $_\n";
}
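
For readers who want to try the single-pattern idea outside Perl, here is a rough Python port. It is a sketch that assumes `re.VERBOSE` is an adequate stand-in for Perl's /x flag; the names `NOFOO` and `PATTERN` simply mirror `$nofoo` and `$pattern` above. The prefix consumes characters one alternative at a time and refuses to step past any "fo" that would complete foo\s*bar.

```python
import re

# Prefix that never walks past a "foo\s*bar":
NOFOO = r"""
(?:
    [^f]                 # any character other than f
  | f  (?! o)            # an f that does not start "fo"
  | fo (?! o \s* bar)    # "fo" that does not go on to form foo\s*bar
)*
"""

# Anchored at the start, so the prefix must account for everything before bar.
PATTERN = re.compile(r"^" + NOFOO + r"bar", re.VERBOSE)

should_match = ["foo is bar", "hello bar"]
should_reject = ["foobar", "foo     bar"]

for s in should_match:
    print("PASS" if PATTERN.search(s) else "FAIL", ":", s)

for s in should_reject:
    print("FAIL" if PATTERN.search(s) else "PASS", ":", s)
```

Because the pattern is anchored at the start, a string such as "foo bar bar" is rejected as a whole even though its second bar has no foo directly before it.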
Schist answered 2/12, 2009 at 20:51 Comment(2)
Impressive that you got this to work. Most likely "foo" and "bar" are just placeholders for much longer strings. It looks like your regular expressions are going to get extremely long for any real world examples. +1 for the different approach though. – Qualm
Thanks, and the sad news is that a literal pattern is the best case. I wonder what the limit of this approach is. It'd be nice for such tasks to have a regular-expression switch that complements the accept status of each NFA state. – Schist
  (?!<foo)\s*bar

This will match the whitespace

Seneschal answered 2/12, 2009 at 20:42 Comment(2)
Uh no. First, it's (?<!..) and second, the \s* needs to be inside the lookbehind or it will always match unless there is no whitespace between foo and bar. Mark Byers got it right. – Lindbergh
sure sure all I knows is JA edited my answer, I feel blessed. – Seneschal

php:

!preg_match('/foo\s*bar/', $string) && preg_match('/bar/', $string)

perl:

$string !~ /foo\s*bar/ && $string =~ /bar/
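
For comparison, a minimal Python sketch of the same two-step test. The helper name `bar_without_foo` and the compiled pattern names are invented for this example. The sample strings are the ones used as test cases in the Perl answer above.

```python
import re

HAS_BAR = re.compile(r"bar")
FOO_THEN_BAR = re.compile(r"foo\s*bar")


def bar_without_foo(text):
    """True if text contains 'bar' and contains no 'foo' followed by
    optional whitespace and then 'bar'."""
    return bool(HAS_BAR.search(text)) and not FOO_THEN_BAR.search(text)


for s in ["foo is bar", "hello bar", "foobar", "foo     bar",
          "barbar", "bar foo", "foo foo"]:
    print(repr(s), bar_without_foo(s))
```

Like the PHP and Perl versions, this judges the string as a whole, so a string such as 'bar foobar' is rejected even though its first bar has no foo in front of it.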
Dandelion answered 2/12, 2009 at 20:46 Comment(4)
Ah, yes, because all of the strings technically can be found to have non-foo strings before bar... – Dandelion
What you really need is to just do a negative regex. $string !~ /foo\s*bar/. Updated with php, and perl versions. – Dandelion
Now it reports success even if the string doesn't contain bar. – Qualm
...in addition to the search for bar. Added in answer. – Dandelion

Taking the information from earlier answers, wrapping as a perl one-liner, and making the regular expressions case-insensitive.

Windows:

perl -lne "print $_ if $_ !~ m/foo\s*bar/i && $_ =~ m/bar/i;" c:\temp\xx.txt

Linux:

perl -lne 'print $_ if $_ !~ m/foo\s*bar/i && $_ =~ m/bar/i;' /tmp/xx.txt

With xx.txt containing:

foo is bar
hello bar
foobar
foo     bar
barbar
bar foo
barfoo
foo foo

The result of executing the one-liner at a command prompt:

foo is bar
hello bar
barbar
bar foo
barfoo
Amphictyony answered 7/5, 2013 at 23:20 Comment(0)