regular expression - match word only once in line
H

3

10

Case:

  1. ehello goodbye hellot hello goodbye
  2. ehello goodbye hello hello goodbye

I want to match line 1 (it contains the word 'hello' only once) and I do NOT want to match line 2 (it contains 'hello' more than once).

I tried negative lookahead, lookbehind and so on, without any real success.

Helical answered 6/1, 2012 at 21:39 Comment(0)
M
9

A simple option is this (using the multiline flag and not dot-all):

^(?!.*\bhello\b.*\bhello\b).*\bhello\b.*$

First, check you don't have 'hello' twice, and then check you have it at least once.
There are other ways to check for the same thing, but I think this one is pretty simple.
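As a quick check of the pattern above, here is a minimal sketch in Python (my assumption, since the question names no language). `re.MULTILINE` makes `^` and `$` anchor at every line, and dot-all is left off so `.` stops at newlines. The names `ONCE`, `text` and `matched_lines` are just for illustration.

```python
import re

# The pattern from this answer: first reject lines holding two standalone
# "hello" words, then require at least one.
ONCE = re.compile(r"^(?!.*\bhello\b.*\bhello\b).*\bhello\b.*$", re.MULTILINE)

# The two sample lines from the question, joined into one multi-line string.
text = "\n".join([
    "ehello goodbye hellot hello goodbye",
    "ehello goodbye hello hello goodbye",
])

# findall returns the full text of every matching line
# (the pattern has no capturing groups).
matched_lines = ONCE.findall(text)
print(matched_lines)
```

Only the first sample line should come back, because 'ehello' and 'hellot' are not standalone words under `\b`.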

Of course, you can simply match \bhello\b and count the number of matches...
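The counting alternative could look roughly like this in Python (again an assumed language; `has_exactly_one_hello` is a made-up helper name, not something from the question):

```python
import re

HELLO = re.compile(r"\bhello\b")


def has_exactly_one_hello(line: str) -> bool:
    """Return True when the standalone word "hello" occurs exactly once."""
    return len(HELLO.findall(line)) == 1


print(has_exactly_one_hello("ehello goodbye hellot hello goodbye"))
print(has_exactly_one_hello("ehello goodbye hello hello goodbye"))
```

This trades the single regex for a little code, which may be easier to read and to adapt if the allowed count ever changes.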

Messier answered 6/1, 2012 at 21:45 Comment(4)
Why not just find it once and check that it doesn't exist again after that? Seems a little less repetitive that way. - Fielder
@Fielder - A pattern like ^.*hello(?!.*hello) would not work, because it will always match the last hello of the line. You'd need something like ^(?:(?!hello).)*hello(?!.*hello), which isn't much more elegant. I may have missed something simple though... - Messier
@Fielder - no. The regex engine tries to match, not to fail. It can match, so it will. - Messier
Works, simple, understandable. - Helical
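To make the point from the comments concrete, here is a small Python sketch (assumed language). `NAIVE` is the pattern from the comment as written. `TEMPERED` is the second pattern with `\b` word boundaries added, which is my own adjustment so that 'ehello' and 'hellot' are not counted. Both variable names are hypothetical.

```python
import re

line1 = "ehello goodbye hellot hello goodbye"
line2 = "ehello goodbye hello hello goodbye"

# Greedy .* runs to the LAST "hello", so the lookahead after it always passes.
NAIVE = re.compile(r"^.*hello(?!.*hello)")

# Each consumed character must not start a standalone "hello", so the first
# standalone "hello" is the one tested by the trailing lookahead.
# The \b anchors are an addition, not part of the original comment.
TEMPERED = re.compile(r"^(?:(?!\bhello\b).)*\bhello\b(?!.*\bhello\b)")

naive_results = [bool(NAIVE.search(s)) for s in (line1, line2)]
tempered_results = [bool(TEMPERED.search(s)) for s in (line1, line2)]
print(naive_results, tempered_results)
```

The naive pattern accepts both lines, while the tempered one accepts only the first.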
L
3

A generic regex would be:

^(?:\b(\w+)\b\W*(?!.*?\b\1\b))*\z

Although it could be cleaner to invert the result of this match:

\b(\w+)\b(?=.*?\b\1\b)

This works by matching a word and capturing it, then using a lookahead and a backreference to check whether it does (or does not) appear again later in the string.
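A minimal sketch of the inverted form in Python (assumed language; `has_repeated_word` is a hypothetical helper). It searches for any word that shows up again later in the line, and the caller negates the result to get "no word repeats".

```python
import re

# A whole word, captured, followed somewhere later by the same whole word.
REPEATED_WORD = re.compile(r"\b(\w+)\b(?=.*?\b\1\b)")


def has_repeated_word(line: str) -> bool:
    """Return True when any word occurs more than once in the line."""
    return REPEATED_WORD.search(line) is not None


sample = "ehello goodbye hellot hello goodbye"
first_repeat = REPEATED_WORD.search(sample).group(1)
print(has_repeated_word(sample), first_repeat)
print(has_repeated_word("one two three"))
```

Note that the question's first sample line does count as having a repeated word here, since 'goodbye' appears twice.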

Leu answered 6/1, 2012 at 21:51 Comment(1)
Doh, I misread the question, thought hello could be any word, and the purpose of the regex was to make sure no word repeats. Will leave the answer in case that's of any interest to anyone. - Leu
P
1

Since you're only worried about words (i.e. tokens separated by whitespace), you can just split on whitespace and count how often "hello" appears. Since you didn't mention a language, here's an implementation in Perl:

use strict;
use warnings;

my $a1="ehello goodbye hellot hello goodbye";
my $a2="ehello goodbye hello hello goodbye";

my @arr1=split(/\s+/,$a1);
my @arr2=split(/\s+/,$a2);

#grab the number of times that "hello" appears

my $num_hello1=scalar(grep{$_ eq "hello"}@arr1);
my $num_hello2=scalar(grep{$_ eq "hello"}@arr2);

print "$num_hello1, $num_hello2\n";

The output is

1, 2
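For comparison, the same split-and-count idea might look like this in Python (an assumed alternative language; `count_word` is a hypothetical helper name). `str.split()` with no arguments splits on runs of whitespace, much like the `/\s+/` split in the Perl above.

```python
def count_word(line: str, word: str = "hello") -> int:
    """Count whitespace-separated tokens that are exactly equal to word."""
    return line.split().count(word)


a1 = "ehello goodbye hellot hello goodbye"
a2 = "ehello goodbye hello hello goodbye"

num_hello1 = count_word(a1)
num_hello2 = count_word(a2)
print(f"{num_hello1}, {num_hello2}")
```

A line then qualifies when the count is exactly 1. Unlike `\b`, this treats punctuation as part of the token, so 'hello,' would not be counted.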
Pelecypod answered 6/1, 2012 at 21:48 Comment(0)
