Exclusive Or in Regular Expression
E

14

38

Looking for a bit of regex help. I'd like to design an expression that matches a string with "foo" OR "bar", but not both "foo" AND "bar".

If I do something like...

/((foo)|(bar))/

It'll match "foobar". Not what I'm looking for. So, how can I make regex match only when one term or the other is present?

Thanks!

Escaut answered 29/10, 2008 at 15:7 Comment(3)
Would foofoobar be a match because it contains "foo" and "foobar"? How about "foonbar"? Could you provide examples of matches and non-matches?Calash
Matches: "foo", "bar" nonmatches: "foofoo" "barfoo" "foobarfoo" "barbar" "barfoofoo"Escaut
If you don't want "foofoo" to match, then you're not really talking about an exclusive or.Seat
I
9

You can do this with a single regex but I suggest for the sake of readability you do something like...

(/foo/ and not /bar/) || (/bar/ and not /foo/)
Imperfection answered 29/10, 2008 at 15:11 Comment(3)
Indeed, I'm pretty sure I would put the XOR logic into the code itself, and not in the regexp.Sorbian
Or even better, /foo/ xor /bar/, if your language has an XOR operator. (Perl does.)Seat
@Ralf This is not a single expression, it is two expressions joined with a logical OR operator.Imperfection
O
44

This is what I use:

/^(foo|bar){1}$/

See: http://www.regular-expressions.info/quickstart.html under repetition

Ohmmeter answered 2/12, 2011 at 19:57 Comment(5)
Much more elegant solution than the accepted answer, especially when you have more than 2 cases.Thai
Why did you add {1}, what does that mean?Liege
This is wrong, it only means foo or bar should be matched only once.Mesocratic
I agree with @Mesocratic this is not an XOR. It only checks if the whole string is "foo" or "bar"Tippett
You don’t need the {1}, because that indicates it should be repeated 1 time. It’s sufficient to have /^(foo|bar)$/. The reason this regex works is that you’ve used ^ and $; it has nothing to do with repetition.Southworth
A
20

If your regex language supports it, use negative lookaround:

(?<!foo|bar)(foo|bar)(?!foo|bar)

This will match "foo" or "bar" that is not immediately preceded or followed by "foo" or "bar", which I think is what you wanted.

It's not clear from your question or examples if the string you're trying to match can contain other tokens: "foocuzbar". If so, this pattern won't work.

Here are the results of your test cases ("true" means the pattern was found in the input):

foo: true
bar: true
foofoo: false
barfoo: false
foobarfoo: false
barbar: false
barfoofoo: false
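
A quick way to check the results above is to run the pattern over the same inputs. This is only a sketch, assuming Python's re module, which accepts this lookbehind because "foo" and "bar" happen to have the same length; the names PATTERN and found are illustrative and not part of the answer.

```python
import re

# Same pattern as above. Python's re needs fixed-width lookbehind;
# this works because both alternatives are three characters long.
PATTERN = re.compile(r"(?<!foo|bar)(foo|bar)(?!foo|bar)")

def found(text):
    """Return True if the pattern is found anywhere in text."""
    return PATTERN.search(text) is not None

cases = ["foo", "bar", "foofoo", "barfoo", "foobarfoo", "barbar", "barfoofoo"]
results = {text: found(text) for text in cases}

for text, ok in results.items():
    print(f"{text}: {str(ok).lower()}")

# The caveat about other tokens: "foocuzbar" contains both words,
# yet the pattern is still found because the two words are not adjacent.
print("foocuzbar:", found("foocuzbar"))
```

The last line illustrates the limitation mentioned above: once other text separates the two words, the lookarounds no longer reject the string.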
Adda answered 29/10, 2008 at 16:32 Comment(0)
L
10

This will take 'foo' and 'bar' but not 'foobar' and not 'blafoo' and not 'blabar':

/^(foo|bar)$/

^ = mark start of string (or line)
$ = mark end of string (or line)

This will take 'foo' and 'bar' and 'foo bar' and 'bar-foo' but not 'foobar' and not 'blafoo' and not 'blabar':

/\b(foo|bar)\b/

\b = mark word boundary
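
To compare the two patterns side by side, here is a small sketch assuming Python's re module. The helper hits and the variable names are made up for illustration.

```python
import re

anchored = re.compile(r"^(foo|bar)$")   # whole string must be foo or bar
bounded = re.compile(r"\b(foo|bar)\b")  # foo or bar as a whole word

def hits(pattern, texts):
    """Return the subset of texts in which pattern is found."""
    return [t for t in texts if pattern.search(t)]

samples = ["foo", "bar", "foo bar", "bar-foo", "foobar", "blafoo", "blabar"]

anchored_hits = hits(anchored, samples)
bounded_hits = hits(bounded, samples)

print(anchored_hits)
print(bounded_hits)
```

Note that the word-boundary version accepts 'foo bar' and 'bar-foo', which contain both words, so on its own it is not an exclusive or.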
Liege answered 9/9, 2014 at 10:46 Comment(0)
I
3

You haven't specified behaviour regarding content other than "foo" and "bar" or repetitions of one in the absence of the other. e.g., Should "food" or "barbarian" match?

Assuming that you want to match strings which contain only one instance of either "foo" or "bar", but not both and not multiple instances of the same one, without regard for anything else in the string (i.e., "food" matches and "barbarian" does not match), then you could use a regex which returns the number of matches found and only consider it successful if exactly one match is found. e.g., in Perl:

@matches = ($value =~ /(foo|bar)/g); # @matches now holds all foos or bars present
if (scalar @matches == 1) {          # exactly one match found
  ...
}

If multiple repetitions of that same target are allowed (i.e., "barbarian" matches), then this same general approach could be used by then walking the list of matches to see whether the matches are all repeats of the same text or if the other option is also present.
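
Both variants can be sketched in Python as well. This assumes the re module; find_targets, exactly_one and one_kind_only are hypothetical helper names, not anything from the answer above.

```python
import re

def find_targets(value):
    """Return every non-overlapping "foo" or "bar" in value, in order."""
    return re.findall(r"foo|bar", value)

def exactly_one(value):
    """True if value contains a single "foo" or "bar" and no second match."""
    return len(find_targets(value)) == 1

def one_kind_only(value):
    """True if every match is the same word, allowing repeats of it."""
    return len(set(find_targets(value))) == 1

for value in ["food", "barbarian", "foobar"]:
    print(value, exactly_one(value), one_kind_only(value))
```

Collapsing the match list into a set is one way to "walk the list": a set of size one means only one of the two words ever appeared.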

Inaugurate answered 29/10, 2008 at 16:59 Comment(0)
A
2

You might want to consider the ? conditional test.

(?(?=regex)then|else)

Regular Expression Conditionals

Arlo answered 29/10, 2008 at 15:56 Comment(0)
S
2

If you want a true exclusive or, I'd just do that in code instead of in the regex. In Perl:

/foo/ xor /bar/

But your comment:

Matches: "foo", "bar" nonmatches: "foofoo" "barfoo" "foobarfoo" "barbar" "barfoofoo"

indicates that you're not really looking for exclusive or. You actually mean "Does /foo|bar/ match exactly once?"

my $matches = 0;
while (/foo|bar/g) {
  last if ++$matches > 1;
}

my $ok = ($matches == 1);
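
For comparison, the same stop-after-the-second-match idea might look like this in Python. It is a sketch that assumes re.finditer and itertools.islice; matches_exactly_once is a hypothetical name.

```python
import re
from itertools import islice

def matches_exactly_once(text):
    """True if "foo" or "bar" occurs exactly once in text."""
    # Pull at most two matches from the lazy iterator, then stop,
    # mirroring the early "last" in the Perl loop.
    first_two = list(islice(re.finditer(r"foo|bar", text), 2))
    return len(first_two) == 1

for text in ["foo", "bar", "foofoo", "barfoo", "foobarfoo", "barbar", "barfoofoo"]:
    print(text, matches_exactly_once(text))
```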
Seat answered 29/10, 2008 at 17:59 Comment(0)
W
2

I know this is a late entry, but just to help others who may be looking:

(\b(?:(?:(?!foo)bar)|(?:(?!bar)foo))\b)
Wast answered 6/7, 2015 at 3:19 Comment(0)
H
0

I'd use something like this. It just checks for whitespace around the words, but you could use \b (or \B) to check for a word boundary instead. This would match " foo " or " bar ", so obviously you'd have to replace the whitespace as well, just in case. (Assuming you're replacing anything.)

/\s((foo)|(bar))\s/
Hipbone answered 29/10, 2008 at 15:15 Comment(0)
T
0

I don't think this can be done with a single regular expression. And boundaries may or may not work depending on what you're matching against.

I would match against each regex separately, and do an XOR on the results.

import re

# text is the string being tested
has_foo = re.search("foo", text) is not None
has_bar = re.search("bar", text) is not None
if has_foo ^ has_bar:
    ...  # do something...
Tyner answered 29/10, 2008 at 15:23 Comment(0)
T
0

I tried with Regex Coach against:

x foo y
x bar y
x foobar y

If I check the g option, indeed it matches all three words, because it searches again after each match.
If you don't want this behavior, you can anchor the expression, for example matching only on word boundaries:

\b(foo|bar)\b

Giving more context on the problem (what the data looks like) might give better answers.

Tager answered 29/10, 2008 at 15:27 Comment(0)
M
0
\b(foo)\b|\b(bar)\b

And use only the first capture group.

Mellow answered 29/10, 2008 at 15:41 Comment(0)
F
0

Using the word boundaries, you can get the single word...

me@home ~  
$ echo "Where is my bar of soap?" | egrep "\bfoo\b|\bbar\b"  
Where is my bar of soap?  

me@home ~  
$ echo "What the foo happened here?" | egrep "\bfoo\b|\bbar\b"  
What the foo happened here?  

me@home ~  
$ echo "Boy, that sure is foobar\!" | egrep "\bfoo\b|\bbar\b"  
Faulk answered 29/10, 2008 at 16:44 Comment(0)
D
-2

for starters, it helps to have a good test case:

apple
banana
orange
apple and banana
apple and orange
banana and apple
banana and orange
orange and apple
orange and banana
apple and banana and orange
orange and banana and orange
none of the above
apple and microsoft

https://regex101.com/r/ebpyjX/1

first, ^(?P<XOR>(?P<apple_not_banana>(?=.*apple)(?!.*banana))|(?P<banana_not_apple>(?!.*apple)(?=.*banana))).*$ reads as "apple NOT banana, OR, not apple, banana". it works directly as XOR but would potentially become long with many subterms:

direct xor approach
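
to try the direct approach outside regex101, here is a sketch assuming Python's re module, which accepts the same (?P<name>...) group syntax. the variable names are illustrative only.

```python
import re

# The direct XOR pattern, split across lines for readability.
XOR = re.compile(
    r"^(?P<XOR>"
    r"(?P<apple_not_banana>(?=.*apple)(?!.*banana))"
    r"|(?P<banana_not_apple>(?!.*apple)(?=.*banana))"
    r").*$"
)

lines = [
    "apple",
    "banana",
    "orange",
    "apple and banana",
    "apple and orange",
    "banana and apple",
    "banana and orange",
    "orange and apple",
    "orange and banana",
    "apple and banana and orange",
    "orange and banana and orange",
    "none of the above",
    "apple and microsoft",
]

# Keep only the lines where exactly one of apple/banana is present.
matched = [line for line in lines if XOR.match(line)]
for line in matched:
    print(line)
```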

one and a halfth, ^(?P<XOR>(?=.*apple)(?!.*banana)(?!.*orange)|(?!.*apple)(?=.*banana)(?!.*orange)|(?!.*apple)(?!.*banana)(?=.*orange)).*$ reads as "apple not banana not orange; not apple, banana, not orange; not apple, not banana, orange" and is an example of the three-valued xor, via enumeration of the product:

xor apple banana orange
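
since the enumeration is mechanical, the pattern can be generated rather than typed. a minimal sketch in Python, assuming the re module; exactly_one_of is a hypothetical helper, and it uses a non-capturing group instead of the named XOR group.

```python
import re

def exactly_one_of(terms):
    """Build a pattern matching lines that contain exactly one of terms."""
    branches = []
    for wanted in terms:
        parts = []
        for term in terms:
            # Positive lookahead for the wanted term, negative for the rest.
            look = "(?=.*{})" if term == wanted else "(?!.*{})"
            parts.append(look.format(re.escape(term)))
        branches.append("".join(parts))
    return re.compile("^(?:" + "|".join(branches) + ").*$")

fruit = exactly_one_of(["apple", "banana", "orange"])
print(fruit.pattern)

for line in ["apple", "apple and banana", "orange and banana and orange",
             "none of the above", "apple and microsoft"]:
    print(line, bool(fruit.match(line)))
```

the generated pattern has one branch per term and one lookahead per term inside each branch, so its length grows roughly with the square of the number of terms.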

one and three quarters, ^(?P<XOR>(?!(?P<exclusive>(.*apple|.*banana|.*orange){2,}))(?=.*(?P<fruit>apple|banana|orange)).*$) reads as "NOT (apple, banana, or orange) 2 or more times, and apple or banana or orange". it could scale better in terms of length to larger numbers of terms because it groups logic more effectively. EDIT: however, in practice, searching NLM UMLS, I discovered this approach may cause catastrophic backtracking.

exclude 2 or more, match one

second, (?P<xor_is_or_and_nand>^(?P<nand_apple_banana>(?!(?:(?=.*apple)(?=.*banana))))(?P<apple_or_banana>(?=.*apple|.*banana)).*$) reads as "NOT (apple AND banana), AND (apple OR banana)" and shows how XOR is OR + NAND, i.e. in lisp this would look like (and (or apple banana) (nand apple banana)):

apple xor banana via or and nand
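
one way to sanity-check the OR + NAND reading is to compare the regex against the same logic written in plain code. a sketch assuming Python's re module; xor_in_code and the sample list are illustrative.

```python
import re

# The OR + NAND pattern, split across lines for readability.
OR_AND_NAND = re.compile(
    r"(?P<xor_is_or_and_nand>^"
    r"(?P<nand_apple_banana>(?!(?:(?=.*apple)(?=.*banana))))"
    r"(?P<apple_or_banana>(?=.*apple|.*banana))"
    r".*$)"
)

def xor_in_code(line):
    """(apple OR banana) AND NOT (apple AND banana), without regex."""
    has_apple = "apple" in line
    has_banana = "banana" in line
    return (has_apple or has_banana) and not (has_apple and has_banana)

lines = [
    "apple", "banana", "orange", "apple and banana", "apple and orange",
    "banana and apple", "orange and banana and orange", "none of the above",
]

# True when the regex and the plain boolean logic agree on every line.
agreement = all(bool(OR_AND_NAND.match(l)) == xor_in_code(l) for l in lines)
print(agreement)
```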

finally, (?P<not_with_xor>^(?P<not_microsoft>(?!.*microsoft))(?P<nand_apple_banana>(?!(?:(?=.*apple)(?=.*banana))))(?P<apple_or_banana>(?=.*apple|.*banana)).*$) reads as "NOT microsoft, AND NOT (apple AND banana), AND (apple OR banana)" and shows how to ensure coexistence of other negations with the xor query:

not microsoft, and apple or banana, and apple nand banana

Dhyana answered 9/1 at 13:2 Comment(7)
Some explanation of these regexes would be useful, rather than just showing them with screenshots of their testing. What are all the named capture groups for? Given that the question and its accepted answer are over 15 years old, any new answer should focus on giving new information and new insights into the topic.Surfacetosurface
@Surfacetosurface if you think the accepted answer meets your high standards, then you're welcome to use that!Dhyana
anyway, named groups are to explain how they work, and for maintainability, readabilityDhyana
This site aims to provide high quality answers for future readers. Many of us seek to maintain the usefulness of the questions and answers here. For the reasons I gave above, I think this answer could be much improved.Surfacetosurface
there you go, i wrote them out, enjoy your c#Dhyana
Adrian isn't being negative, he's providing constructive feedback. There is an expectation on this site that the community will always focus on high standards, so they are not just Adrian's high standard.Mummify
The OP has asked about XOR only. You've provided information that goes beyond what's being asked. It's reducing the usefulness of the answer.Mummify
