How to match, but not capture, part of a regex?

11

328

I have a list of strings. Some of them are of the form 123-...456. The variable portion "..." may be:

  • the string "apple" followed by a hyphen, e.g. 123-apple-456
  • the string "banana" followed by a hyphen, e.g. 123-banana-456
  • a blank string, e.g. 123-456 (note there's only one hyphen)

Any word other than "apple" or "banana" is invalid.

For these three cases, I would like to match "apple", "banana", and "", respectively. Note that I never want to capture the hyphen, but I always want to match it. If the string is not of the form 123-...456 as described above, then there is no match at all.

How do I write a regular expression to do this? Assume I have a flavor that allows lookahead, lookbehind, lookaround, and non-capturing groups.


The key observation here is that when you have either "apple" or "banana", you must also have the trailing hyphen, but you don't want to match it. And when you're matching the blank string, you must not have the trailing hyphen. A regex that encapsulates this assertion will be the right one, I think.

Threedimensional answered 13/10, 2010 at 17:48 Comment(2)
You want to match everything except for hyphens?Flagship
I have a much easier example, I have [email protected] and I want to match company in order to hand it over for further processingAmericana
W
486

The only way not to capture something is using look-around assertions:

(?<=123-)((apple|banana)(?=-456)|(?=456))

Even with non-capturing groups (?:…), the overall match still includes whatever those groups matched. This regular expression, by contrast, matches only apple or banana when it is preceded by 123- and followed by -456, or the empty string when it is preceded by 123- and followed by 456.

Lookaround   Name                  What it Does
(?=foo)      Lookahead             Asserts that what immediately FOLLOWS the current position in the string is foo
(?<=foo)     Lookbehind            Asserts that what immediately PRECEDES the current position in the string is foo
(?!foo)      Negative Lookahead    Asserts that what immediately FOLLOWS the current position in the string is NOT foo
(?<!foo)     Negative Lookbehind   Asserts that what immediately PRECEDES the current position in the string is NOT foo
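As a quick check of the look-around pattern above, here is a minimal Python sketch. The pattern is copied from this answer; the helper name `middle` and the sample strings are only illustrative assumptions. With look-arounds on both sides, the whole match (group 0) is already just the word or the empty string.

```python
import re

# Pattern copied from the answer; everything else here is illustrative.
LOOKAROUND = re.compile(r"(?<=123-)((apple|banana)(?=-456)|(?=456))")

def middle(text):
    """Return the whole match (group 0), or None if the string is invalid."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None

for sample in ["123-apple-456", "123-banana-456", "123-456", "123-coconut-456"]:
    # repr() makes the empty-string match visible
    print(repr(sample), "->", repr(middle(sample)))
```

Python's `re` module only accepts fixed-width lookbehinds, which `(?<=123-)` satisfies.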
Werby answered 13/10, 2010 at 17:58 Comment(5)
+1 — In this case, you can work around that by using group 1 rather than group 0, but this is an excellent (and subtle!) distinction.Tasha
@Ben Blank: It definitely depends on how “match” and “capture” are interpreted.Werby
Not supported in JavaScript, yay! would be nice to have a JS friendly method, but not bad at all, +0.5 (rounding up ;D )Jodiejodo
Love look-around assertions! These work great with Ruby as well.Typewritten
@Jodiejodo I checked in JS, and it's working fine, check this demo.Stovall
C
17

In JavaScript, try: /123-(apple(?=-)|banana(?=-)|(?!-))-?456/

Remember that the result is in group 1
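To illustrate the "group 1" point, here is a small sketch of the same pattern in Python rather than JavaScript (the pattern uses only lookaheads, which both flavours support). The helper name `word_in_group_one` and the test strings are assumptions made for the example.

```python
import re

# Same pattern as above, without the JavaScript slashes.
GROUP_ONE = re.compile(r"123-(apple(?=-)|banana(?=-)|(?!-))-?456")

def word_in_group_one(text):
    """Return capture group 1, or None if the string does not match at all."""
    m = GROUP_ONE.search(text)
    return m.group(1) if m else None

m = GROUP_ONE.search("123-apple-456")
print(repr(m.group(0)))  # whole match: includes the digits and hyphens
print(repr(m.group(1)))  # group 1: only the word

for sample in ["123-banana-456", "123-456", "123-apple456", "123--456"]:
    print(repr(sample), "->", repr(word_in_group_one(sample)))
```

The whole match still contains the number blocks, so the caller has to read group 1 instead of group 0.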

Debuggex Demo


Based on the input provided by Germán Rodríguez Herrera

Crissie answered 7/4, 2015 at 18:14 Comment(1)
But the OP want to match only "apple", "banana", and "". It's not the case here. See this demo.Stovall
F
10

Try:

123-(?:(apple|banana|)-|)456

That will match apple, banana, or a blank string, followed by 0 or 1 hyphens. I was wrong about not needing a capturing group. Silly me.
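A hedged sketch of how this pattern behaves in Python follows; the helper name `captured_word` and the sample inputs are assumptions. One thing worth checking before relying on it: because the inner group also allows an empty alternative before the hyphen, the pattern accepts "123--456" as well, which the question probably does not want.

```python
import re

# Pattern copied from the answer above.
NESTED = re.compile(r"123-(?:(apple|banana|)-|)456")

def captured_word(text):
    """Return group 1 (treating an unset group as ""), or None if no match."""
    m = NESTED.search(text)
    if m is None:
        return None
    return m.group(1) or ""

for sample in ["123-apple-456", "123-456", "123-coconut-456", "123--456"]:
    print(repr(sample), "->", repr(captured_word(sample)))
```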

Foreside answered 13/10, 2010 at 17:52 Comment(2)
This is not correct since it matches, for example, "123-coconut-456".Threedimensional
Thought you wanted it more general...fixed.Foreside
V
6

I have modified one of the answers (by @op1ekun):

123-(apple(?=-)|banana(?=-)|(?!-))-?456

The reason is that the answer from @op1ekun also matches "123-apple456", without the hyphen after apple.

Vinculum answered 5/1, 2017 at 10:38 Comment(1)
Comments should be used for improvements and not answersTribasic
C
4

Try this:

/\d{3}-(?:(apple|banana)-)?\d{3}/
Cowpoke answered 13/10, 2010 at 17:50 Comment(5)
This is not correct since it matches, for example, "123-coconut-456".Threedimensional
@david: how's that different from your "banana" example?Raman
@SilentGhost: I only want to capture apple or banana or "". All other values are invalid, as I stated.Threedimensional
Sorry, in that case: /\d{3}-(?:(apple|banana)-)?\d{3}/Cowpoke
What this example shows is that it is possible to have a non-capturing group without using lookahead and lookbehind.Laureenlaurel
T
0

This is a variation of the expression by @Gumbo that uses \K to reset the start of the match, so that the number blocks are not included in it. It is usable in PCRE regex flavours.

123-\K(?:(?:apple|banana)(?=-456)|456\K)

Matches:

Match 1  apple
Match 2  banana
Match 3
Tratner answered 24/3, 2020 at 23:20 Comment(0)
E
0

echo '16' | xargs | grep -oP '\d'

Evenson answered 16/5, 2023 at 22:35 Comment(1)
Please add some explanation for your code rather than posting code only. Additional explanation will be more helpful.Illuminator
V
0

"123-apple-456, 87568-555"

/(\d+-)(?:[a-z]*-?)*(\d+)/

\1\2

123-456, 87568-555
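The lines above read as input, pattern, replacement, and result. Assuming that reading, the same substitution can be sketched in Python with `re.sub`; the helper name `drop_middle` is made up for the example. Note that this pattern removes any lowercase word between the numbers, not only "apple" or "banana".

```python
import re

# Pattern and replacement copied from the answer above.
STRIP_WORD = re.compile(r"(\d+-)(?:[a-z]*-?)*(\d+)")

def drop_middle(text):
    """Keep the two number blocks (groups 1 and 2) and drop what is between."""
    return STRIP_WORD.sub(r"\1\2", text)

print(drop_middle("123-apple-456, 87568-555"))
```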
Vaivode answered 27/10, 2023 at 9:49 Comment(1)
This answer needs to be improved with any amount of explanation.Excited
C
0

123-(?:(apple|banana)-)?456

The word in the middle is in capturing group 1 (.groups()[0]). If the word is absent, that group is unset (null, or None in Python).
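Since `.groups()` looks like Python's match API, here is a minimal sketch under that assumption. The helper name `groups_for` and the sample strings are illustrative.

```python
import re

# Pattern copied from the answer above.
OPTIONAL = re.compile(r"123-(?:(apple|banana)-)?456")

def groups_for(text):
    """Return the tuple of capture groups, or None if the string is invalid."""
    m = OPTIONAL.search(text)
    return m.groups() if m else None

for sample in ["123-apple-456", "123-456", "123-coconut-456", "123-apple456"]:
    print(repr(sample), "->", groups_for(sample))
```

When the optional group is skipped, Python reports the capture as `None` rather than an empty string, so a caller wanting "" can use `m.group(1) or ""`.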

Carvel answered 9/1 at 17:43 Comment(0)
B
0

Regular expression to test:

\d{3}-(?:(apple|banana)-|)\d{3}

It matches strings with three digits, followed by either "apple-", "banana-", or nothing, and ending with three digits. It captures "apple" or "banana" if present; when neither is present, the capture is empty.

Sandbox test link

Tested on the following data set:

123-apple-456
123-banana-456
123-banana456
123banana-456
123-456
123456
123-coconut-456
123-123-456
123-apple456

Found matches:

Match 1
1.  apple
Match 2
1.  banana
Match 3
1.   
Match 4
1.   
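The run above can be approximated in Python as a sketch; the helper name `find_matches` and the line-by-line scanning are assumptions about how the sandbox was used. Under those assumptions the fourth match appears to come from "123-123-456": the pattern is not anchored, so it matches the leading "123-123" with an empty capture. Adding `^` and `$` (or using `re.fullmatch`) would rule that out.

```python
import re

# Pattern and data set copied from the answer above.
DIGITS = re.compile(r"\d{3}-(?:(apple|banana)-|)\d{3}")

DATA = [
    "123-apple-456", "123-banana-456", "123-banana456",
    "123banana-456", "123-456", "123456",
    "123-coconut-456", "123-123-456", "123-apple456",
]

def find_matches(lines):
    """Return (whole match, group 1) for every match found in each line."""
    results = []
    for line in lines:
        for m in DIGITS.finditer(line):
            results.append((m.group(0), m.group(1)))
    return results

for whole, word in find_matches(DATA):
    print(repr(whole), "->", repr(word))
```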
Bufflehead answered 12/1 at 17:4 Comment(0)
S
-5

By far the simplest (works for Python) is '123-(apple|banana)-?456'.

Spinney answered 24/6, 2016 at 18:43 Comment(1)
This would match 123-apple456 so it isn't correct.Smarm
