Regex negative lookbehinds with a wildcard

I'm trying to match some text only if another block of text does not appear in its vicinity. For example, I would like to match "bar" if "foo" does not precede it. I can match "bar" if "foo" does not immediately precede it using a negative lookbehind in this regex:

/(?<!foo)bar/

but I would also like to not match "foo 12345 bar". I tried:

/(?<!foo.{1,10})bar/

but using a wildcard + a range appears to be an invalid regex in Ruby. Am I thinking about the problem wrong?

Marisolmarissa answered 30/11, 2012 at 19:15 Comment(0)
13

You are thinking about it the right way. Unfortunately, lookbehinds usually have to be of fixed length. The only major exception to that is .NET's regex engine, which allows repetition quantifiers inside lookbehinds. But since you only need a negative lookbehind, and not a lookahead as well, there is a hack for you. Reverse the string, then try to match:

/rab(?!.{0,10}oof)/

Then reverse the result of the match or, if the position is what you are after, subtract the match's end position in the reversed string from the string's length.
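As a rough sketch of the reversal trick in Ruby (the method name bar_offsets_without_foo is made up for illustration, and the literal "bar"/"foo" and the 0 to 10 character window are taken from the simplified example), something like this should recover the positions in the original string:

```ruby
# Hypothetical helper, not part of any library: returns the start offsets
# (in the original string) of every "bar" that is not preceded by "foo"
# within 0..10 characters.
def bar_offsets_without_foo(text)
  reversed = text.reverse
  offsets = []
  # In the reversed string the lookbehind turns into a lookahead,
  # and lookaheads may contain quantifiers.
  reversed.scan(/rab(?!.{0,10}oof)/) do
    # The end of "rab" in the reversed string corresponds to the
    # start of "bar" in the original string.
    offsets << text.length - Regexp.last_match.end(0)
  end
  offsets.sort
end

p bar_offsets_without_foo("foo 12345 bar")
p bar_offsets_without_foo("bar foo bar")
```

With "bar foo bar" only the first bar should be reported, because the second one has a foo within the window before it.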

Now from the regex you have given, I suppose that this was only a simplified version of what you actually need. Of course, if bar is a complex pattern itself, some more thought needs to go into how to reverse it correctly.

Note that if your pattern required both variable-length lookbehinds and lookaheads, you would have a harder time solving this. Also, in your case, it would be possible to deconstruct your lookbehind into multiple fixed-length ones (because you use neither + nor *):

/(?<!foo)(?<!foo.)(?<!foo.{2})(?<!foo.{3})(?<!foo.{4})(?<!foo.{5})(?<!foo.{6})(?<!foo.{7})(?<!foo.{8})(?<!foo.{9})(?<!foo.{10})bar/

But that's not all that nice, is it?
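If you do go this way, you don't have to type the pattern out by hand. A minimal sketch that builds it (the names MAX_GAP and NO_FOO_BEFORE_BAR are mine, and the window of 10 comes from your example):

```ruby
# Assumed window size, taken from the {1,10} in the question.
MAX_GAP = 10

# One fixed-length negative lookbehind per possible gap: 0, 1, ..., MAX_GAP.
lookbehinds = (0..MAX_GAP).map do |n|
  n.zero? ? "(?<!foo)" : "(?<!foo.{#{n}})"
end.join

NO_FOO_BEFORE_BAR = Regexp.new("#{lookbehinds}bar")

p NO_FOO_BEFORE_BAR.match?("foo 12345 bar")
p "bar foo bar" =~ NO_FOO_BEFORE_BAR
```

Each lookbehind on its own has a fixed length, so Ruby should accept the combined pattern even though it rejects the quantified version.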

Staurolite answered 30/11, 2012 at 19:23 Comment(1)
Reversing the string was an interesting idea. Thanks!Marisolmarissa
4

As m.buettner already mentions, a lookbehind in a Ruby regex has to be of fixed length, and this is described in the documentation. So, you cannot put a variable-length quantifier within a lookbehind.

You don't need to check everything in one step. Try doing multiple regex matches in sequence to get what you want. Assuming that the existence of foo in front of a single instance of bar breaks the condition regardless of whether there is another bar, then

string.match(/bar/) and !string.match(/foo.*bar/)

will give you what you want for the example.

If you would rather have the match succeed with bar foo bar, then you can do this

string.scan(/foo|bar/).first == "bar"
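
To make the difference between the two interpretations concrete, here is a small sketch that wraps each check in a method (both method names are made up for this example) and runs them on the same inputs:

```ruby
# Interpretation 1: a "foo" anywhere before any "bar" disqualifies the string.
def bar_without_any_preceding_foo?(string)
  !!(string.match(/bar/) && !string.match(/foo.*bar/))
end

# Interpretation 2: of all "foo" and "bar" occurrences, the first must be "bar".
def bar_comes_first?(string)
  string.scan(/foo|bar/).first == "bar"
end

["bar", "foo 12345 bar", "bar foo bar"].each do |s|
  puts "#{s.inspect}: #{bar_without_any_preceding_foo?(s)} / #{bar_comes_first?(s)}"
end
```

The two checks should agree on the first two strings and disagree only on bar foo bar. Note that neither one applies a limit on the distance between foo and bar.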
Chubby answered 30/11, 2012 at 21:26 Comment(4)
That is problematic if the idea is to actually retrieve a match. Say you have bar foo bar. The regex that the OP tried would retrieve the first bar. Your solution would claim that there is no match. (Apart from the fact that you omitted the "up to 10 characters" heuristic.) Staurolite
@m.buettner You and I have different interpretations of the question. Chubby
Sure. Which is why I don't say your solution is wrong. But I find it important that such assumptions and differences are stated. Because they might not be apparent to the OP or anyone else who finds this question in the future.Staurolite
Thanks for articulating the different interpretations. I'm accepting @m.buettner's response as it was what I neededMarisolmarissa
