Make a non-greedy RegEx in backward direction to behave the same like in forward direction
Asked Answered
P

4

13

This pattern:

/a+?b+?/

Against the following string:

aaaaaabbbbbb

Matches:

aaaaaab

We see that the non-greedy behaves different in backward/left direction (takes all) and forward/right direction (takes just one).

Is there a way to make the non-greedy at the beginning, that matches all the a, to match as less as possible too? So that it behaves in the same way like at with the b part at the end?

Pompei answered 3/3, 2013 at 21:47 Comment(0)
O
8

The short answer

Regexes generally match from left-to-right unless you set a right-to-left flag (which very few flavors support). In either case, they do not start in the middle and then work out in both directions, even if you use a lookbehind.

How do lazy quantifiers work?

It helps to stop and ask - why does the lazy quantifier exist in the first place? What problem was it meant to solve?

Normal (greedy) quantifiers work by finding a matching pattern of text and then repeatedly matching a sequence of characters until they can match no more. This behavior is usually desired, but you run into problems when you have a very general pattern followed by a very specific pattern where the specific pattern is a subset of the general pattern.

For example, consider the following input:

_abc_END_def_END

And this pattern:

(\w+END)

The intent was to match _abc_ and then END. The problem is that END is a subset of \w+. Using standard "greedy" rules, the \w+ matches as much as possible. So rather than matching _abc_, it matched _abc_END_def.

The solution to this scenario is to change the way the quantifier (+) behaves with the lazy modifier ?. By changing the expression to \w+?, the regex engine is forced to match only as much as necessary to satisfy the expression and no more. The expression is satisfied when \w+? matches _abc_ and END matches its literal string.

The purpose of the lazy quantifier is not to match a "minimum" number of characters - it is about giving that second pattern, a subset of the first, an opportunity to match.

Going back to your question

In your example, b is not a subset of a, so there is no need for the lazy quantifier. If you want to match one or more a's, but as few as possible, and one or more b's, but as few as possible, then you'd simply use:

ab

Or, if your a is a stand-in for some superset which may include b:

[ab]b

For example:

\wb

Both of which would match:

ab

Example:

const input = "aaabbb"

console.log(/ab/.exec(input)[0])
Orchestrate answered 4/3, 2013 at 14:12 Comment(0)
T
3

Precede with greedy non-capture group:

/(?:a)*a+?b+?/
Tuition answered 27/2, 2019 at 17:21 Comment(0)
T
1

If you do not have to ability to do the previously mentioned Right To Left match, then you can simply reverse the string, reverse the regex expression, then reverse the result at the end.

The work is as follows:

Start with aaaaaabbbbbb
Reverse to bbbbbbaaaaaa
Reverse /a+?b+?/ to /b+?a+?/
The resulting Match is bbbbbba
Reverse the resulting match to get abbbbbb
Twelfth answered 10/3, 2016 at 20:55 Comment(0)
F
-1

They do behave the same! A lazy quantifier (in this case a lazy +) tells the regex engine to

  • start at the first possible position,
  • then match as few characters as possible (at least one in the case of a +)
  • but match as many as necessary to allow an overall match to occur.

Regexes don't match "leftwards" or "backwards", as you seem to imply.

What exactly are you trying to achieve? I guess it's not this simple example - that would be trivial to fix (just make the regex ab, which is probably not what you're looking for).

Falsecard answered 3/3, 2013 at 21:53 Comment(8)
I want to know a general way on how to get as less as possible matches on the left side with the a. Yes of course this is just an example.Pompei
Can you give an example that makes sense? Then it might be possible to show you a meaningful solution.Falsecard
@flori: You need to somehow reject the match aaaaaab, aaaaab, ... aab, to match ab, if that is what you want. In this case, I would go with indexOf("ab").Friedman
@nhahtdh: That's what I proposed - the current example doesn't really make sense. Now it would be a different question how to match <foo> in <bar <foo> baz> using a regex like <.+?>, but I want to wait what flori actually wants.Falsecard
@TimPietzcker A practical example is tricky ;-) It was more a theoretical question and how RegEx works. I will try to construct a real use case. Or the answers to my question is just – this is never needed in real life?Pompei
@Pompei - "You should only ask practical, answerable questions based on actual problems that you face" - FAQOrchestrate
@flori: Matching little on both side is equivalent to limiting the character count to 0/1. I don't think the "finitely many" property of * and + will be useful, rather it is going against what you want.Friedman
@Cyborgx37 Whoops, thanks. Actually it is somehow based on a real problem but I can't explain the whole context. However this discussion helped me already a lot!Pompei

© 2022 - 2024 — McMap. All rights reserved.