Positive Lookbehind greedy

I think I misunderstand how a positive lookbehind works in regex. Here is an example:

12,2 g this is fully random
89 g random string 2
0,6 oz random stuff
1 really random stuff

Let's say I want to match everything after the measuring unit, so I want "this is fully random", "random string 2", "random stuff" and "really random stuff".

In order to do that I tried the following pattern:

(?<=(\d(,\d)?) (g|oz)?).*

But since "?" means 0 or 1, the pattern seems to prioritize 0 over 1 in this case, so the measuring unit is still included in the match.

But the measuring unit has to stay optional, as it won't necessarily be in the string (cf. the fourth line)...

Any idea on how to deal with that issue? Thanks!

Aeolis answered 26/9, 2020 at 12:37 Comment(0)

It is easier to see what happens by looking at the positions where the assertion holds. The assertion (?<=(\d(,\d)?) (g|oz)?) is true at any position where the text directly to the left is a digit (optionally followed by a comma and a digit), then a space, then an optional g or oz.

The regex engine scans from left to right, and the assertion is true at more than one position: right after the space that follows the number, and again right after the unit. At the first such position, .* (any character, zero or more times) matches through to the end of the line, so the unit ends up in the match.
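To make the positions concrete, here is a minimal Python sketch. Python's built-in re module rejects variable-width lookbehinds, so the original pattern cannot be run there as-is. As an assumption for illustration only, the sketch emulates the assertion by checking whether the text to the left of each position ends with the lookbehind body. The helper names assertion_positions and emulated_match are hypothetical.

```python
import re

# Body of the lookbehind from the question, anchored with \Z so it must
# end exactly at the position being tested (emulates "directly to the left").
LOOKBEHIND_BODY = re.compile(r"(\d(,\d)?) (g|oz)?\Z")


def assertion_positions(text):
    """Return every index i where the text left of i ends with the lookbehind body."""
    return [i for i in range(len(text) + 1) if LOOKBEHIND_BODY.search(text[:i])]


def emulated_match(text):
    """Mimic (?<=...).* on a single line: start at the first position where the assertion holds."""
    positions = assertion_positions(text)
    return text[positions[0]:] if positions else None


line = "12,2 g this is fully random"
print(assertion_positions(line))  # [5, 6]: after "12,2 " and after "12,2 g"
print(emulated_match(line))       # g this is fully random
```

The assertion holds both before and after the unit, and the match starts at the earlier of the two, which is why the g is not skipped.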

See the positions on regex101

What you might do instead is match the digit part, make the space followed by g or oz optional, and use a capturing group for the part you want to keep.

\d+(?:,\d+)?(?: g| oz)? (.*)

Regex demo
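As a quick check, the pattern above can be run against the four sample lines from the question. This is a minimal sketch using Python's re module; the function name rest_after_quantity is hypothetical, and the wanted text is read from capturing group 1.

```python
import re

# Pattern from the answer: number, optional " g" or " oz", a space, then the rest.
PATTERN = re.compile(r"\d+(?:,\d+)?(?: g| oz)? (.*)")


def rest_after_quantity(line):
    """Return the text after the number and optional unit, or None if nothing matches."""
    m = PATTERN.search(line)
    return m.group(1) if m else None


samples = [
    "12,2 g this is fully random",
    "89 g random string 2",
    "0,6 oz random stuff",
    "1 really random stuff",
]

for s in samples:
    print(rest_after_quantity(s))
```

Because the unit is consumed by the match itself rather than tested in a lookbehind, the engine tries the unit first and only skips it when the line has none.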

Copulation answered 26/9, 2020 at 13:6 Comment(3)
Great answer! I intuitively knew why it worked that way (in a seemingly non-greedy manner) but had a hard time writing an explanation in a comment or an answer so I gave up :D - Bold
It appears to me that (?: g|)? = (?: g|) = (?: g)? - Polychromy
Oh okay, it makes more sense now! I will definitely go with that solution, that's exactly what I was looking for, thanks :) - Aeolis
