RegEx: Smallest possible match or nongreedy match
Asked Answered
C

4

178

How do I tell RegEx (.NET version) to get the smallest valid match instead of the largest?

Cone answered 17/12, 2009 at 7:12 Comment(0)
R
310

For a regular expression like .* or .+, append a question mark (.*? or .+?) to match as few characters as possible. To optionally match a section (?:blah)? but without matching unless absolutely necessary, use something like (?:blah){0,1}?. For a repeating match (either using {n,} or {n,m} syntax) append a question mark to try to match as few as possible (e.g. {3,}? or {5,7}?).

The documentation on regular expression quantifiers may also be helpful.

Religieux answered 17/12, 2009 at 7:15 Comment(4)
Line2 "but without matching unless absolutely necessary": What does this mean?Solvable
Won't this '{0,1}' match nothing because of the 0? Why don't use '{1}' instead?Gilkey
Regular expressions are greedy by default, which means they try to match as much as possible. Adding the question mark right after the braces means that it will try to match the fewest possible times, but will still match if it can't avoid it. Just using '{1}' means that it must match exactly once.Religieux
This is not what lazy matching does. See stackoverflow.com/questions/35944441Egad
A
102

The non-greedy operator, ?. Like so:

.*?
Actinism answered 17/12, 2009 at 7:14 Comment(0)
S
74

The non greedy operator does not mean the shortest possible match:

abcabk

a.+?k will match the entire string (in this example) instead of only the last three signs.

I'd like to actually find the smallest possible match instead.

That is that last possible match for 'a' to still allow all matches for k.

I guess the only way to do that is to make use of an expression like:

a[^a]+?k

const haystack = 'abcabkbk';
const paternNonGreedy = /a.+?k/;
const paternShortest = /a[^a]+?k/;

const matchesNonGreedy = haystack.match(paternNonGreedy);
const matchesShortest = haystack.match(paternShortest);

console.log('non greedy: ',matchesNonGreedy[0]);
console.log('shortest: ', matchesShortest[0]);
Sign answered 31/1, 2014 at 11:22 Comment(9)
Or search in reverse order, starting at the end, when matches are nested: "(ab(abk)bk)".Free
@Free how would one search in reverse order? don't get itAbhorrent
@Free Still open question: How would one search in reverse order? Lets say I want to get cab. If my input is caaacab and I search for a.*?b it will return the full string instead of the short match inside. How would I search backwards from the b?Ideatum
Reverse the string, then apply the regex.Cone
I don't really see how reversing the string would help at all. In the string of the example it would only work because there is only one k, add another k at the end and reversing the string will end up giving you the exact same problem.Sign
@C4u Try c[^cb]*b, it'll match the shortest path between c and bChoosey
This is super helpful. For people like me trying to understand what's going on here the generic form is START[^START]*?END (where START and END are your start and end character regexs). It essentially means "match anything from START to END where the in-between characters do not include START again"Stifling
I suppose this would only work when START is a single character?Lowelllowenstein
@Stifling Thanks for this. I was trying to grab the comma-bounded fragment of a sentence that talks about a leaf's venation, but kept getting the fragments before it despite using a lazy search. ,[^,]*?(venation|reticulat)[[:print:]]*?, worked!Analyse
K
0

A negative lookahead would help here

Example:

a...a.....a..b


a.*?b            =>   a...a.....a..b
a(((?!a).)*?)b   =>   a..b

a and b can be larger

start...start......start..end


start.*?end                =>   start...start.....start..end
start(((?!start).)*?)end   =>   start..end

Note: this won't find the shortest match in the string.

a...a.....a..b.a.b


a.*?b            =>   a...a.....a..b
a(((?!a).)*?)b   =>   a..b

This still finds a..b not a.b so it's not "Smallest possible match". I'm not sure you can find smallest possible match with regex. You could find all matches and then in those results find the smallest.

Kneel answered 20/11, 2023 at 0:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.