How do I tell RegEx (.NET version) to get the smallest valid match instead of the largest?
For a regular expression like `.*` or `.+`, append a question mark (`.*?` or `.+?`) to match as few characters as possible. To optionally match a section `(?:blah)?` but without matching unless absolutely necessary, use something like `(?:blah){0,1}?`. For a repeating match (using either the `{n,}` or the `{n,m}` syntax), append a question mark to match as few repetitions as possible (e.g. `{3,}?` or `{5,7}?`).
The documentation on regular expression quantifiers may also be helpful.
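As a rough illustration of the lazy forms listed above, here is a minimal sketch. It is written in JavaScript rather than .NET, on the assumption that these particular quantifiers behave the same way in both engines; the sample strings are made up for the demo.

```javascript
// Greedy vs. lazy on the same (made-up) input.
const tags = '<a><b>';
const greedy = tags.match(/<.+>/)[0];   // takes as much as possible
const lazy = tags.match(/<.+?>/)[0];    // stops at the first '>'

// Optional group: greedy takes it when it can, lazy skips it when it can.
const optGreedy = 'blahblah'.match(/(?:blah)?/)[0];
const optLazy = 'blahblah'.match(/(?:blah){0,1}?/)[0];

// Bounded repetition: the lazy form stops at the lower bound when allowed.
const atLeastThree = 'aaaaaa'.match(/a{3,}?/)[0];
const fiveToSeven = 'aaaaaaaaa'.match(/a{5,7}?/)[0];

console.log(greedy);       // <a><b>
console.log(lazy);         // <a>
console.log(optGreedy);    // blah
console.log(optLazy === ''); // true (empty match)
console.log(atLeastThree); // aaa
console.log(fiveToSeven);  // aaaaa
```

Note that "lazy" only affects where the match ends for a given starting position; the engine still prefers the leftmost starting position.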
The non-greedy operator does not give you the shortest possible match. Given the input
abcabk
the pattern
a.+?k
will match the entire string (in this example) instead of only the last three characters.
I'd like to actually find the smallest possible match instead. That is, the last possible match for `a` that still allows a match for `k`.
I guess the only way to do that is to use an expression like:
a[^a]+?k
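const haystack = 'abcabkbk';
// Lazy quantifier: still starts at the first 'a' and runs to the first 'k' after it.
const patternNonGreedy = /a.+?k/;
// Excluding 'a' from the middle forces the match to start at the last 'a' before the 'k'.
const patternShortest = /a[^a]+?k/;
const matchesNonGreedy = haystack.match(patternNonGreedy);
const matchesShortest = haystack.match(patternShortest);
console.log('non greedy: ', matchesNonGreedy[0]); // abcabk
console.log('shortest: ', matchesShortest[0]); // abk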
`cab`. If my input is `caaacab` and I search for `a.*?b` it will return the full string instead of the short match inside. How would I search backwards from the `b`? – Ideatum
`c[^cb]*b`, it'll match the shortest path between `c` and `b` – Choosey
`START[^START]*?END` (where START and END are your start and end character regexes). It essentially means "match anything from START to END where the in-between characters do not include START again" – Stifling
`,[^,]*?(venation|reticulat)[[:print:]]*?,` worked! – Analyse
A negative lookahead would help here
Example:
a...a.....a..b
a.*?b => a...a.....a..b
a(((?!a).)*?)b => a..b
`a` and `b` can be longer strings:
start...start......start..end
start.*?end => start...start......start..end
start(((?!start).)*?)end => start..end
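The two examples above can be checked with a short script. This is only a sketch in JavaScript (the same lookahead syntax is assumed to carry over to .NET). It uses non-capturing groups instead of the capturing ones shown above, which does not change what is matched. The last case, with a made-up input, shows why the character-class approach from the comments only works when the start delimiter is a single character.

```javascript
const input = 'a...a.....a..b';
const lazyOnly = input.match(/a.*?b/)[0];
// (?!a). consumes one character at a time, but never an 'a'.
const tempered = input.match(/a(?:(?!a).)*?b/)[0];

const words = 'start...start......start..end';
const temperedWords = words.match(/start(?:(?!start).)*?end/)[0];

// [^start] excludes the single characters s, t, a and r, not the word "start",
// so it rejects harmless text such as "rest" between the delimiters.
const tricky = 'start: rest end';
const classAttempt = tricky.match(/start[^start]*?end/);
const lookaheadAttempt = tricky.match(/start(?:(?!start).)*?end/)[0];

console.log(lazyOnly);         // a...a.....a..b
console.log(tempered);         // a..b
console.log(temperedWords);    // start..end
console.log(classAttempt);     // null
console.log(lookaheadAttempt); // start: rest end
```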
Note: the lookahead version still won't necessarily find the shortest match in the whole string, only the shortest one ending at the first reachable `b`.
a...a.....a..b.a.b
a.*?b => a...a.....a..b
a(((?!a).)*?)b => a..b
This still finds `a..b`, not `a.b`, so it's not the "smallest possible match". I'm not sure you can find the smallest possible match with a regex alone. You could find all matches and then pick the smallest one among those results.
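One way to do the "find all matches, then pick the smallest" step is sketched below, again in JavaScript. `shortestMatch` is a hypothetical helper, not a library function. It tries the lazy pattern at every start index using the sticky (`y`) flag and keeps the shortest hit. It assumes the pattern's only freedom is a lazy quantifier, so that the first match found at a given start is also the shortest one from that start.

```javascript
// Hypothetical helper: try the pattern at each start index, keep the shortest match.
function shortestMatch(source, input) {
  const re = new RegExp(source, 'y'); // sticky: a match must begin exactly at lastIndex
  let best = null;
  for (let i = 0; i < input.length; i++) {
    re.lastIndex = i; // reset explicitly, since a failed exec sets it back to 0
    const m = re.exec(input);
    if (m !== null && (best === null || m[0].length < best.length)) {
      best = m[0];
    }
  }
  return best; // null when nothing matches anywhere
}

console.log(shortestMatch('a.*?b', 'a...a.....a..b.a.b')); // a.b
```

This costs one match attempt per start position, so it is slower than a single regex search, but it also finds candidates that overlap or sit after an earlier, longer match.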