Why does my non-greedy Perl regex match nothing?

I thought I understood Perl RE to a reasonable extent, but this is puzzling me:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

if($test =~ /\'?(.*?)\'?/) {
       print "Captured $1\n";
       print "Matched $&";
}
else {
       print "What?!!";
}

prints

Captured
Matched '

It seems it has matched the ending ' alone, and so captured nothing.
I would have expected it to match the entire thing, or if it's totally non-greedy, nothing at all (as everything there is an optional match).
This in-between behaviour baffles me. Can anyone explain what is happening?

Kugler answered 3/4, 2009 at 7:27 Comment(0)

The \'? at the beginning and end means "match 0 or 1 apostrophes, greedily". (As another poster has pointed out, to make it non-greedy it would have to be \'??.)

The .*? in the middle means "match 0 or more characters, non-greedily".

The Perl regular expression engine starts at the beginning of the string. The leading \'? is greedy, so it picks up the first apostrophe. The .*? then takes as little as it can, which is the empty string. Finally, the trailing \'? is optional, and since the next character is an s rather than an apostrophe, it too matches the empty string. The overall match is therefore just the opening apostrophe, and $1 is empty.
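
A minimal sketch of the walk-through above, using Perl's built-in @- and @+ offset arrays to show where the overall match and the capture group begin and end. The variable names ($whole_start and so on) are illustrative only and do not come from the question.

```perl
use strict;
use warnings;

my $test = "'some random string'";

# Illustrative variables (not from the question) that record the result.
my ($whole_start, $whole_end, $group_start, $group_end);

if ($test =~ /\'?(.*?)\'?/) {
    # $-[0] / $+[0] are the start / end offsets of the whole match;
    # $-[1] / $+[1] are the same for capture group 1.
    ($whole_start, $whole_end) = ($-[0], $+[0]);
    ($group_start, $group_end) = ($-[1], $+[1]);

    print "whole match spans $whole_start..$whole_end\n";   # 0..1, the opening apostrophe
    print "group 1 spans     $group_start..$group_end\n";   # 1..1, the empty string
}
```

The match starts and ends around the first character, which shows that it is the opening apostrophe that was consumed, not the closing one.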

Thanet answered 3/4, 2009 at 8:19 Comment(1)
In other words, only the beginning apostrophe was matched; the rest of the regex matches the empty string. - Qualification

pattern? is greedy; if you want it to be non-greedy, you must say pattern??:

#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

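# Greedy ?: the leading quote is consumed because it can be.
if ($test =~ /\'?(.*?)\'?/) {
    print "Captured [$1]\n";    # Captured []
    print "Matched  [$&]\n";    # Matched  [']
}
# Non-greedy ??: nothing has to be consumed, so nothing is.
if ($test =~ /\'??(.*?)\'??/) {
    print "Captured [$1]\n";    # Captured []
    print "Matched  [$&]\n";    # Matched  []
}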

From perldoc perlre:

The following standard quantifiers are recognized:

*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
{n}    Match exactly n times
{n,}   Match at least n times
{n,m}  Match at least n but not more than m times

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don’t change, just the "greediness":

*?     Match 0 or more times
+?     Match 1 or more times
??     Match 0 or 1 time
{n}?   Match exactly n times
{n,}?  Match at least n times
{n,m}? Match at least n but not more than m times
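
The difference between ? and ?? only becomes visible when the rest of the pattern forces a choice. The sketch below is an illustration under assumptions: the shorter input string and the ^ and $ anchors are not in the original question, they are added so that the quantifiers cannot all settle for the empty string.

```perl
use strict;
use warnings;

# Hypothetical input; the ^ and $ anchors are added here for illustration
# so that the quantifiers cannot all settle for the empty string.
my $test = "'abc'";

# Greedy ?: each apostrophe is consumed if it is there.
my ($greedy) = $test =~ /^\'?(.*?)\'?$/;

# Non-greedy ??: an apostrophe is consumed only if the match would fail without it.
my ($non_greedy) = $test =~ /^\'??(.*?)\'??$/;

print "greedy ?      captured [$greedy]\n";       # [abc]
print "non-greedy ?? captured [$non_greedy]\n";   # ['abc]
```

In the second pattern the leading \'?? never needs its apostrophe, because .*? can absorb it, so the opening quote ends up inside $1. The trailing \'?? does take the closing quote, but only because $ cannot match otherwise.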
Dunstan answered 3/4, 2009 at 7:57 Comment(16)
Nope, Perl regexes are greedy by default and ? makes them non-greedy. - Harriette
Um, read what I said again: pattern? is greedy (because that is the default); to get non-greedy you must say pattern??. - Dunstan
From the perldoc you quoted: If you want it to match the minimum number of times possible, follow the quantifier with a "?". - Harriette
Yes, you must follow the quantifier with ?. The pattern is not the quantifier. The quantifier in this case is ?, which is the same as the {0,1} quantifier. To get non-greedy optional matches you must say pattern??, that is: pattern, quantifier (in this case ?), and then the non-greedy ?. - Dunstan
It is right there in the freaking perldoc I quoted, fourth line from the bottom! - Dunstan
You're confusing ?? (which means "0 or 1") with ? (which means "zero or more"). ?? does not mean "non-greedy", it means "zero or one". - Harriette
@edg: No, x?? means match x zero or one times but non-greedily, just as x? means match x zero or one times but greedily. - Thanet
The first table in the quote is the quantifiers. ? means match 0 or 1 of the pattern that preceded it. The second table is the first table made non-greedy by the addition of a ? to the quantifier. In order to get a 0 or 1 match that is non-greedy you must say ?? or {0,1}?. - Dunstan
The questioner was confused by the fact that a match occurred at all because he/she thought that pattern? was non-greedy. I stated that pattern? is greedy and that if you want it to be non-greedy you must say pattern??. You proceeded to disagree with me, and then stated my point yourself. - Dunstan
Oh, wait, that was simonn. Never mind, you seem to still be confused. - Dunstan
@Chris Lutz, that is the way it is formatted in the perldoc. Is there a reason you feel having everything flush against the side is better? - Dunstan
I honestly don't get it: what is there to be greedy about in /'?/? The perldoc says "? Match 1 or 0 times" and "?? Match 0 or 1 time", which seem the same to me. Greedy versus non-greedy only matters for things like * or + that can match any number of times. - Kugler
I don't know how to format newlines in comments, but let me try: I honestly don't get it, what is there to be greedy about in /'?/? "? Match 1 or 0 times" and "?? Match 0 or 1 time" seem the same to me. Greedy versus non-greedy only matters for things like * or + that can match any number of times. - Kugler
@sundar: The ? is shorthand for {0,1}. The ?? is shorthand for {0,1}?. The former matches if it can (=> greedy), the latter matches only if it must (=> non-greedy). - Repentance
@sundar, pattern? will match the pattern if it can; pattern?? will match the pattern only if it is necessary for the match to be successful. I will add an example to the answer. - Dunstan
I have difficulty coming up with a good example that demonstrates the usefulness of ??; I don't tend to use it. - Dunstan

I think you mean something like:

/'(.*?)'/      # matches everything in single quotes

or

/'[^']*'/      # matches everything in single quotes, but faster

The single quotes don't need to be escaped, AFAIK.
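
As a rough sketch, both patterns can be tried against the string from the question. A capture group is added to the second pattern here (an addition, not part of the answer above) so that the two results can be compared directly.

```perl
use strict;
use warnings;

my $test = "'some random string'";

# Capture groups are used in both patterns so the results can be compared.
my ($lazy)    = $test =~ /'(.*?)'/;      # lazy dot: expands until the next quote
my ($negated) = $test =~ /'([^']*)'/;    # negated class: cannot run past a quote

print "lazy:    [$lazy]\n";      # [some random string]
print "negated: [$negated]\n";   # [some random string]
```

Because both quotes are now mandatory, the lazy .*? is forced to grow until the closing quote is reached instead of stopping at the empty string.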

Repentance answered 3/4, 2009 at 7:57 Comment(0)

Beware of making all elements of your regex optional (i.e. having every element quantified with * or ?). This lets the Perl regex engine match as little as it likes, even the empty string, while still considering the match successful.

I suspect what you want is

/'(.*?)'/
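
To illustrate the warning about all-optional patterns, here is a small sketch. The input string and variable names are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical input that contains no quotes at all.
my $input = "no quotes here";

# Every element is optional, so the match succeeds on the empty string.
my $all_optional_ok  = ($input =~ /\'?(.*?)\'?/) ? 1 : 0;
# @- and @+ hold the offsets of the last successful match.
my $all_optional_len = $all_optional_ok ? $+[0] - $-[0] : -1;

# With the quotes required, the same input is rejected.
my $required_ok = ($input =~ /'(.*?)'/) ? 1 : 0;

print "all optional: matched=$all_optional_ok, length=$all_optional_len\n";   # matched=1, length=0
print "quotes required: matched=$required_ok\n";                               # matched=0
```

An all-optional pattern reports success on any input, so it cannot be used to test whether the string is quoted.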
Lofton answered 3/4, 2009 at 7:58 Comment(0)

I would say the closest answer to what you are looking for is

/'?([^']*)'?/

So "get the single quote if it's there", "get anything and everything that's not a single quote", "get the last single quote if it's there".

Unless you want to match "'don't do this'" - but who uses an apostrophe in a single quote anyway (and gets away with it for long)? :)
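
A quick sketch of how this pattern behaves on three hypothetical inputs (quoted, unquoted, and one with an embedded apostrophe); the inputs and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs: quoted, unquoted, and one with an embedded apostrophe.
my @inputs = ("'some random string'", "some random string", "'don't do this'");

my @captures;
for my $input (@inputs) {
    # Optional quote, then everything up to the next quote, then optional quote.
    my ($inner) = $input =~ /'?([^']*)'?/;
    push @captures, $inner;
    print "$input => [$inner]\n";
}
# 'some random string' => [some random string]
# some random string => [some random string]
# 'don't do this' => [don]
```

The last line shows the caveat mentioned above: the capture stops at the apostrophe inside "don't".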

Ahern answered 16/4, 2009 at 19:33 Comment(0)