In Perl, is there a limit to the number of capture groups in a Regular Expression?
Is there a limit to the number of capture groups in a regular expression? I used to think it was 9 ($1 ... $9), but haven't found anything in the perlre docs to confirm this. And in fact, the following code shows that there are at least 26.

#!/usr/local/bin/perl

use strict;
use warnings;

my $line = " a b c d e f g h i j k l m n o p q r s t u v w x y z ";

my $lp = "(\\w) ";
my $pat = "";
for (my $i=0; $i<26; $i++)
{
   $pat = $pat . $lp;
}

$line =~ /$pat/;
print "$1 $2 $3 $24 $25 $26\n";

Note that the question "How many captured groups are supported by pcre2 substitute function" refers only to the PCRE2 C library. I'm asking about Perl.
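The same experiment can be written more compactly with the repetition operator, and the group count can be read back from `$#+` (the index of the last group in the last successful match, per perlvar). This is only a sketch: `count_groups` is a hypothetical helper name, and the test string is generated rather than taken from the code above.

```perl
use strict;
use warnings;

# Build a pattern with $n groups and a matching string, then report
# how many groups the last successful match had (0 if it did not match).
sub count_groups {
    my ($n) = @_;
    my $line = join(' ', ('a') x $n) . ' ';   # "a a a ... a "
    my $pat  = '(\w) ' x $n;                  # $n copies of "(\w) "
    return $line =~ /$pat/ ? $#+ : 0;
}

print count_groups(26),   "\n";
print count_groups(1000), "\n";
```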

Powerdive answered 14/12, 2022 at 15:29 Comment(1)
Ask about the specific languages you are concerned about or just ask about Perl itself. "supports Perl-compatible regular expressions" could mean a lot of things. – Kaleighkalends
K
7

https://perldoc.perl.org/perlre says:

There is no limit to the number of captured substrings that you may use.
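As a small illustration of going past nine groups, here is a sketch with twelve groups and a backreference to the twelfth. The sample string is made up for the example; `\g{12}` is the brace form of a numbered backreference described in perlre.

```perl
use strict;
use warnings;

# Hypothetical sample: twelve distinct letters, then the twelfth repeated.
my $text = 'abcdefghijkll';

# Twelve one-character groups followed by a backreference to group 12.
my $pattern = ('(.)' x 12) . '\g{12}';

if ($text =~ /$pattern/) {
    print "group 10 = $10, group 12 = $12\n";
}
```

This only shows that two-digit group numbers work both as capture variables and as backreferences; it says nothing about where an upper bound might be.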

Kaleighkalends answered 14/12, 2022 at 16:5 Comment(5)
That quote says how many captures can be referenced by backreferences, and not the number of captures a pattern can have. That's probably unlimited too, but the quoted passage does not say that. – Tanbark
@Tanbark I disagree. That is the third sentence in a paragraph titled "Capture groups". The first sentence describes how parens form capture groups. The second talks about backreferences. The third does follow that, but isn't only about backreferences. Even if you were correct, if you can use backreferences to unlimited groups, it follows that you can capture unlimited groups. – Kaleighkalends
The actual quote is "This is called a backreference. There is no limit to the number of captured substrings that you may use. Groups are numbered with the leftmost open parenthesis being number 1, etc." The sentence before is about backreferences. The sentence after is about backreferences. The entire paragraph is all about backreferences. – Tanbark
And if you re-read the quoted sentence, you'll notice it doesn't say "there is no limit to the number of captures" as you seem to think it does. It actually states that there's no limit on how many of the strings that were captured can be used (by backreferences). So both by the literal wording of the quote and the context of the quote, it pertains to backreferences, and it doesn't answer the question. – Tanbark
I did miss a sentence there, thanks. Still disagree. – Kaleighkalends
D
5

Why not just test it? Here is a regexp with 20 million captures, which ought to be enough for anybody. That makes me think memory is the limit here. This took 25 seconds on my old laptop with perl v5.30:

my $n = 20_000_000;                 # 20 million
my $re = join"", map "(.)", 1..$n;  # create regexp with 20 million captures
my $str = "ABC" x $n;               # create a more than long enough string
$str =~ /$re/;                      # match & capture
print $19999987, "\n";              # print the "A" in capture var number 19999987
print ${^CAPTURE}[19999987-1],"\n"; # same
print "Length: ".@{^CAPTURE}."\n";  # prints 20000000, length of array
Dehiscent answered 14/12, 2022 at 22:29 Comment(2)
I'm curious what hardware you have, etc. How high can you get it? – Generate
@briandfoy – I got to 200 million on an Intel i7-7700 (3.6 GHz) with 32 GB of RAM, running perl 5.34. That took 3 minutes. A trial with 250 million started eating swap, so I killed it. – Dehiscent
G
3

You can just try it! Even if there is no built-in limit, there's probably a practical one.

Let's try it on my M1 Mac Mini with Perl v5.36.

Here's a little program that takes the number of captures I want, then builds a pattern with that many capture groups and a string long enough to match it (check out that use of the v5.36 builtin::ceil):

#!perl

use v5.36;
use experimental qw(builtin);
use builtin qw(ceil);

my $n = shift;
say "N is $n";

my $alpha = join '', 'a' .. 'z';
my $multiple = ceil($n / 26);
my $text = $alpha x ($multiple + 1);

my $n_mod_26 = $n % 26;
my $expected_letter = substr $alpha, $n_mod_26 - 1, 1;

my $pattern_text = '(.)' x $n;
my $pattern = qr/$pattern_text/;

my $result = $text =~ $pattern;
say $result ? "Matched" : 'Did not match';

# $n is only known at run time, so look up the capture variable by name (a symbolic reference)
my $matched = do { no strict 'refs'; ${"$n"} };
print "Matched <$matched>; expected <$expected_letter>\n";
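If the symbolic reference feels uncomfortable, one alternative is to slice the group out of the original string using the offset arrays `@-` and `@+`, which perlvar documents as holding the start and end offsets of each group in the last successful match. A minimal sketch, where `nth_capture` is a hypothetical helper name and the input is made up:

```perl
use strict;
use warnings;

# Return the text of group $n from a pattern of $n one-character groups,
# or an empty list if the pattern does not match.
sub nth_capture {
    my ($text, $n) = @_;
    my $pattern = '(.)' x $n;
    return unless $text =~ /$pattern/;
    # $-[$n] is where group $n starts, $+[$n] is where it ends.
    return substr $text, $-[$n], $+[$n] - $-[$n];
}

my $alphabet_twice = join('', 'a' .. 'z') x 2;
print nth_capture($alphabet_twice, 30), "\n";   # 30th character of the string
```

The offsets avoid `no strict 'refs'` entirely, at the cost of having to keep the original string around.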

When I run this with increasing numbers of groups, the last run eventually gets killed:

brian@M1-Mini Desktop % for i in 1 3 7 50 500 5000 70000 900000 3000000 40000000 1234567890; do echo '----' && time perl test.pl $i; done
----
N is 1
Matched
Matched <a>; expected <a>
perl test.pl $i  0.02s user 0.01s system 67% cpu 0.047 total
----
N is 3
Matched
Matched <c>; expected <c>
perl test.pl $i  0.01s user 0.00s system 91% cpu 0.014 total
----
N is 7
Matched
Matched <g>; expected <g>
perl test.pl $i  0.01s user 0.00s system 92% cpu 0.011 total
----
N is 50
Matched
Matched <x>; expected <x>
perl test.pl $i  0.01s user 0.00s system 92% cpu 0.010 total
----
N is 500
Matched
Matched <f>; expected <f>
perl test.pl $i  0.01s user 0.00s system 92% cpu 0.008 total
----
N is 5000
Matched
Matched <h>; expected <h>
perl test.pl $i  0.01s user 0.00s system 93% cpu 0.008 total
----
N is 70000
Matched
Matched <h>; expected <h>
perl test.pl $i  0.02s user 0.00s system 97% cpu 0.022 total
----
N is 900000
Matched
Matched <j>; expected <j>
perl test.pl $i  0.20s user 0.02s system 97% cpu 0.229 total
----
N is 3000000
Matched
Matched <p>; expected <p>
perl test.pl $i  0.69s user 0.06s system 95% cpu 0.786 total
----
N is 40000000
Matched
Matched <n>; expected <n>
perl test.pl $i  9.32s user 1.08s system 91% cpu 11.402 total
----
N is 1234567890
zsh: killed     perl test.pl $i
perl test.pl $i  127.80s user 6.17s system 83% cpu 2:39.69 total

My machine gives up with 1,234,567,890 groups. That might have nothing to do with the number of groups; maybe something else in perl decided it was unhappy, or maybe the program went past some process resource limit. Your own machine may give up at a different point (or not give up at all). I have no idea what killed it, and I don't really care because even if I knew, I'm not going to do anything to fix that.

But, can I find the maximum number? It's somewhere around 389,000,000 captures. It's not a set number that I can consistently predict and probably depends on other, unrelated things going on at the same time.
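One way to hunt for that number is a binary search over the group count. This is only a sketch: `captures_ok` and `largest_ok` are hypothetical helpers, the search assumes success is monotonic in N (which, as noted above, may not hold near the real limit), and a real probe would have to run each trial in a child process, since whatever kills the program takes the whole interpreter with it. The demo caps the probe artificially at 1,000 groups so that it finishes quickly:

```perl
use strict;
use warnings;

# True if a pattern with $n groups matches and returns $n captures.
sub captures_ok {
    my ($n) = @_;
    my $pattern = '(.)' x $n;
    my $text    = 'a' x $n;
    my @caps    = $text =~ /$pattern/;   # list context returns the captures
    return @caps == $n;
}

# Largest $n in [$lo, $hi] for which $probe->($n) is true, assuming the
# probe is true at $lo and never becomes true again after it first fails.
sub largest_ok {
    my ($probe, $lo, $hi) = @_;
    while ($lo < $hi) {
        my $mid = $lo + int(($hi - $lo + 1) / 2);
        if ($probe->($mid)) { $lo = $mid }
        else                { $hi = $mid - 1 }
    }
    return $lo;
}

# Artificial ceiling of 1,000 standing in for the machine's real limit.
my $limit = largest_ok(sub { $_[0] <= 1000 && captures_ok($_[0]) }, 1, 5000);
print "Largest N that worked: $limit\n";
```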

Generate answered 14/12, 2022 at 22:40 Comment(0)