Perl's capture group disappears while in scope
I have very simple code that parses a file name:

#!/usr/bin/env perl

use 5.040;
use warnings FATAL => 'all';
use autodie ':default';

my $string = '/home/con/bio.data/blastdb/phytophthora.infestans.KR_2_A2/GCA_012552325.1.protein.faa';

if ($string =~ m/blastdb\/(\w)\w+\.([\w\.]+)/) {
    my $rest = $2; # $1 would be valid here
    $rest =~ s/\./ /g;
    my $name = "$1.$rest"; # $1 disappears here
}

The above code fails with: Use of uninitialized value $1 in concatenation (.) or string

However, if I save $1 into a variable, e.g. $g, the information isn't lost.

if ($string =~ m/blastdb\/(\w)\w+\.([\w\.]+)/) {
    my ($g, $rest) = ($1, $2);
    $rest =~ s/\./ /g;
    my $name = "$g.$rest";
}

So I can fix this.

However, $1 shouldn't just disappear like that. Shouldn't $1 remain valid while in scope? Is this a bug in Perl, or is there some rule in https://perldoc.perl.org/perlretut that I missed?

Miterwort answered 2/7, 2024 at 1:7 Comment(1)
Each successful match resets the match variables. More confusing is that the variables are only reset if the match succeeds, which is the source of many a bite... Blasius
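A minimal sketch of the second point in that comment, assuming two straight-line matches in the same scope (the variable names are made up for illustration). A failed match leaves $1 holding whatever the last successful match captured:

```perl
use strict;
use warnings;

# First match succeeds, so $1 is set to 'alpha'.
my $ok_first    = 'alpha' =~ /^([a-z]+)$/;
my $after_first = $1;

# Second match fails (no lowercase letters), so $1 is NOT reset.
my $ok_second    = '123' =~ /^([a-z]+)$/;
my $after_second = $1;

printf "first match %s, \$1 is %s\n",
    $ok_first  ? 'succeeded' : 'failed', $after_first  // 'undef';
printf "second match %s, \$1 is %s\n",
    $ok_second ? 'succeeded' : 'failed', $after_second // 'undef';
```

This is why the result of a match should be checked before $1 is read.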

$rest =~ s/\./ /g; does a regex match. The pattern it matches (/\./) doesn't have any capturing groups, so once it has matched successfully, all of the capture variables ($1, $2, ...) are undefined.
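To see the clobbering in isolation, here is a small sketch based on the question's code, with a shortened input string and $before/$after added purely for illustration. It reads $1 once before the s/// and once after:

```perl
use strict;
use warnings;

my $string = 'blastdb/phytophthora.infestans.KR_2_A2';

my ($before, $after);
if ($string =~ m/blastdb\/(\w)\w+\.([\w.]+)/) {
    $before = $1;          # 'p', from the match in the if condition
    my $rest = $2;
    $rest =~ s/\./ /g;     # a second successful match, with no capture groups
    $after = $1;           # undef: $1 now belongs to the s/// match
}

printf "before: %s\n", $before // 'undef';
printf "after:  %s\n", $after  // 'undef';
```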

You can save what you need in variables, most simply by doing if (my ($g, $rest) = $string =~ /yadda yadda/). Or you can avoid doing other regex matches before you're done with the captures from the previous one. In this case, $rest =~ tr/./ / would do the job just as well as $rest =~ s/\./ /g, but without clobbering the capture variables.
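Both suggestions applied to the question's string might look like the sketch below. $name_a and $name_b are names invented here so the two variants can sit side by side, and the regex is written with m{...} only to avoid escaping the slash:

```perl
use strict;
use warnings;

my $string = '/home/con/bio.data/blastdb/phytophthora.infestans.KR_2_A2/GCA_012552325.1.protein.faa';

# Variant A: copy the captures into lexicals as part of the match.
my $name_a;
if (my ($g, $rest) = $string =~ m{blastdb/(\w)\w+\.([\w.]+)}) {
    $rest =~ s/\./ /g;      # safe now: $g no longer depends on $1
    $name_a = "$g.$rest";
}

# Variant B: keep using $1, but transliterate instead of matching.
my $name_b;
if ($string =~ m{blastdb/(\w)\w+\.([\w.]+)}) {
    my $rest = $2;
    $rest =~ tr/./ /;       # tr/// is not a regex match, so $1 survives
    $name_b = "$1.$rest";
}

print "$name_a\n";   # p.infestans KR_2_A2
print "$name_b\n";   # p.infestans KR_2_A2
```

Variant A is the more robust of the two, since it keeps working if someone later adds another regex inside the block.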

Curarize answered 2/7, 2024 at 1:13 Comment(3)
And, if the s/// did have capturing groups, the capture variables would be reset to whatever they captured. Easternmost
@briandfoy I feel that Perl would be better if s/// didn't overwrite capture groups when s/// doesn't use any captures. That's what threw me off. Miterwort
So, you think that after a successful match some of the match variables should have invalid and out-of-date values? How would you know which ones were valid? How would that work when you didn't know the pattern at compile-time? Easternmost
