Detailed discussion complementing Christoph's answer
I am trying to parse a csv file
Perhaps you are focused on learning Raku parsing and are writing some throwaway code. But if you want industrial strength CSV parsing out of the box, please be aware of the Text::CSV modules[1].
I am trying to access a named regex
If you are learning Raku parsing, please take advantage of the awesome related (free) developer tools[2].
in proto regex in Raku
Your issue is unrelated to it being a proto regex.
Instead the issue is that, while the match object corresponding to your named capture is stored in the overall match object you stored in $m1, it is not stored precisely where you are looking for it.
Where do match objects corresponding to captures appear?
To see what's going on, I'll start by simulating what you were trying to do. I'll use a regex that declares just one capture, a "named" (aka "Associative") capture that matches the string ab.
given 'ab'
{
    my $m1 = m/ $<named-capture> = ( ab ) /;
    say $m1<named-capture>;
    # 「ab」
}
The match object corresponding to the named capture is stored where you'd presumably expect it to appear within $m1, at $m1<named-capture>.
But you were getting Nil with $m1<oneCSV>. What gives?
Why your $m1<oneCSV> did not work
There are two types of capture: named (aka "Associative") and numbered (aka "Positional"). The parens you wrote in your regex that surrounded <oneCSV> introduced a numbered capture:
given 'ab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) ) /; # extra parens added
    say $m1[0]<named-capture>;
    # 「ab」
}
The parens in / ( ... ) / declare a single top level numbered capture. If it matches, then the corresponding match object is stored in $m1[0]. (If your regex looked like / ... ( ... ) ... ( ... ) ... ( ... ) ... / then another match object corresponding to what matches the second pair of parentheses would be stored in $m1[1], another in $m1[2] for the third, and so on.)
The match result for $<named-capture> = ( ab ) is then stored inside $m1[0]. That's why say $m1[0]<named-capture> works.
So far so good. But this is only half the story...
Why $m1[0]<oneCSV> in your code would not work either
While $m1[0]<named-capture> works in the code immediately above, you would still not get a match object in $m1[0]<oneCSV> in your original code. This is because you also asked for multiple matches of the zeroth capture by using a * quantifier:
given 'ab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) )* /; # * is a quantifier
    say $m1[0][0]<named-capture>;
    # 「ab」
}
Because the * quantifier asks for multiple matches, Raku writes a list of match objects into $m1[0]. (In this case there's only one such match, so you end up with a list of length 1, i.e. just $m1[0][0], and not $m1[0][1], $m1[0][2], etc.)
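To see the list grow, here is a small variation on the same regex, run against 'abab' instead of 'ab' (the longer input string is my own choice for illustration). With two repetitions available to match, the quantified capture should end up holding two match objects:

```raku
given 'abab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) )* /; # same regex, longer input
    say $m1[0].elems;                # number of matches collected by the quantified capture
    # 2
    say $m1[0][1]<named-capture>;    # named capture inside the second repetition
    # 「ab」
}
```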
Summary
Captures nest;
A capture quantified by either * or + corresponds to two levels of nesting, not just one.
In your original code, you'd have to write say $m1[0][0]<oneCSV>; to get to the match object you're looking for.
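As a rough sketch of how this plays out with a named regex rather than an inline named capture, the following uses a hypothetical stand-in for oneCSV. The body of the regex, the surrounding pattern, and the input string are all assumptions made for illustration, not your actual code:

```raku
# Hypothetical stand-in for the named regex in the question:
# one or more characters that are not a comma.
my regex oneCSV { <-[,]>+ }

given 'a,b'
{
    # Assumed shape: a numbered capture, quantified with *, wrapping <oneCSV>
    my $m1 = m/ ( <oneCSV> ','? )* /;
    say $m1<oneCSV>;
    # Nil
    say $m1[0][0]<oneCSV>;
    # 「a」
    say $m1[0][1]<oneCSV>;
    # 「b」
}
```

The first say looks at the top level of $m1, where no oneCSV capture exists; the other two go through the numbered capture and then through the list created by the quantifier.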
[1] Install relevant modules and write use Text::CSV; (for a pure Raku implementation) or use Text::CSV:from<Perl5>; (for a Perl plus XS implementation) at the start of your code. (talk slides (click on top word, e.g. "csv", to advance through slides), video, Raku module, Perl XS module.)
[2] Install CommaIDE and have fun with its awesome grammar/regex development/debugging/analysis features. Or install the Grammar::Tracer and/or Grammar::Debugger modules and write use Grammar::Tracer; or use Grammar::Debugger; at the start of your code. (talk slides, video, modules.)