There are a few things that cause the capture index to reset. The alternation operators | and || are one of them.
Putting a capture group inside of another capture group is another. (Because the match result is a tree.)
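As a small sketch of that tree (the input string is made up purely for illustration), the inner groups start again at zero inside each outer group:

```raku
# Hypothetical input, chosen only to show the nesting.
my $m = 'ab-cd' ~~ / ( (a) (b) ) '-' ( (c) (d) ) /;

say ~$m[0];      # text of the first outer group
say ~$m[0][0];   # first inner group of the first outer group
say ~$m[1][0];   # numbering restarts inside the second outer group
```

The inner captures are reached through their parent, which is why they do not take part in the top-level numbering.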
When Raku was being designed everything was redesigned to be more consistent, more useful, and more powerful. Regexes included.
If you have an alternation something like this:
/ (foo) | (bar) /
You might want to use it like this:
$line ~~ / (foo) | (bar) /;
say %h{ ~$0 };
If the (bar) was $1 instead, you would have to write it something like this:
$line ~~ / (foo) | (bar) /;
say %h{ ~$0 || ~$1 };
It is generally more useful for the capture group numbering to start again from zero.
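A minimal runnable sketch of the simpler form, assuming a made-up lookup hash %h and two sample lines (neither comes from the question):

```raku
# %h and the two inputs are placeholders for illustration.
my %h = foo => 'first', bar => 'second';

my @seen;
for <foo bar> -> $line {
    $line ~~ / (foo) | (bar) /;
    @seen.push: %h{ ~$0 };   # $0 is whichever branch matched
}
say @seen;
```

Both iterations read the result from $0, so the lookup does not need to know which branch matched.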
This also makes a regex more like a general-purpose programming language. (Each “block” is an independent subexpression.)
Now sometimes it might be nice to renumber the capture groups.
/ ^
[ (..) '-' (..) '-' (....) # mm-dd-yyyy
| (..) '-' (....) # mm-yyyy
]
$ /
Notice that the yyyy part is either $2 or $1 depending on whether the dd part is included.
my $day = +$2 ?? $1 !! 1;
my $month = +$0;
my $year = +$2 || +$1;
We can renumber the yyyy to always be $2.
/ ^
[ (..) '-' (..) '-' (....) # mm-dd-yyyy
| (..) '-' $2 = (....) # mm-yyyy
]
$ /
my $day = +$1 || 1;
my $month = +$0;
my $year = +$2;
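To try the renumbered version, here is a small sketch that stores the regex in a variable and runs it against two sample strings (the variable names and inputs are assumptions for illustration):

```raku
# Sample inputs are made up for illustration.
my $date = / ^
    [ (..) '-' (..) '-' (....)    # mm-dd-yyyy
    | (..) '-' $2 = (....)        # mm-yyyy
    ]
$ /;

my @years;
for <07-04-1976 07-1976> -> $s {
    @years.push: +$2 if $s ~~ $date;   # the year is $2 in both branches
}
say @years;
```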
Or what if we need to also accept yyyy-mm-dd?
/ ^
[ (..) '-' (..) '-' (....) # mm-dd-yyyy
| (..) '-' $2 = (....) # mm-yyyy
| $2 = (....) '-' $0 = (..) '-' $1 = (..) # yyyy-mm-dd
]
$ /
my $day = +$1 || 1;
my $month = +$0;
my $year = +$2;
Actually, now that we have a lot of capture groups, let's look again at how we would handle it if | didn't cause the numbered capture groups to start again from $0:
/ ^
[ (..) '-' (..) '-' (....) # mm-dd-yyyy
| (..) '-' (....) # mm-yyyy
| (....) '-' (..) '-' (..) # yyyy-mm-dd
]
$ /
my $day = +$1 || +$7 || 1;
my $month = +$0 || +$3 || +$6;
my $year = +$2 || +$4 || +$5;
That is not great.
For one thing, you have to make sure the regex and the my $day lines match up correctly.
Quick, without counting capture groups, check that those numbers match the correct capture groups.
Of course that still has the issue that concepts which have a name are instead captured by a number.
So we should use names instead.
/ ^
[ $<month> = (..) '-' $<day> = (..) '-' $<year> = (....) # mm-dd-yyyy
| $<month> = (..) '-' $<year> = (....) # mm-yyyy
| $<year> = (....) '-' $<month> = (..) '-' $<day> = (..) # yyyy-mm-dd
]
$ /
my $day = +$<day> || 1;
my $month = +$<month>;
my $year = +$<year>;
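Wrapped up as something you can run, it might look like the sketch below. parse-date is a hypothetical helper name and the sample dates are made up. The sketch uses // for the default day, which sidesteps numifying a capture that did not take part in the match.

```raku
# parse-date is a hypothetical helper, not part of any library.
sub parse-date(Str $s) {
    return Nil unless $s ~~ / ^
        [ $<month> = (..)   '-' $<day>   = (..)   '-' $<year> = (....)  # mm-dd-yyyy
        | $<month> = (..)   '-' $<year>  = (....)                       # mm-yyyy
        | $<year>  = (....) '-' $<month> = (..)   '-' $<day>  = (..)    # yyyy-mm-dd
        ]
    $ /;
    # A missing day falls back to 1.
    return %( day => +($<day> // 1), month => +$<month>, year => +$<year> );
}

say parse-date('12-25-2020');
say parse-date('12-2020');
say parse-date('2020-12-25');
```

The code after the regex never has to know which of the three layouts matched.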
So long story short, I would do this:
/ $<foo> = (foo) | $<bar> = (bar) /;
if $<foo> {
…
} elsif $<bar> {
…
}
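A filled-in sketch of the same shape, with placeholder inputs and placeholder bodies standing in for whatever you would actually do in each branch:

```raku
# The inputs and the pushed labels are placeholders for illustration.
my @which;
for <foo bar> -> $line {
    next unless $line ~~ / $<foo> = (foo) | $<bar> = (bar) /;
    if $<foo> {
        @which.push: 'foo';
    } elsif $<bar> {
        @which.push: 'bar';
    }
}
say @which;
```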
| alternation operator does Longest Token Matching (LTM), not sequential (i.e. "first named") token matching. See: docs.raku.org/language/regexes#Longest_alternation:_| and docs.raku.org/language/… . – Begum
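A quick way to see that difference for yourself, using a made-up input where one alternative is a prefix of the other:

```raku
# 'foobar' is just a sample string; both alternatives can match at position 0.
my $ltm   = ~( 'foobar' ~~ / foo | foobar / );    # | prefers the longest alternative
my $first = ~( 'foobar' ~~ / foo || foobar / );   # || takes the first alternative that matches

say $ltm;
say $first;
```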