The `<(` and `)>` capture markers only work within a given token. Basically, each token returns a `Match` object that says "I matched the original string from index X (`.from`) to index Y (`.to`)", and that range is what's used when a `Match` object is stringified. That's what's happening with your `strvalue` token:
```raku
my $text = 'bar = "Hello, World!"';
my $m = MyGrammar.parse: $text;
my $start = $m<value><strvalue>.from; # 7
my $end = $m<value><strvalue>.to; # 20
say $text.substr: $start, $end - $start; # Hello, World!
```
You'll notice that there are only two numbers: a start value and an end value. This means that your `value` token can't produce a discontiguous match. So its `.from` is set to 6 and its `.to` to 21, and stringifying it therefore includes both quotation marks.
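To see the two ranges side by side, here is a minimal sketch. The question's grammar isn't reproduced in this answer, so the `MyGrammar` below is an assumed reconstruction built from the token names used in the snippets here (`keyword`, `value`, `strvalue`, `numvalue`); adjust it to match yours.

```raku
# Assumed reconstruction of the question's grammar (not shown in this answer).
grammar MyGrammar {
    rule  TOP      { <keyword> '=' <value> }
    token keyword  { \w+ }
    token value    { <strvalue> | <numvalue> }
    # <( and )> narrow strvalue's own .from/.to so the quotes are excluded...
    token strvalue { '"' <( <-["]>* )> '"' }
    token numvalue { '-'? \d+ [ '.' \d* ]? }
}

my $m = MyGrammar.parse: 'bar = "Hello, World!"';

# ...but value's range still runs from where it started to where it stopped.
say $m<value><strvalue>.from, '..', $m<value><strvalue>.to;  # 7..20
say $m<value>.from, '..', $m<value>.to;                      # 6..21
say ~$m<value><strvalue>;  # Hello, World!
say ~$m<value>;            # "Hello, World!"
```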
There are two ways around this: by using (a) an actions object or (b) a multitoken. Both have their advantages, and depending on how you want to use this in a larger project, you might want to opt for one or the other.
While you can technically define actions directly within a grammar, it's much easier to do them via a separate class. For your grammar, that might look like this:
```raku
class MyActions {
    method TOP ($/) { make $<keyword>.made => $<value>.made }
    method keyword ($/) { make ~$/ }
    method value ($/) { make ($<numvalue> // $<strvalue>).made }
    method numvalue ($/) { make +$/ }
    method strvalue ($/) { make ~$/ }
}
```
Each level uses `make` to pass a value up to whichever token includes it, and the enclosing token has access to that value via the `.made` method. This is really nice when, instead of working with pure string values, you want to process them first in some way, for example to create an object.
To parse, you just do:
```raku
my $m = MyGrammar.parse: $text, :actions(MyActions);
say $m.made; # bar => Hello, World!
```
Which is actually a `Pair` object. You could change the exact result by modifying the `TOP` method.
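For instance, here is a minimal sketch in which `TOP` builds an object instead of a `Pair`. The `Setting` class is hypothetical (it is not from the question), and `MyGrammar` is again an assumed reconstruction of the question's grammar; the other action methods are the same as in the actions class above.

```raku
# Hypothetical class, used only to show make/.made producing an object.
class Setting {
    has Str $.name;
    has     $.value;
}

# Assumed reconstruction of the question's grammar.
grammar MyGrammar {
    rule  TOP      { <keyword> '=' <value> }
    token keyword  { \w+ }
    token value    { <strvalue> | <numvalue> }
    token strvalue { '"' <( <-["]>* )> '"' }
    token numvalue { '-'? \d+ [ '.' \d* ]? }
}

class MyActions {
    # Only TOP changes: it now makes a Setting rather than a Pair.
    method TOP      ($/) { make Setting.new(name => $<keyword>.made, value => $<value>.made) }
    method keyword  ($/) { make ~$/ }
    method value    ($/) { make ($<numvalue> // $<strvalue>).made }
    method numvalue ($/) { make +$/ }
    method strvalue ($/) { make ~$/ }
}

my $setting = MyGrammar.parse('bar = "Hello, World!"', :actions(MyActions)).made;
say $setting.name;   # bar
say $setting.value;  # Hello, World!
```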
The second way you can work around things is to use a multi token. It's fairly common in developing grammars to use something akin to

```raku
token foo { <option-A> | <option-B> }
```
But as you can see from the actions class, that requires us to check which one was actually matched. Instead, if the alternation can acceptably be done with `|` (longest-token matching, as opposed to the ordered `||`), you can use a multi token:
```raku
proto token foo { * }
multi token foo:sym<A> { <option-A> }
multi token foo:sym<B> { <option-B> }
```
When you use `<foo>` in your grammar, it will match either of the two multi versions, just as if the alternation had been written directly in the body of `foo`. Even better, if you're using an actions class, you can similarly just use `$<foo>` and know it's there, without any conditionals or other checks.
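Because a proto dispatches like `|`, its candidates are chosen by longest-token matching rather than by declaration order. A minimal sketch with a hypothetical `NumberGrammar` (not from the question), where the shorter candidate is deliberately declared first:

```raku
# Hypothetical grammar illustrating proto/multi dispatch.
grammar NumberGrammar {
    token TOP { <numeral> }
    proto token numeral {*}
    # Declared first, but only chosen when it matches the longest prefix.
    multi token numeral:sym<int> { \d+ }
    multi token numeral:sym<dec> { \d+ '.' \d+ }
}

# One action method per candidate; no check on which one matched.
class NumberActions {
    method TOP ($/)              { make $<numeral>.made }
    method numeral:sym<int> ($/) { make 'int ' ~ $/ }
    method numeral:sym<dec> ($/) { make 'dec ' ~ $/ }
}

say NumberGrammar.parse('3.14', :actions(NumberActions)).made;  # dec 3.14
say NumberGrammar.parse('42',   :actions(NumberActions)).made;  # int 42
```

With `||` instead, alternatives are tried in the order written, so a `\d+` listed first would settle for just `3`.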
In your case, it would look like this:
```raku
grammar MyGrammar
{
    rule TOP { <keyword> '=' <value> }
    token keyword { \w+ }
    proto token value { * }
    multi token value:sym<str> { '"' <( <-["]>* )> '"' }
    multi token value:sym<num> { '-'? \d+ [ '.' \d* ]? }
}
```
Now we can access things as you were originally expecting, without using an actions object:
```raku
my $text = 'bar = "Hello, World!"';
my $m = MyGrammar.parse: $text;
say $m; # 「bar = "Hello, World!"」
        #  keyword => 「bar」
        #  value => 「Hello, World!」
say $m<value>; # 「Hello, World!」
```
For reference, you can combine both techniques. Here's how I would now write the actions object given the multi token:
```raku
class MyActions {
    method TOP ($/) { make $<keyword>.made => $<value>.made }
    method keyword ($/) { make ~$/ }
    method value:sym<str> ($/) { make ~$/ }
    method value:sym<num> ($/) { make +$/ }
}
```
Which is a bit more grokkable at first look.
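Putting both techniques together, here is a minimal end-to-end sketch using the multi-token grammar and the combined actions class from this answer; the second input string (`baz = -3.5`) is made up to exercise the numeric candidate.

```raku
grammar MyGrammar
{
    rule TOP { <keyword> '=' <value> }
    token keyword { \w+ }
    proto token value { * }
    multi token value:sym<str> { '"' <( <-["]>* )> '"' }
    multi token value:sym<num> { '-'? \d+ [ '.' \d* ]? }
}

class MyActions {
    method TOP ($/) { make $<keyword>.made => $<value>.made }
    method keyword ($/) { make ~$/ }
    # Each multi candidate gets its own action method.
    method value:sym<str> ($/) { make ~$/ }
    method value:sym<num> ($/) { make +$/ }
}

for 'bar = "Hello, World!"', 'baz = -3.5' -> $text {
    # .made on the top-level match is the Pair built in TOP
    say MyGrammar.parse($text, :actions(MyActions)).made;
}
# bar => Hello, World!
# baz => -3.5
```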