Consider the grammar:
TOP ⩴ 'x' Y 'z'
Y ⩴ 'y'
Here's how to get the exact value ["TOP","x",["Y","y"],"z"] with various parsers (not written manually, but generated from the grammar):
xyz__Parse-Eyapp.eyp
    %strict
    %tree
    %%
    start:
        TOP { shift; use JSON::MaybeXS qw(encode_json); print encode_json $_[0] };
    TOP:
        'x' Y 'z' { shift; ['TOP', (scalar @_) ? @_ : undef] };
    Y:
        'y' { shift; ['Y', (scalar @_) ? @_ : undef] };
    %%
xyz__Regexp-Grammars.pl
    use 5.028;
    use strictures;
    use Regexp::Grammars;
    use JSON::MaybeXS qw(encode_json);
    print encode_json $/{TOP} if (do { local $/; readline; }) =~ qr{
        <nocontext:>
        <TOP>
        <rule: TOP>
            <[anon=(x)]> <[anon=Y]> <[anon=(z)]>
            <MATCH=(?{['TOP', $MATCH{anon} ? $MATCH{anon}->@* : undef]})>
        <rule: Y>
            <[anon=(y)]>
            <MATCH=(?{['Y', $MATCH{anon} ? $MATCH{anon}->@* : undef]})>
    }msx;
Code elided for the next two parsers. With Pegex, the functionality is achieved by inheriting from Pegex::Receiver. With Marpa-R2, the customisation of the return value is quite limited, but nested arrays are possible out of the box with a configuration option.
I have demonstrated that the desired customisation is possible, although it's not always easy or straightforward. The pieces of code attached to the rules are run as the tree is assembled.
A Raku grammar's parse method returns nothing but nested Match objects, which are unwieldy. They do not retain the unnamed terminals! (Just to be clear about what I mean: these are the two pieces of data on the RHS of the TOP rule whose values are 'x' and 'z'.) Apparently only data springing from named declarators is added to the tree.
Assigning to the match variable (analogous to how it works in Regexp-Grammars) seems to have no effect. Since the terminals do not make it into the match variable, actions don't help either.
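The observation about the missing terminals can be checked with a short script. This is only a minimal sketch: the grammar name XYZ is made up here for illustration (the grammar in this question is anonymous), and the script merely inspects the Match object without attempting to solve the problem.

```raku
# Same rules as in the question; the name XYZ is hypothetical.
grammar XYZ {
    rule TOP { x <Y> z }
    rule Y   { y }
}

my $m = XYZ.parse('x y z');

# Named captures: only the subrule call <Y> is recorded.
say $m.hash.keys;

# Positional captures: the bare terminals x and z are not captured,
# so there are no positional entries.
say $m.list.elems;

# The matched text as a whole still contains the terminals.
say $m.Str;
```

The named and positional captures are what an action method sees by default, which is why the terminals appear to be out of reach there.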
In summary, here's the grammar and ordinary parse value:
grammar {rule TOP { x <Y> z }; rule Y { y };}.parse('x y z')
How do you get the value ["TOP","x",["Y","y"],"z"] from it? You are not allowed to change the shape of the rules, because that could spoil user-attached semantics; anything else is fair game. I still think the key to the solution is the match variable, but I can't see how.