Marpa offers two ways to handle mistakes in the input.
“Ruby Slippers” Parsing
Marpa maintains a lot of context during scanning. We can use this context: when the parser requires some token, we can decide whether to offer it to Marpa even if it isn't in the input. Consider, for example, a programming language that requires every statement to be terminated by a semicolon. We can then use Ruby Slippers techniques to insert semicolons at specific locations, such as at the end of a line or before a closing brace:
use strict;
use warnings;
use Marpa::R2;
use Data::Dump 'dd';

my $grammar = Marpa::R2::Scanless::G->new({
    source => \q{
        :discard ~ ws
        Block ::= Statement+ action => ::array
        Statement ::= StatementBody (STATEMENT_TERMINATOR) action => ::first
        StatementBody ::= 'statement' action => ::first
            | ('{') Block ('}') action => ::first
        STATEMENT_TERMINATOR ~ ';'
        event ruby_slippers = predicted STATEMENT_TERMINATOR
        ws ~ [\s]+
    },
});

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

my $input = q(
    statement;
    { statement }
    statement
    statement
);

for (
    $recce->read(\$input);
    $recce->pos < length $input;
    $recce->resume
) {
    ruby_slippers($recce, \$input);
}
ruby_slippers($recce, \$input);

dd $recce->value;

sub ruby_slippers {
    my ($recce, $input) = @_;
    my %possible_tokens_by_length;
    my @expected = @{ $recce->terminals_expected };

    for my $token (@expected) {
        pos($$input) = $recce->pos;

        if ($token eq 'STATEMENT_TERMINATOR') {
            # fudge a terminator at the end of a line, or before a closing brace
            if ($$input =~ /\G \s*? (?: $ | [}] )/smxgc) {
                push @{ $possible_tokens_by_length{0} }, [STATEMENT_TERMINATOR => ';'];
            }
        }
    }

    my $max_length = 0;
    for (keys %possible_tokens_by_length) {
        $max_length = $_ if $_ > $max_length;
    }

    if (my $longest_tokens = $possible_tokens_by_length{$max_length}) {
        for my $lexeme (@$longest_tokens) {
            $recce->lexeme_alternative(@$lexeme);
        }
        $recce->lexeme_complete($recce->pos, $max_length);
        return ruby_slippers($recce, $input);
    }
}
In the ruby_slippers function, you could also count how often you needed to fudge a token. If that count exceeds some threshold, you could abandon the parse by throwing an error.
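One way to keep such a count is a small closure that ruby_slippers calls every time it fudges a token. This is only a sketch: make_fudge_budget, $note_fudge and the limit of 3 are hypothetical names and values chosen for illustration, not part of Marpa.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Marpa): returns a closure that counts
# fudged tokens and dies once more than $limit of them have been inserted.
sub make_fudge_budget {
    my ($limit) = @_;
    my $count = 0;
    return sub {
        my ($token_name, $pos) = @_;
        $count++;
        die "Too many fudged tokens ($count > $limit), last was $token_name at position $pos\n"
            if $count > $limit;
        return $count;    # number of tokens fudged so far
    };
}

# Assumed limit: tolerate at most three inserted tokens per parse.
my $note_fudge = make_fudge_budget(3);
```

Inside ruby_slippers, a call such as $note_fudge->('STATEMENT_TERMINATOR', $recce->pos) placed just before $recce->lexeme_alternative(...) would then abandon the parse on the fourth inserted semicolon.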
Skipping input
If your input may contain unparseable junk, you can try skipping it whenever no lexeme would otherwise be found. For this, the $recce->resume method takes an optional position argument that specifies where normal parsing will resume.
use strict;
use warnings;
use Marpa::R2;
use Data::Dump 'dd';
use Try::Tiny;

my $grammar = Marpa::R2::Scanless::G->new({
    source => \q{
        :discard ~ ws
        Sentence ::= WORD+ action => ::array
        WORD ~ 'foo':i | 'bar':i | 'baz':i | 'qux':i
        ws ~ [\s]+
    },
});

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

my $input = '1) Foo bar: baz and qux, therefore qux (foo!) implies bar.';

try { $recce->read(\$input) };
while ($recce->pos < length $input) {
    # ruby_slippers($recce, \$input);
    try   { $recce->resume }                          # restart at current position
    catch { try { $recce->resume($recce->pos + 1) } }; # advance the position
    # if both fail, we go into a new iteration of the loop.
}

dd $recce->value;
While the same effect could be achieved with a :discard lexeme that matches anything, doing the skipping in our client code allows us to abort the parse if too much fudging had to be done.
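A sketch of such an abort condition, under stated assumptions: resume_with_skip_limit and $max_skips are hypothetical names, and the function relies only on the recognizer's pos and resume methods used above. It uses plain eval instead of Try::Tiny so that the snippet has no extra dependencies.

```perl
use strict;
use warnings;

# Hypothetical helper: drives the recognizer towards the end of the input,
# trying to skip one character whenever resume() fails at the current
# position. Dies once more than $max_skips skip attempts were needed.
# Returns the number of skip attempts made.
sub resume_with_skip_limit {
    my ($recce, $input_length, $max_skips) = @_;
    my $skipped = 0;

    while ($recce->pos < $input_length) {
        # first try to continue normally at the current position
        next if eval { $recce->resume; 1 };

        $skipped++;
        die "Gave up after $skipped skip attempts\n"
            if $skipped > $max_skips;

        # resume one character further on; a failure here is picked up
        # by the next iteration of the loop
        eval { $recce->resume($recce->pos + 1) };
    }

    return $skipped;
}
```

With the recognizer and input from the example above, this could replace the while loop, for instance as resume_with_skip_limit($recce, length($input), 20) after the initial read. The limit of 20 is arbitrary. Because every failed attempt is counted, the loop also terminates if the position stops advancing.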