Marpa: Can I explicitly disallow keywords as identifiers?

I'm implementing a new DSL in Marpa and (coming from Regexp::Grammars) I'm more than satisfied. My language supports a bunch of unary and binary operators, objects with C-style identifiers and method calls using the familiar dot notation. For example:

foo.has(bar == 42 AND baz == 23)

I found the prioritized rules feature of Marpa's grammar description language and have come to rely on it heavily, so my grammar consists of little more than a single G1 rule, Expression. Excerpt (many alternatives and the semantic actions omitted for brevity):

Expression ::=
      NumLiteral
    | '(' Expression ')'             assoc => group
   || Expression ('.') Identifier
   || Expression ('.') Identifier Args
    | Expression ('==') Expression
   || Expression ('AND') Expression

Args     ::= ('(') ArgsList (')')
ArgsList ::= Expression+             separator => [,]

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

As you can see, I'm using the Scanless interface (SLIF). My problem is that this also parses, for example:

foo.AND(5)

Marpa knows that there can only be an identifier after a dot, so it doesn't even consider the fact that AND might be a keyword. I know that I can avoid that problem by doing a separate lexing stage that identifies AND as a keyword explicitly, but that tiny papercut is not quite worth the effort.

Is there a way in SLIF to restrict the Identifier rule to non-keyword identifiers only?

Roos asked 24/11, 2014 at 16:47 Comment(3)
What do you mean by "keyword"? assoc and separator are keywords in the Marpa lingo. – Sadyesaechao
@choroba, he means that if he defines an operator AND, he doesn't want it to be allowed as an identifier. – Sob
I haven't tested this, but you might want to look at the 'latm' adverb. This allows you to turn off Marpa's knowledge of what lexeme is acceptable where, on a per-lexeme basis, in effect making it "stupid" for that one lexeme, so that it will think that an 'AND' is OK, and then fail the parse, as you want. Couple this perhaps with a higher lexeme priority so that 'AND' as an operator is preferred over 'AND' as an identifier. Off the top of my head, but hope it helps. – Betel

I don't know how to express such a thing in the grammar. You can introduce an intermediate non-terminal for Identifier which would check the condition, though:

#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };

use Marpa::R2;

my %reserved = map { $_ => 1 } qw( AND );

my $grammar = 'Marpa::R2::Scanless::G'->new(
    { bless_package => 'main',
      source => \( << '__GRAMMAR__'),

:default ::= action => store

:start ::= S
S ::= Id
  | Id NumLiteral
Id ::= Identifier action => allowed

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

:discard ~ whitespace
whitespace ~ [\s]+

__GRAMMAR__
    });

# The last input, 'AND 1', is expected to die in allowed().
for my $input ('ABC', 'ABC 42', 'AND 1') {
    my $value_ref = $grammar->parse(\$input, 'main');
    print $$value_ref, "\n";
}


sub store {
    my (undef, $id, $arg) = @_;
    $arg //= 'null';
    return "$id $arg";
}

# Semantic action for Id: reject reserved words during evaluation.
sub allowed {
    my (undef, $id) = @_;
    die "Reserved keyword $id" if $reserved{$id};
    return $id;
}
Sadyesaechao answered 24/11, 2014 at 17:39 Comment(4)
oops, doh! [will self-destruct] – Sob
I tried various permutations of the priority and latm lexeme adverbs, but this is the only thing that actually made my unit test green. The error messages might not be as pretty, but at least it accepts the grammar correctly. – Roos
@StefanMajewsky: A "negative rule" would be nice to have. Thanks for an interesting question. – Sadyesaechao
There might be more to come where that one came from. ;) I might look into building an autocompleting code editor for this DSL. – Roos

You can use lexeme priorities, which are intended for just this kind of thing; there is an example in the Marpa::R2 test suite.

Basically, you declare an <AND keyword> ~ 'AND' lexeme and give it priority 1 so that it is preferred over Identifier. That should do the trick.

P.S. I modified the above script slightly to give an example (code, output).
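
As a rough sketch of the lexeme-priority idea, the grammar from the question might be changed along these lines. The symbol name <AND keyword> comes from this answer; swapping it in for the literal 'AND' in the Expression rule is an assumption about how the pieces fit together, not the code linked from the answer.

```
# Sketch only, untested against the asker's full grammar.
# In the G1 rule, use a named lexeme instead of the literal 'AND':
#    || Expression (<AND keyword>) Expression

# Give the keyword lexeme a higher priority than Identifier
# (lexeme priority defaults to 0).
:lexeme ~ <AND keyword> priority => 1
<AND keyword> ~ 'AND'
```

Whether this actually rejects foo.AND(5) probably depends on the latm setting of the lexemes involved, as the comment on the question suggests: if only lexemes acceptable at the current position are considered, the keyword never competes with Identifier after a dot. The asker reported that the priority and latm permutations they tried did not work in their setup.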

Rewarding answered 25/11, 2014 at 6:35 Comment(3)
Actions can be used, but that is not terribly efficient: actions are called at the evaluation phase when the input is read. Events are better, see this gist (gist.github.com/rns/d19b40ffc5523659dec9): the AND identifier is rejected once it is met in the input. – Rewarding
re $r->literal(): yes, it can be used to access any input span, if you needed it. – Rewarding
Using events to issue a warning asking for spaces around - in 12 34-56 78 needs more work (gist.github.com/rns/962fdb4f30d0681cc07d): it uses significant spaces (no :discard), marker symbols and nulled events, see metacpan.org/pod/distribution/Marpa-R2/pod/Event.pod – Rewarding
