How to commit to an alternation branch in a Raku grammar token?

Suppose I have a grammar with the following tokens:

token paragraph {
    (
        || <header>
        || <regular>
    )

    \n
}

token header { ^^ '---' '+'**1..5 ' ' \N+ }
token regular { \N+ }

The problem is that a line starting with ---++Foo will be parsed as a regular paragraph because there is no space before "Foo". I'd like to fail the parse in this case, i.e. somehow "commit" to this branch of the alternation, e.g. after seeing --- I want to either parse the header successfully or fail the match completely.

How can I do this? The only way I see is to use a negative lookahead assertion before <regular> to check that it does not start with ---, but this looks rather ugly and impractical, considering that my actual grammar has many more than just these 2 branches. Is there some better way? Thanks in advance!
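
For reference, the lookahead workaround mentioned above could be sketched roughly as follows. The grammar name `Lookahead`, the `TOP` token, and the sample inputs are assumptions added only to make the snippet runnable; the `header` and `regular` tokens follow the ones in the question.

```raku
grammar Lookahead {
    # Hypothetical entry point: one or more paragraphs.
    token TOP { <paragraph>+ }

    token paragraph {
        [
            || <header>
            || <regular>
        ]
        \n
    }

    token header { ^^ '---' '+' ** 1..5 ' ' \N+ }

    # The negative lookahead stops a free-form paragraph from
    # swallowing a line that begins with the header prefix.
    token regular { <!before '---'> \N+ }
}

say so Lookahead.parse("---++ Foo\n");   # well-formed header
say so Lookahead.parse("---++Foo\n");    # malformed header
say so Lookahead.parse("plain text\n");  # regular paragraph
```

Every additional prefix-identified construct would need its own exclusion in `regular`, which is the maintenance cost the question is trying to avoid.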

Blaspheme answered 26/7, 2022 at 1:45 Comment(6)
Maybe make the whitespace (' ') optional, with \s? or similar. Does that work? - Crocus
Or you could define a badheader token without the space and change regular so that it does not match badheader either. - Gers
Making whitespace optional would "fix" this, but would be wrong (as ---+foo isn't actually recognized as a header in the markup I'm parsing), and in my real grammar I have other constructs that are uniquely identified by their prefix but still have to conform to some shape afterwards. Handling them as regular (i.e. free-form) paragraphs would hide errors in either the input or my grammar, and I'd rather detect them, i.e. make the grammar fail completely instead. - Blaspheme
Does arbitrary code embedded in a token work (docs.raku.org/language/grammars#Dynamic_variables_in_grammars)? You could set an error variable there, or with match objects you could call code (docs.raku.org/language/grammars#Action_objects). So your original header would match, but at the same time signal that it is in the wrong format. - Gers
Thanks @Sebastian, dynamic variables look promising, so I'll try this. As for the actions, I already use them, but the corresponding action method is not called if the grammar rule doesn't match, so I don't see how it could help here. - Blaspheme
You could let a header rule without the space match and have its action called only when the header is malformed. This way you can either manage those failures outside the overall matching or (not sure if this works) test with a token at the end of your grammar that no failure action was ever called. - Gers

If I understood your question correctly, you could do something like this:

token header {
    ^^ '---' [
                 || '+'**1..5 ' ' \N+
                 || { die "match failed near position $/.pos()" }
             ]
}
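
As a rough, self-contained sketch of how this token might sit in a complete grammar, with the exception caught at the call site: the grammar name `Markup`, the `TOP` token, the helper `check`, and the sample inputs are all assumptions made for illustration, not part of the original grammar.

```raku
grammar Markup {
    # Hypothetical entry point: one or more paragraphs.
    token TOP { <paragraph>+ }

    token paragraph {
        [
            || <header>
            || <regular>
        ]
        \n
    }

    token header {
        ^^ '---' [
            || '+' ** 1..5 ' ' \N+
            # Once '---' has been seen, there is no falling back to <regular>:
            # the exception aborts the whole parse.
            || { die "malformed header near position $/.pos()" }
        ]
    }

    token regular { \N+ }
}

# Hypothetical helper: turns the outcome of a parse into a short status string.
sub check(Str $input) {
    my $match = try Markup.parse($input);
    return "error: { $!.message }" if $!;
    return $match ?? "parsed" !! "no match";
}

say check("---++ Title\nplain text\n");
say check("---++Foo\n");
```

Because `die` throws an ordinary exception, the caller can tell a malformed header (an exception) apart from input that simply does not match (a false result from `parse`).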
Alienable answered 31/7, 2022 at 9:1 Comment(3)
Could this be combined with caught exceptions (docs.raku.org/language/exceptions)? - Gers
I think so. Depends on where you want to catch it, probably. - Alienable
Excellent, thank you, this does exactly what I wanted and in a very simple way, too, great! - Blaspheme