How to commit to an alternation branch in a Raku grammar token?


Asked 26/7, 2022 at 1:45 Answered 31/7, 2022 at 9:1

Suppose I have a grammar with the following tokens

token paragraph {
    (
        || <header>
        || <regular>
    )

    \n
}

token header { ^^ '---' '+'**1..5 ' ' \N+ }
token regular { \N+ }

The problem is that a line starting with ---++Foo will be parsed as a regular paragraph because there is no space before "Foo". I'd like to fail the parse in this case, i.e. somehow "commit" to this branch of the alternation, e.g. after seeing --- I want to either parse the header successfully or fail the match completely.

How can I do this? The only way I see is to use a negative lookahead assertion before <regular> to check that it does not start with ---, but this looks rather ugly and impractical, considering that my actual grammar has many more than just these 2 branches. Is there some better way? Thanks in advance!
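For reference, the lookahead workaround described above could look roughly like the sketch below. The grammar name `MarkupLookahead`, the `TOP` token and the helper `check-lookahead` are hypothetical and only there to make the snippet self-contained; the real grammar has more branches, each of which would need its own guard.

```raku
# Sketch only: MarkupLookahead, TOP and check-lookahead are hypothetical names.
grammar MarkupLookahead {
    token TOP { <paragraph>+ }

    token paragraph {
        (
            || <header>
            || <regular>
        )
        \n
    }

    token header  { ^^ '---' '+'**1..5 ' ' \N+ }

    # Refuse to treat anything starting with '---' as a regular paragraph.
    token regular { <!before '---'> \N+ }
}

sub check-lookahead(Str $text) {
    # .parse returns Nil when the input does not match the grammar
    return MarkupLookahead.parse($text) ?? 'parsed' !! 'no match';
}

say check-lookahead("---++ Title\nplain text\n");
say check-lookahead("---++Foo\n");
```

With this guard a malformed header no longer falls through to `regular`, but every free-form branch has to repeat the prefix check, which is what makes it impractical for a larger grammar.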

Blaspheme answered 26/7, 2022 at 1:45 Comment(6)

Maybe make the whitespace (' ') optional, with \s? or similar. Does that work? – Crocus 27/7, 2022 at 4:33

Or you could define a badheader token without the space and change regular so that it does not match badheader either. – Gers 27/7, 2022 at 5:19

Making whitespace optional would "fix" this, but would be wrong (as ---+foo isn't actually recognized as a header in the markup I'm parsing) and in my real grammar I have other constructs that are uniquely identified by their prefix, but still have to conform to some shape afterwards. Handling them as regular (i.e. free-form) paragraphs will hide errors in either the input or in my grammar and I'd rather detect them, i.e. make the grammar fail completely instead. – Blaspheme 28/7, 2022 at 0:55

Does arbitrary code embedded in a token work? (docs.raku.org/language/grammars#Dynamic_variables_in_grammars). You could set an error variable there or with match objects you could call code (docs.raku.org/language/grammars#Action_objects). So your original header would match, but at the same time signify that it is in a wrong format. – Gers 28/7, 2022 at 14:41

Thanks @Sebastian, dynamic variables look promising, so I'll try this. As for the actions, I already use them, but the corresponding action method is not called if the grammar rule doesn't match, so I don't see how it could help here. – Blaspheme 28/7, 2022 at 18:9

You could let the header rule match even without the space and have the action called only if the header is malformed. This way you can either manage those failures outside the overall match or (not sure if this works) test with a token at the end of your grammar that no failure action was ever called. – Gers 29/7, 2022 at 2:27

If I understood your question correctly, you could do something like this:

token header {
    ^^ '---' [
                 || '+'**1..5 ' ' \N+
                 || { die "match failed near position $/.pos()" }
             ]
}
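To see the effect end to end, here is a minimal sketch that puts this `header` token into a small grammar and catches the exception at the call site. The grammar name `MarkupCommit`, the `TOP` token and the helper `check` are hypothetical; the other tokens are copied from the question.

```raku
# Sketch only: MarkupCommit, TOP and check are hypothetical names.
grammar MarkupCommit {
    token TOP { <paragraph>+ }

    token paragraph {
        (
            || <header>
            || <regular>
        )
        \n
    }

    token header {
        ^^ '---' [
            || '+'**1..5 ' ' \N+
            # Reached only after '---' has matched: abort the whole parse.
            || { die "match failed near position $/.pos()" }
        ]
    }

    token regular { \N+ }
}

sub check(Str $text) {
    # try returns Nil and sets $! if the grammar dies
    my $match = try MarkupCommit.parse($text);
    return "aborted: { $!.message }" if $!.defined;
    return $match ?? 'parsed' !! 'no match';
}

say check("---++ Title\nplain text\n");
say check("---++Foo\n");
```

Lines that do not start with `---` never reach the `die`, so they are still parsed as regular paragraphs; only input that has already committed to the header prefix can abort the parse.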

Alienable answered 31/7, 2022 at 9:1 Comment(3)

Could this be combined with caught exceptions (docs.raku.org/language/exceptions)? – Gers 31/7, 2022 at 9:40

I think so. Depends on where you want to catch it, probably. – Alienable 1/8, 2022 at 5:28

Excellent, thank you, this does exactly what I wanted and in a very simple way, too, great! – Blaspheme 1/8, 2022 at 22:28
