How to pin a Raku Grammar token to only match when at the end of a string

Asked 15/8, 2023 at 16:7 Answered 16/8, 2023 at 11:18

Solved grammar raku

I have written this - it works fine:

use Grammar::Tracer;

my grammar Lambda {
    token TOP       { <signature> <body> ' as ' <r-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    token body      { '(' <expr> ')' <?before ' as '> }
    token expr      { <-[()]>* }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

Lambda.parse("|a: i32, b: i32| (a + b) as i32");

gives what I need:

TOP
|  signature
|  |  a-sig
|  |  |  a-type
|  |  |  * MATCH "i32"
|  |  * MATCH "a: i32"
|  |  b-sig
|  |  |  b-type
|  |  |  * MATCH "i32"
|  |  * MATCH "b: i32"
|  * MATCH "|a: i32, b: i32| "
|  body
|  |  expr
|  |  * MATCH "a + b"
|  * MATCH "(a + b)"
|  r-type
|  * MATCH "i32"
* MATCH "|a: i32, b: i32| (a + b) as i32"

BUT I would also like to parse this string (and similar ones): |a: str, b: i32| (a.len() as i32 + b) as i32

This fails because the body match exits at the parens of len(). Even when I fix that, it exits at the first as i32.

I would like to find some way to "pin" the match to the last valid 'as type' before the end of the string.

I would also like to know how to match, but not capture, the other (inner) parens.

Please.

Shitty answered 15/8, 2023 at 16:7 Comment(1)

Does Raku's "tilde/nested" regex operator solve your problem? '(' ~ ')' <expr> for example? See discussion here: docs.raku.org/language/regexes#Tilde_for_nesting_structures . I can write this up as a full answer if useful. – Offshore 22/8, 2023 at 15:42
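A minimal sketch of the tilde idea from the comment above, assuming the goal is simply to treat the body as one balanced pair of parens. The grammar name NestedLambda and the recursive expr token are illustrative assumptions, not code from the question or the answer. The nested call uses <.expr> (leading dot), so inner paren groups are matched but not captured. Because .parse must consume the whole string, the final 'as' <r-type> can only match at the end.

```raku
my @types = <bool i32 i64 u32 u64 f32 f64 str>;

# Hypothetical variant of the question's grammar: expr recurses on nested parens.
my grammar NestedLambda {
    rule  TOP       { <signature> <body> 'as' <r-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    token body      { '(' ~ ')' <expr> }
    # Either a run of non-paren characters, or a balanced inner group.
    # <.expr> matches the inner group without adding a capture for it.
    token expr      { [ <-[()]>+ | '(' ~ ')' <.expr> ]* }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

my $m = NestedLambda.parse('|a: str, b: i32| (a.len() as i32 + b) as i32');
say ~$m<body><expr>;   # a.len() as i32 + b
say ~$m<r-type>;       # i32
```

The inner "as i32" ends up inside expr because the body only closes at its matching ')', so no lookahead is needed in this variant.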

After some trial and error, I managed to work this out (Grammar::Tracer is soooo helpful!)

Here's the working grammar:

my @types  = <bool i32 i64 u32 u64 f32 f64 str>;

my grammar Lambda {
    rule  TOP       { <signature> <body> <as-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    rule  as-type   { 'as' <r-type> }
    rule  body      { '(' <expr> ')' <?before <as-type>> }
    rule  expr      { .* <?before ')'> }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

The changes I made were:

swapped a bunch of tokens to rules (the best way to ignore whitespace)
added <as-type> to bundle the return type into a single matcher in TOP, so that it always matches at the end
<body> has a lookahead assertion, so it is always before an <as-type>
<expr> has a lookahead assertion, so it always ends just before a ')', but is otherwise greedy with .* so that it hoovers up the whole expr and does not stop on the first ')'
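As a small standalone illustration of the lookahead assertions mentioned above (a sketch using a plain regex, not the grammar itself; the variable names are made up for the example): <?before ...> only checks what comes next and consumes nothing, so the text it inspects is still available to whatever matches afterwards.

```raku
# The lookahead requires ' as ' to follow the ')' but does not consume it.
my $pattern = / '(' <-[()]>* ')' <?before ' as '> /;

my $hit = '(a + b) as i32' ~~ $pattern;
say ~$hit;       # (a + b)

my $miss = '(a + b) + c' ~~ $pattern;
say so $miss;    # False
```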

Shitty answered 16/8, 2023 at 11:18 Comment(8)

sorry raiph - I have added the @types declaration into this answer – Shitty 17/8, 2023 at 19:7

Ohhhh. D'oh. str. 🤦‍♂️ Thx. Eyesight ain't what it used to be... 👓 😄 – Hallow 17/8, 2023 at 20:44

Your answer is spot on¹ but I decided to try nail down a much smaller change to your original grammar in your question that successfully parses your hitherto failing example. I've found that just changing the expr token from { <-[()]>* } to { :s .* <before ')'> } works. This feels like a variant of a loose end I've not yet tidied up. If I decide it is I may add another answer here. §§§ ¹ At minimum, your answer is tidier and matches white space more flexibly. – Hallow 17/8, 2023 at 23:4

raiph that's cool - my thoughts are (i) so token { :s .* <before ')'> } is identical to rule { .* <before ')'> }, right? – Shitty 18/8, 2023 at 21:25

and (ii) there's a subtle difference between some regex inside a token and the same regex definition text just in place in the token that consumes it. In principle: token body { '(' <expr> ')' <?before ' as '> }; token expr { <-[()]>* } vs. token body { '(' <-[()]>* ')' <?before ' as '> } (I am recalling a time of debugging and this is not a good MRE) – Shitty 18/8, 2023 at 21:33

... so I am somewhat disoriented when I refactor some regex code to externalise it and package into a sub regex... would be helpful to have some canonical example(s) of this – Shitty 18/8, 2023 at 21:37

"(i) so token { :s .* <before ')'> } is identical to rule { .* <before ')'> }, right?" I firmly believe so. In fact I originally golf'd it to the rule and then decided to mechanically translate it (to the token with an :s at the start). – Hallow 19/8, 2023 at 10:52
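A quick way to check point (i) on a tiny case, as a sketch with made-up names (Sig, plain, with-s, as-rule): a rule is a token with :s (sigspace) switched on, so whitespace in the pattern is allowed to match whitespace in the input.

```raku
my grammar Sig {
    token plain   { 'a:' <a-type> }      # no implicit whitespace matching
    token with-s  { :s 'a:' <a-type> }   # sigspace turned on inside a token
    rule  as-rule { 'a:' <a-type> }      # rule: sigspace is implied
    token a-type  { 'i32' | 'str' }
}

say so Sig.parse('a: i32', :rule<plain>);     # False
say so Sig.parse('a: i32', :rule<with-s>);    # True
say so Sig.parse('a: i32', :rule<as-rule>);   # True
```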

"(ii) there's a subtle difference ... token body { '(' <expr> ')' <?before ' as '> }; token expr { <-[()]>* } vs. token body { '(' <-[()]>* ')' <?before ' as '> }" When I plug that change into your original grammar it makes no observable difference in the end result -- it works for the example that already worked and doesn't for the one that already didn't. That said, I haven't run it with Grammar::Tracer or Comma to try to spot a stepping difference. – Hallow 19/8, 2023 at 12:1
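For point (ii), here is a minimal sketch of one difference that can be observed when both versions are tokens: the matched text is the same, but only the named subrule leaves a capture behind. The grammar names Named and Inlined are assumptions for the example, and it does not cover cases where the enclosing declarator is a backtracking regex.

```raku
my grammar Named {
    token TOP  { '(' <expr> ')' }
    token expr { <-[()]>* }
}

my grammar Inlined {
    token TOP  { '(' <-[()]>* ')' }
}

my $named   = Named.parse('(a + b)');
my $inlined = Inlined.parse('(a + b)');

say ~$named;                  # (a + b)
say ~$inlined;                # (a + b)
say ~$named<expr>;            # a + b
say $inlined<expr>.defined;   # False
```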
