How to pin a Raku Grammar token to only match when at the end of a string

I have written this - it works fine:

use Grammar::Tracer;

my @types = <bool i32 i64 u32 u64 f32 f64 str>;

my grammar Lambda {
    token TOP       { <signature> <body> ' as ' <r-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    token body      { '(' <expr> ')' <?before ' as '> }
    token expr      { <-[()]>* }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

Lambda.parse("|a: i32, b: i32| (a + b) as i32");

gives what I need:

TOP
|  signature
|  |  a-sig
|  |  |  a-type
|  |  |  * MATCH "i32"
|  |  * MATCH "a: i32"
|  |  b-sig
|  |  |  b-type
|  |  |  * MATCH "i32"
|  |  * MATCH "b: i32"
|  * MATCH "|a: i32, b: i32| "
|  body
|  |  expr
|  |  * MATCH "a + b"
|  * MATCH "(a + b)"
|  r-type
|  * MATCH "i32"
* MATCH "|a: i32, b: i32| (a + b) as i32"

But I would also like to parse this string (and similar ones): |a: str, b: i32| (a.len() as i32 + b) as i32

  • this fails because the body match exits at the len() parens
  • even when I fix that, it exits at the first as i32

I would like to find some way to "pin" the match to the last valid 'as type' before the end of the string.

And how can I match, but not capture, the other parens, please?

Shitty answered 15/8, 2023 at 16:7 Comment(1)
Does Raku's "tilde/nested" regex operator solve your problem? '(' ~ ')' <expr> for example? See discussion here: docs.raku.org/language/regexes#Tilde_for_nesting_structures . I can write this up as a full answer if useful. – Offshore
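A minimal sketch of the tilde idea from the comment above, assuming nested parens are always balanced. The names NestedBody and nested are hypothetical and not part of the question's grammar, and the signature part is left out to keep it short. The body recurses into inner groups, so the len() parens and the inner "as i32" stay inside the body. Because .parse has to reach the end of the string, the 'as' type after the body is the final one.

```raku
# Sketch only: NestedBody and the `nested` token are illustrative names,
# not part of the grammar in the question.
my @types = <bool i32 i64 u32 u64 f32 f64 str>;

my grammar NestedBody {
    # .parse must reach the end of the string, so this 'as <type>' is the last one
    token TOP    { <body> \s+ 'as' \s+ <r-type> }
    # '(' ~ ')' <expr> reads as: '(' then <expr> then ')'
    token body   { '(' ~ ')' <expr> }
    # a run of non-paren text, or a balanced inner group, repeated
    token expr   { [ <-[()]>+ || <.nested> ]* }
    # called as <.nested> (leading dot), so inner groups match without being captured
    token nested { '(' ~ ')' <.expr> }
    token r-type { @types }
}

my $m = NestedBody.parse('(a.len() as i32 + b) as i32');
say $m<body><expr>.Str;   # a.len() as i32 + b
say $m<r-type>.Str;       # i32
```

The leading dot in <.nested> and <.expr> is what addresses the "match but not capture" part: the inner parens are consumed, but no sub-match is stored for them.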

After some trial and error, I managed to work this out (Grammar::Tracer is soooo helpful!)

Here's the working grammar:

my @types  = <bool i32 i64 u32 u64 f32 f64 str>;

my grammar Lambda {
    rule  TOP       { <signature> <body> <as-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    rule  as-type   { 'as' <r-type> }
    rule  body      { '(' <expr> ')' <?before <as-type>> }
    rule  expr      { .* <?before ')'> }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

The changes I made were:

  • swap a bunch of tokens to rules (best way to ignore whitespace)
  • <as-type> to bundle 'as' and the return type as a single matcher in TOP so that it always matches at the end
  • <body> has a lookahead assertion so it is always followed by an <as-type>
  • <expr> has a lookahead assertion so it always ends just before a ')'
  • but it is otherwise greedy with .* so that it hoovers up the whole expr and does not stop at the first ')'
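The "greedy .* plus lookahead" idea from the last two bullets can be sketched in isolation. This is a cut-down illustration, not the grammar above: GreedyBody is a hypothetical name, the signature is omitted, and expr is declared with regex (an assumption made here so that backtracking is explicitly enabled) rather than rule.

```raku
# Sketch only: GreedyBody is an illustrative name. Declaring `expr` with `regex`
# (backtracking enabled) is a substitution made here, not the answer's code.
my @types = <bool i32 i64 u32 u64 f32 f64 str>;

my grammar GreedyBody {
    token TOP    { <body> \s+ 'as' \s+ <r-type> }
    token body   { '(' <expr> ')' }
    # .* first grabs everything, then gives characters back until a ')' is next
    regex expr   { .* <?before ')'> }
    token r-type { @types }
}

my $m = GreedyBody.parse('(a.len() as i32 + b) as i32');
say $m<body><expr>.Str;   # a.len() as i32 + b
say $m<r-type>.Str;       # i32
```

Note that this version ends expr at the last ')' in the whole string, which suits input of this shape, where the final ')' closes the body.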
Shitty answered 16/8, 2023 at 11:18 Comment(8)
sorry raiph - I have added the @types declaration into this answer – Shitty
Ohhhh. D'oh. str. 🤦‍♂️ Thx. Eyesight ain't what it used to be... 👓 😄 – Hallow
Your answer is spot on¹ but I decided to try nail down a much smaller change to your original grammar in your question that successfully parses your hitherto failing example. I've found that just changing the expr token from { <-[()]>* } to { :s .* <before ')'> } works. This feels like a variant of a loose end I've not yet tidied up. If I decide it is I may add another answer here. §§§ ¹ At minimum, your answer is tidier and matches white space more flexibly. – Hallow
raiph that's cool - my thoughts are (i) so token { :s .* <before ')'> } is identical to rule { .* <before ')'> }, right? – Shitty
and (ii) there's a subtle difference between a regex placed in its own token and the same regex text written inline in the token that uses it. In principle: token body { '(' <expr> ')' <?before ' as '> }; token expr { <-[()]>* } vs. token body { '(' <-[()]>* ')' <?before ' as '> } (I am recalling a debugging session and this is not a good MRE) – Shitty
... so I am somewhat disoriented when I refactor some regex code to externalise it and package into a sub regex... would be helpful to have some canonical example(s) of this – Shitty
"(i) so token { :s .* <before ')'> } is identical to rule { .* <before ')'> }, right?" I firmly believe so. In fact I originally golf'd it to the rule and then decided to mechanically translate it (to the token with an :s at the start). – Hallow
"(ii) there's a subtle difference ..." token body { '(' <expr> ')' <?before ' as '> }; token expr { <-[()]>* } vs. token body { '(' <-[()]>* ')' <?before ' as '> }" When I plug that change into your original grammar it makes no observable difference in the end result -- it works for the example that already worked and doesn't for the one that already didn't. That said, I haven't run it with Grammar::Tracer or Comma to try spot a stepping difference. – Hallow
