Stopping Raku grammar at EOS (End of String)

In the process of writing a translator of one music language to another (ABC to Alda) as an excuse to learn Raku DSL-ability, I noticed that there doesn't seem to be a way to terminate a .parse! Here is my shortened demo code:

#!/home/hsmyers/rakudo741/bin/perl6
use v6d;

# use Grammar::Debugger;
use Grammar::Tracer;

my $test-n01 = q:to/EOS/;
a b c d e f g
A B C D E F G
EOS

grammar test {
  token TOP { <score>+ }
  token score {
      <.ws>?
      [
          | <uc>
          | <lc>
      ]+
      <.ws>?
  }
  token uc { <[A..G]> }
  token lc { <[a..g]> }
}

test.parse($test-n01).say;

And it is the last part of the Grammar::Tracer display that demonstrates my problem.

|  score
|  |  uc
|  |  * MATCH "G"
|  * MATCH "G\n"
|  score
|  * FAIL
* MATCH "a b c d e f g\nA B C D E F G\n"
「a b c d e f g
A B C D E F G
」

On the second to last line, the word FAIL tells me that the .parse run has no way of quitting. I wonder if this is correct? The .say displays everything as it should be, so I'm not clear on how real the FAIL is? The question remains, "How do I correctly write a grammar that parses multiple lines without error?"

Annunciate answered 26/12, 2019 at 1:43 Comment(2)
I don't want to interfere in your learning process, but just in case you weren't aware, there's an ABC module. – Kannada
Well, at least we didn't pick the same tunes to test on! – Annunciate

When you use the grammar debugger, it lets you see exactly how the engine is parsing the string; fails are normal and expected. Consider, for example, matching a+b* against the string aab. You should get two matches for 'a', followed by a fail (because b is not a), but then the engine retries with b and successfully matches.
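
As a quick check of that claim, here is a minimal sketch (the variable name $m is only for illustration). The engine's third attempt at a fails on the b, yet the overall match still succeeds:

```raku
# Sketch: `a+` stops when it meets "b" (an internal fail),
# then `b*` picks up from there and the whole regex matches.
my $m = 'aab' ~~ / a+ b* /;
say $m.Str;   # aab
```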

This might be more easily seen if you do an alternation with || (which enforces order). If you have

rule  TOP   { I have a <fruit> }   # a rule, so the spaces in the pattern match whitespace
token fruit { apple || orange || kiwi }

and you parse the sentence "I have a kiwi", you'll see it first match "I have a", followed by two fails with "apple" and "orange", and finally a match with "kiwi".
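
For anyone who wants to run this, a self-contained sketch follows. The grammar name Fruity is made up, and TOP is declared as a rule on the assumption that the spaces between the words are meant to match whitespace in the input:

```raku
grammar Fruity {
    # `rule` turns the spaces between atoms into <.ws> calls
    rule  TOP   { I have a <fruit> }
    # `||` tries the alternatives strictly left to right
    token fruit { apple || orange || kiwi }
}

my $m = Fruity.parse('I have a kiwi');
say $m<fruit>.Str;                       # kiwi
say so Fruity.parse('I have a mango');   # False
```

Adding `use Grammar::Tracer;` at the top should show the failed attempts at apple and orange before kiwi matches.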

Now let's look at your case:

TOP                  # Trying to match TOP (need ≥1 match of score)
|  score             #   Trying to match score (need ≥1 match of lc/uc)
|  |  lc             #     Trying to match lc
|  |  * MATCH "a"    #     lc had a successful match! ("a")
|  * MATCH "a "      #   and as a result so did score! ("a ")
|  score             #   Trying to match score again (because <score>+)
|  |  lc             #     Trying to match lc 
|  |  * MATCH "b"    #     lc had a successful match! ("b")
|  * MATCH "b "      #   and as a result so did score! ("b ")
……………                #     …so forth and so on until…
|  score             #   Trying to match score again (because <score>+)
|  |  uc             #     Trying to match uc
|  |  * MATCH "G"    #     uc had a successful match! ("G")
|  * MATCH "G\n"     #   and as a result, so did score! ("G\n")
|  score             #   Trying to match *score* again (because <score>+)
|  * FAIL            #   failed to match score, because no lc/uc.
|
|  # <--------------   At this point, the question is, did TOP match?
|  #                     Remember, TOP is <score>+, so we match TOP if there 
|  #                     was at least one <score> token that matched, there was so...
|
* MATCH "a b c d e f g\nA B C D E F G\n" # this is the TOP match

The fail here is normal: at some point we will run out of <score> tokens, so a fail is inevitable. When that happens, the grammar engine moves on to whatever comes after the <score>+ in your grammar. Since there's nothing left to match and the whole string has been consumed, the parse succeeds (.parse only succeeds when TOP matches from the start of the string to its end, as if it were wrapped in an implicit /^…$/).
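
To see that the final FAIL only matters when input is left over, here is a small sketch that reuses the question's tokens under a made-up grammar name (Notes) and compares .parse with .subparse, which anchors at the start of the string only:

```raku
grammar Notes {
    token TOP   { <score>+ }
    token score { <.ws>? [ <uc> | <lc> ]+ <.ws>? }
    token uc    { <[A..G]> }
    token lc    { <[a..g]> }
}

# Whole string consumed: the last failed <score> is harmless.
say so Notes.parse("a b c\n");                  # True

# "!" stops <score>+ early; .parse must reach the end, so it fails.
say so Notes.parse("a b c ! d");                # False

# .subparse accepts the prefix that did match ("a b c ").
say Notes.subparse("a b c ! d").Str.chars;      # 6
```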

Also, you might consider rewriting your grammar with a rule, which turns whitespace in the pattern into implicit <.ws> calls (unless it's important for it to be a single space only):

grammar test {
  rule TOP { <score> + }   # note the space before +: <.ws> is applied after each <score>
  token score {
      [
          | <uc>
          | <lc>
      ]+
  }
  token uc { <[A..G]> }
  token lc { <[a..g]> }
}

Further, IME, you might also want to add a proto token for the uc/lc, because when you have [ <foo> | <bar> ] one of them will always be undefined, which can make processing them in an actions class a bit annoying. You could try:

grammar test {
  rule  TOP   { <score>  + }
  token score { <letter> + }

  proto token letter    {     *    }
        token letter:uc { <[A..G]> }
        token letter:lc { <[a..g]> }
}

$<letter> will always be defined this way.
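
A minimal sketch of how that plays out with an actions class. The names Notes and Describe are hypothetical, the candidates use the longer :sym<…> spelling, and TOP spells out <.ws> explicitly rather than relying on rule, purely to keep the example unambiguous:

```raku
grammar Notes {
    token TOP   { [ <score> <.ws> ]+ }
    token score { <letter>+ }

    proto token letter         { * }
          token letter:sym<uc> { <[A..G]> }
          token letter:sym<lc> { <[a..g]> }
}

class Describe {
    # One action method per candidate; no "which branch matched?" checks.
    method letter:sym<uc>($/) { make "upper " ~ $/ }
    method letter:sym<lc>($/) { make "lower " ~ $/ }
    # $<letter> is always defined, whichever candidate matched.
    method score($/) { make $<letter>».made.join(' ') }
    method TOP($/)   { make $<score>».made.join(' | ') }
}

my $result = Notes.parse("aB c\n", actions => Describe).made;
say $result;   # lower a upper B | lower c
```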

Katushka answered 26/12, 2019 at 3:57 Comment(9)
This explains the fact that the match object returned 'so's out as true even with the 'FAIL'. I thought that might be the case; I'll move back to adding necessary tokens for the real project ;) – Annunciate
The real grammar doesn't seem to like inserting <.ws>* automatically; probably due to additional layers involved beyond <score>. Your suggestion to use proto looks good as soon as I'm able to wrap my head around the technique… – Annunciate
@Annunciate "real grammar doesn't seem to like inserting <.ws>* automatically". Consider reviewing "What's the best way to be lax on whitespace in a Raku grammar?", "How do I match a hex array in perl6 grammar" and "When is white space really important in Raku grammars?". – Kannada
I hate having code that I don't need: more to debug, and then there's the aesthetics of it all! The actual problem is that ABC doesn't give a damn about spaces. There are some exceptions, but by and large, they can occur almost anywhere. The 'use' case is a matter of legibility somewhat like commas in large digit strings. I will revisit the problem as needed until I understand the problem and have reduced it to a minimum. – Annunciate
@Annunciate Having spent a few minutes reading some of the matches of an in-page search for 'space' at The DRAFT abc music notation standard 2.2 (Feb 2013 - present) I quickly came to understand your point. This explains why colomon's ABC grammar makes no use of rules / :sigspace. Instead it explicitly inserts spacing (using \s, \h, and \v) in a handful of regexes and tokens. – Kannada
@raiph: The space issue gets worse! If you look at 'beams' then you run into this quote from the standard (2.2) "To group notes together under one beam they must be grouped together without spaces. Thus in 2/4, A2BC will produce an eighth note followed by two sixteenth notes under one beam whilst A2 B C will produce the same notes separated. The beam slopes and the choice of upper or lower stems are typeset automatically." Oops! – Annunciate
Fwiw that text is the same in the 2.0 standard. Also, check out this commit to the Raku ABC package from just 13 days ago: Fix beaming on 3/4 tunes. colomon's been working on the ABC toolkit in Raku for 10+ years. They have a very nice looking grammar. I see some commits related to 2.1. Maybe say 'hi'? Maybe pivot to installing the existing ABC module and learning about Raku DSL-ability by being the 8th contributor to the ABC module? Just a thought... – Kannada
hsmyers: thankfully understanding proto isn't too hard and once you get the hang of it it makes your life so much easier. – Katushka
"When you use the grammar debugger" In case a reader is unaware, there's now also Comma IDE's Grammar Live View debugger. – Kannada
