Stopping Raku grammar at EOS (End of String)

#!/home/hsmyers/rakudo741/bin/perl6
use v6d;

# use Grammar::Debugger;
use Grammar::Tracer;

my $test-n01 = q:to/EOS/;
a b c d e f g
A B C D E F G
EOS

grammar test {
  token TOP { <score>+ }
  token score {
      <.ws>?
      [
          | <uc>
          | <lc>
      ]+
      <.ws>?
  }
  token uc { <[A..G]> }
  token lc { <[a..g]> }
}

test.parse($test-n01).say;

When you use the grammar debugger, it lets you see exactly how the engine is parsing the string; fails are normal and expected. Consider, for example, matching a+b* against the string aab. You should get two matches for 'a', followed by a fail (because b is not a), but then the engine retries with b and matches successfully.
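As a quick check of that example, here is a minimal sketch using a plain regex rather than a grammar. The variable name $m is only for illustration.

```raku
# Match a+b* against "aab". The engine takes two a's, fails to take a third
# (the next character is "b"), then moves on to b* and succeeds.
my $m = 'aab' ~~ / a+ b* /;

say $m;        # 「aab」
say $m.from;   # 0
say $m.to;     # 3
```

The internal fail on the third "a" never shows up in the result; it only tells the engine that a+ is finished.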

This might be more easily seen if you do an alternation with || (which enforces order). If you have

rule  TOP   { I have a <fruit> }
token fruit { apple || orange || kiwi }

and you parse the sentence "I have a kiwi", you'll see it first match "I have a", followed by two fails with "apple" and "orange", and finally a match with "kiwi".
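A self-contained version of that example might look like the sketch below. The grammar name Fruit-Sentence is made up for illustration, and TOP is declared as a rule so that the spaces in the pattern match the spaces in the sentence (in a token they would be ignored).

```raku
# use Grammar::Tracer;   # uncomment to print the trace; needs the
                         # Grammar::Debugger distribution installed

grammar Fruit-Sentence {
    rule  TOP   { I have a <fruit> }
    token fruit { apple || orange || kiwi }   # || tries alternatives in order
}

my $m = Fruit-Sentence.parse('I have a kiwi');
say $m<fruit>;   # 「kiwi」
```

With the tracer enabled, the two failed alternatives appear inside fruit before the successful one; they do not affect the final result.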

Now let's look at your case:

TOP                  # Trying to match TOP (need 1+ matches of score)
|  score             #   Trying to match score (need 1+ matches of lc/uc)
|  |  lc             #     Trying to match lc
|  |  * MATCH "a"    #     lc had a successful match! ("a")
|  * MATCH "a "      #   and as a result so did score! ("a ")
|  score             #   Trying to match score again (because <score>+)
|  |  lc             #     Trying to match lc 
|  |  * MATCH "b"    #     lc had a successful match! ("b")
|  * MATCH "b "      #   and as a result so did score! ("b ")
……………                #     …so forth and so on until…
|  score             #   Trying to match score again (because <score>+)
|  |  uc             #     Trying to match uc
|  |  * MATCH "G"    #     uc had a successful match! ("G")
|  * MATCH "G\n"     #   and as a result, so did score! ("G\n")
|  score             #   Trying to match *score* again (because <score>+)
|  * FAIL            #   failed to match score, because no lc/uc.
|
|  # <--------------   At this point, the question is, did TOP match?
|  #                     Remember, TOP is <score>+, so we match TOP if there
|  #                     was at least one <score> match. There was, so...
|
* MATCH "a b c d e f g\nA B C D E F G\n" # this is the TOP match

The fail here is normal: at some point we will run out of <score> tokens, so a fail is inevitable. When that happens, the grammar engine can move on to whatever comes after the <score>+ in your grammar. Since there's nothing, that fail actually results in a match of the entire string (because .parse anchors TOP at both ends, as if it were wrapped in /^…$/).
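To see the effect of that anchoring, one can compare .parse with .subparse, which anchors only at the start. This is a small sketch that reuses the tokens from the question under the made-up grammar name Demo; the input "a b 1" is an invented example containing a character that no token accepts.

```raku
grammar Demo {
    token TOP   { <score>+ }
    token score { <.ws>? [ <uc> | <lc> ]+ <.ws>? }
    token uc    { <[A..G]> }
    token lc    { <[a..g]> }
}

# The whole string is made of <score> matches, so .parse succeeds.
say Demo.parse("a b c\n").Str.raku;     # "a b c\n"

# <score>+ stops in front of "1"; .parse needs the end of the string, so it fails.
say Demo.parse("a b 1").defined;        # False

# .subparse accepts a match that stops early.
say Demo.subparse("a b 1").Str.raku;    # "a b "
```

In both of the last two calls the final <score> attempt fails in the same way; the only difference is whether the end of the string is required afterwards.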

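Also, you might consider rewriting your grammar with a rule, which inserts <.ws> automatically wherever the pattern has whitespace after an atom (unless it's important for it to be a single space only). Note the space between <score> and +: it makes the rule match <.ws> after every <score>, whereas <score>+ without the space would only match <.ws> once, after the last one:

grammar test {
  rule TOP { <score> + }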
  token score {
      [
          | <uc>
          | <lc>
      ]+
  }
  token uc { <[A..G]> }
  token lc { <[a..g]> }
}
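Where the whitespace sits relative to the quantifier matters inside a rule, and it is easy to get wrong. The sketch below contrasts the two spellings; the grammar names Spaced and Unspaced are made up for illustration, and Unspaced inherits its tokens from Spaced.

```raku
grammar Spaced {
    rule  TOP   { <score> + }          # behaves like [ <score> <.ws> ]+
    token score { [ <uc> | <lc> ]+ }
    token uc    { <[A..G]> }
    token lc    { <[a..g]> }
}

grammar Unspaced is Spaced {
    rule TOP { <score>+ }              # behaves like <score>+ <.ws>
}

say Spaced.parse("a b c\n").defined;     # True
say Unspaced.parse("a b c\n").defined;   # False: no <.ws> between scores
say Unspaced.parse("abc\n").defined;     # True: one score, then trailing <.ws>
```

If a single space is the only separator you want to allow, a token with an explicit ' ' between scores is the more precise choice.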

Further, IME, you might also want to add a proto token for the uc/lc, because when you have [ <foo> | <bar> ], one of them will always be undefined, which can make processing them in an actions class a bit annoying. You could try:

grammar test {
  rule  TOP   { <score>  + }
  token score { <letter> + }

  proto token letter    {     *    }
        token letter:uc { <[A..G]> }
        token letter:lc { <[a..g]> }
}

$<letter> will always be defined this way.
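To show what that buys you in an actions class, here is a minimal sketch. The names Notes and Notes-Actions and the sample input are assumptions for illustration, and the candidates are spelled with the conventional :sym<…> adverb.

```raku
grammar Notes {
    rule  TOP   { <score> + }
    token score { <letter>+ }

    proto token letter          {*}
          token letter:sym<uc>  { <[A..G]> }
          token letter:sym<lc>  { <[a..g]> }
}

class Notes-Actions {
    # $<letter> is always a list of matches, whichever candidate matched,
    # so there is no need to test $<uc> and $<lc> separately.
    method score($/) { make $<letter>.map(*.Str).join }
    method TOP($/)   { make $<score>.map(*.made).List }
}

my $m = Notes.parse("ab C\n", actions => Notes-Actions);
say $m.made;   # (ab C)
```

If the actions need to know which case matched, a method per candidate (for example letter:sym<uc>) can call make with whatever distinguishes the two.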
