proto token candidates ordering

#!/usr/bin/env perl6 no precompilation; use Grammar::Tracer; grammar G { token TOP { <s> } proto token s { * } token s:sym<a> { <one> } token s:sym<b> { <one> <two> } token s:sym<c> { <one> <two> <three> } token s:sym<d> { <one> <two> <three> <four> } token one { '1' } token two { '2' } token three { '3' } token four { '4' } } my $g = G.new; say $g.parse: '1234';

How does perl6 decide which proto token to match against first?

It uses "Longest alternation" logic. In your (well presented!) case the relevant deciding factors are as follows.

First, select the branch which has the longest declarative prefix.

So the first thing to focus on is that it's not "longest token" but longest declarative prefix, the start of a pattern that contains nothing but contiguous "declarative" "atoms".

A 3 is a declarative atom.

A <foo> may or may not be; it depends on what it includes.

I haven't found clear official documentation identifying which built in patterns are declarative and which are not but it looks like all the ones declared with a slash, eg \d, are declarative whereas all the ones declared in the form form <foo>, eg <digit>, are not. (Note in particular that the built in <ws> pattern is not declarative. Given that spaces after atoms in rules are converted to <ws>, this means the first such space terminates the declarative prefix of that rule.)

So a <digit> atom isn't part of a declarative prefix and instead terminates a prefix.

Moving s:sym<d> to the top, matches the string in both cases, but what is the explanation for that behavior?

Because with the change of <three> to call <digit> you've changed your rules to have three that tie for longest declarative prefix (<one> <two>). So other tie-breaking rules are used.

If all else fails in those tie-breaking rules to pick a winner, then the last "left-most" rule is selected, which, ignoring inheritance, means the rule that comes first lexically.

Recommended topics

Hot tags