What's an "additional tie breaker" for Perl 6 longest token matching?
Asked Answered
M

1

8

The docs for Perl 6 longest alternation in regexes punt to Synopsis 5 to document the rules for longest token matching. There are three rules if different alternatives would match a substring of the same length:

  • The longest declarative prefix breaks the tie
  • The highest specificity breaks the tie
  • "If it's still a tie, use additional tie-breakers."
  • The left most alternation finally wins

It's that third rule that I'm curious about.

Mllly answered 20/4, 2018 at 13:44 Comment(8)
As I read it there are only 3 rules, and "the leftmost alternation wins" is one (the only?) additional tiebreakerPicador
It's a design document. It probably means "let's leave that to the implementation"Expectant
I think @Picador is right; that interpretation is supported by the fact that there are only three bullet points. The text under each bullet point could have been indented for clarity.Lusatian
I'm not sure what it means by additional tie-breakers, and I'm not sure if there are any. I would suspect not. It could have been put in there for “wiggle room”.Haeres
This is perhaps obvious, but "leftmost alternation wins" must be understood to be leftmost in MRO and only then leftmost textually/lexically within each grammar in an inheritance chain. In other words, if there's a file called tie-together.pm6 containing grammar tie { token TOP { <foo> }; proto token foo {*}; token foo:alt<A> { a . { say 'A' } } }; grammar tie-too is tie { token foo:alt<B> { a . { say 'B' } } and a script use tie-together; tie-too.parse: 'aa';, then stdout displays B because tie-too comes beforetie (so tie-too is leftmost) in the MRO resolving the .parse call.Asseverate
I think the answer is "no one knows but we will all guess".Mllly
This may again be obvious, but another tie-breaker, also mentioned in the design doc, is that the user may manually force a failure if the wrong one is tried first. This naturally falls out of the way matching works.Asseverate
According to the LTM design doc that's linked from the LTM user doc, manual failing is the only mechanism available to reliably deterministically tie break if A) a tied set of LTM alternatives for some match are from a grammar defined in one compilation unit that is augmented in one or more other compilation units and B) neither prefix length nor specificity resolve a tie.Asseverate
M
0

First the way the text is organized makes clear that the behaviour of the implementation must be deterministic (not random).

Second - and more important - describing the exact behaviour of existing implementations could fill an entire, hard-to-understand page as every corner case has to be described. In addition such a specification would limit degrees of freedom of the implementation. Let's assume some implementation supports a "fastest implementation" flag. Such an implementation can use unspecified parts to make short-cuts. So leaving the behaviour unspecified resp. restricted to the minimum has some advantages.

Marable answered 17/11, 2018 at 14:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.