What's an "additional tie breaker" for Perl 6 longest token matching?

About

Asked 20/4, 2018 at 13:44 Answered 17/11, 2018 at 14:42

The docs for Perl 6 longest alternation in regexes punt to Synopsis 5 to document the rules for longest token matching. There are three rules if different alternatives would match a substring of the same length:

The longest declarative prefix breaks the tie
The highest specificity breaks the tie
"If it's still a tie, use additional tie-breakers."
The left most alternation finally wins

It's that third rule that I'm curious about.

Mllly answered 20/4, 2018 at 13:44 Comment(8)

As I read it there are only 3 rules, and "the leftmost alternation wins" is one (the only?) additional tiebreaker – Picador 20/4, 2018 at 13:51

It's a design document. It probably means "let's leave that to the implementation" – Expectant 20/4, 2018 at 17:27

I think @Picador is right; that interpretation is supported by the fact that there are only three bullet points. The text under each bullet point could have been indented for clarity. – Lusatian 20/4, 2018 at 21:45

I'm not sure what it means by additional tie-breakers, and I'm not sure if there are any. I would suspect not. It could have been put in there for “wiggle room”. – Haeres 21/4, 2018 at 13:58

This is perhaps obvious, but "leftmost alternation wins" must be understood to be leftmost in MRO and only then leftmost textually/lexically within each grammar in an inheritance chain. In other words, if there's a file called tie-together.pm6 containing

grammar tie { token TOP { <foo> }; proto token foo {*}; token foo:alt<A> { a . { say 'A' } } }; grammar tie-too is tie { token foo:alt<B> { a . { say 'B' } }

and a script use tie-together; tie-too.parse: 'aa';, then stdout displays B because tie-too comes beforetie (so tie-too is leftmost) in the MRO resolving the .parse call. – Asseverate 22/4, 2018 at 12:3

I think the answer is "no one knows but we will all guess". – Mllly 23/4, 2018 at 12:28

This may again be obvious, but another tie-breaker, also mentioned in the design doc, is that the user may manually force a failure if the wrong one is tried first. This naturally falls out of the way matching works. – Asseverate 24/4, 2018 at 8:11

According to the LTM design doc that's linked from the LTM user doc, manual failing is the only mechanism available to reliably deterministically tie break if A) a tied set of LTM alternatives for some match are from a grammar defined in one compilation unit that is augmented in one or more other compilation units and B) neither prefix length nor specificity resolve a tie. – Asseverate 24/4, 2018 at 8:15

First the way the text is organized makes clear that the behaviour of the implementation must be deterministic (not random).

Second - and more important - describing the exact behaviour of existing implementations could fill an entire, hard-to-understand page as every corner case has to be described. In addition such a specification would limit degrees of freedom of the implementation. Let's assume some implementation supports a "fastest implementation" flag. Such an implementation can use unspecified parts to make short-cuts. So leaving the behaviour unspecified resp. restricted to the minimum has some advantages.

Marable answered 17/11, 2018 at 14:42 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags