What's the meta-object rule for naming grammar rules?

As indicated in this issue, some token names clash with method names in the class hierarchy of Grammar (which includes Match, Capture, Cool, Any and, obviously, Mu). For instance, Mu.item:

grammar g {
    token TOP { <item> };
    token item { 'defined' }
};
say g.parse('defined');

issues an error like this one:

Too many positionals passed; expected 1 argument but got 2␤  
in regex item at xxx

item is part of Any's methods, too. I haven't found any other methods in the other classes whose name fails as a rule, but then none of the others are declared as sub (only item is); most are multis or are declared as method.

This also happens when the names of submethods like TWEAK or BUILD are used as token names, but the error in this case is different:

Cannot find method 'match': no method cache and no .^find_method␤
at xxx

However, other submethods like FALLBACK have no problem at all:

grammar g { 
  token TOP { <FALLBACK> }; 
  token FALLBACK { 'defined' } 
}; 
say g.parse('defined') # OUTPUT: «「defined」␤ FALLBACK => 「defined」␤»  

and ditto for some other methods in the class hierarchy of Grammar, such as rand or, in general, most methods defined as such.

What the problematic names seem to have in common is that they are declared as sub, but that is not always the case: CREATE, which caused the whole issue initially, is declared as a method. So it is not at all clear to me which names should be avoided and which ones can be used legitimately. Can someone clarify?

Balboa answered 26/5, 2018 at 17:48 Comment(6)
For others reading this in the future, based on JJ's comment on my answer, in his question's title, the word "rule" is used in the English sense and the word "rules" in the P6 sense (i.e. rules/tokens/regexes/methods in a grammar) and the meta-object is, I think, Rakudo's Perl6::Metamodel::GrammarHOW and perhaps some analogous nqp meta-object too. - Seavir
@Seavir added some explanations: as said above, item is also part of Any, but none of the other functions tested are actually subs, so it might be the case that it only fails with Mu methods or with Mu subs. That's what I would like to know. - Balboa
There is no remotely simple answer to this. There are at least three separate bugs, each with its own complex impact on rule naming. Moritz and Lizmat confirm an issue that doesn't sound like a repeat of the three I've touched on in my answer, so there may well be at least four bugs/issues related to rule naming and more may turn up. Please give up on the idea of fixing this particular nest of bugs by documenting how to avoid them via what will be absurdly complex naming rules. You will waste your time and everyone else's. If you're not convinced, please read my answer carefully. - Seavir
@Seavir I never said I was not convinced. I just clarified the intent of my original question, at your request. Your answers clarify the issue, or at least clarify the current not so clear state of the issue, and I'm grateful for that. - Balboa
OK. I wasn't sure and I felt I needed to be emphatic that this is crazy complicated and far beyond what any doc work can address. Thanks for your patience with our communication. :) - Seavir
/o\ It gets worse. The TWEAK bug causes a compile-time error. I now see that the bug I thought might cover it isn't written up as a compile-time bug. Perhaps Moritz's does. Investigation and answer rewriting continues... - Seavir

This is almost entirely about multiple awkward bugs.

item etc.

See RT#127945 -- Mu methods cannot be used as grammar tokens due to default Actions class. Also token name conflict with internal name ?. Unfortunately this isn't easy to fix.

An explanation of this bug and its impact follows.

Per the Actions mechanism, if a grammar rule matches, the .parse call immediately tries to call a correspondingly named action method.

If you don't explicitly pass an actions class/object to the .parse method then it uses the default, which is Mu. Then, when a rule in your grammar matches, it looks for a Mu method with the same name. If it doesn't find one, all is well. But if it finds one then it calls that method on Mu with the current Match object as the first and only argument. In almost all cases that'll go badly. item is an example of this.
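One rough way to see which rule names would collide is to ask the metamodel whether the default actions object has a method of that name. The sketch below assumes that a .^can lookup on Mu (or Any) is a fair proxy for the lookup the grammar engine performs when it searches for an action method; the candidate names are just examples taken from this thread.

```raku
# Sketch: report which candidate rule names are also methods on Mu or Any.
# Assumption: .^can approximates the action-method lookup done after a match.
for <item all defined> -> $name {
    say $name, ': on Mu = ', ?Mu.^can($name), ', on Any = ', ?Any.^can($name);
}
```

A name that is reported on Mu is at risk even when no actions class is passed; a name that is reported only on Any is at risk when the actions class inherits from Any.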

If you do tell the .parse method to use a particular actions class/object, another wrinkle arises:

grammar g           { rule all { all } };
class actions       { }
g.parse: 'all',
         rule    => 'all',
         actions => actions, 

This yields a similar error to item, except this time the all method comes from Any. This is because the actions class's MRO includes Any:

say class actions   { }.^mro ; # ((actions) (Any) (Mu))

You can eliminate this wrinkle by declaring your actions classes with is Mu:

grammar g           { rule all { all } };
class actions is Mu { }
g.parse: 'all',
         rule    => 'all',
         actions => actions, 

This works fine because now the actions only inherit from Mu -- and Mu doesn't have an all method.

It would be great if you could inherit from nothing, but you can't; is Mu is as minimal as you can get.

What can we conclude about this first bug?

Because newer versions of Perl 6 and/or Rakudo may ship with new Mu methods, the safest thing to do to defend against this bug is to always declare an actions class and always declare a method corresponding to every single rule in your grammar. If you do this you don't need to follow any naming rules to avoid this bug.
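A minimal sketch of that defensive pattern, reusing the grammar from the question. The class name item-actions and the empty method bodies are placeholders invented for this illustration; real action methods would usually call make. It has not been checked against every Rakudo release, so treat it as an illustration rather than a guaranteed fix.

```raku
grammar g {
    token TOP  { <item> }
    token item { 'defined' }
}

# Hypothetical actions class: one method per rule, so the lookup for 'item'
# finds this method rather than the one inherited from Mu.
class item-actions is Mu {
    method TOP($/)  { }
    method item($/) { }
}

say g.parse('defined', actions => item-actions);
```

Each action method receives the current Match object as its single argument, which is why both are declared with a $/ parameter.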

TWEAK etc.

I will file an RT bug about this if I can't find an existing one.

Golfed:

grammar g { rule TWEAK {} }

This blows up at compile-time (immediately after parsing the closing curly brace of the grammar declaration). So this is definitely not the same bug as the item bug -- because the latter is due to the run-time Actions mechanism that only kicks in after a rule matches.

This does not blow up:

grammar g { method TWEAK {} }

Perhaps, as part of creating/finalizing a grammar package, some code introspects and/or manipulates any TWEAK "method" found in the new grammar package in a way that works fine if it's an ordinary method but blows up if it's not.

You wrote: "However, other submethods like FALLBACK have no problem at all".

TWEAK and BUILD methods or submethods in a class are part of standard object construction. They have a very different role to play than FALLBACK (which is called if a method is missing).

What can we conclude about this second bug?

There's clearly something very specific going on with TWEAK and BUILD and they may well be the only two rule names with the problem they exhibit. So just avoid those two names and you'll hopefully be clear of this bug.

Accidentally using built-in rule names

See RT#125518 -- Grammar 'ident' override behaviour.

You can override built-in rules by just specifying your own version.

As dwarring notes, "It certainly causes confusion if you accidentally declare [a rule] with the same name as a built-in rule."

So the key question is, what's the definitive source for knowing built-in rules and how might one manage things given that they may change over time?

(Yes, very vague, I know. Also, I think Perl 6's built-ins must necessarily extend NQP's and that seems likely to be relevant. Also, there are multiple slangs in each overall language and perhaps that's relevant. I plan to discuss this issue more fully in a later edit.)

Other relevant bugs

See also Moritz' answer.

Seavir answered 26/5, 2018 at 21:14 Comment(1)
I can't think of a better way to word the title. It's a rule, because it says how grammar class methods (rules, regexes, tokens) can be named. It's meta-object, because it refers to the construction of the class. I can drop meta-object if you think it does not help, but I really couldn't think of anything better at the time. Also, yes, I mean they are declared as subs, not as methods. And some of them are not part of Mu, but of other classes higher up in the hierarchy. - Balboa

Note also that a FALLBACK token in a grammar performs a similar function to the FALLBACK method in a class. It is invoked with the token name when an unknown token is encountered in the grammar.

Changing your example a bit:

grammar g { 
  token TOP { <blah> }; 
  token FALLBACK($name) { {note "$name called" } 'defined' } 
}; 
say g.parse('defined')

Produces

blah called
「defined」
blah => 「defined」
Overdevelop answered 26/5, 2018 at 21:35 Comment(4)
Nice! Have an upvote. :) I couldn't tell from JJ's question title what he was really after, so I just wrote a book... - Seavir
This is just a comment that's too big to fit into SO's comment format :-) - Overdevelop
My original version of my book length "answer" began by saying it wasn't an answer but wouldn't fit in a comment. Then I figured out enough to delete that bit. I found Grammar 'ident' override behaviour filed by you and am still wondering if that's also part of the mix. Hmm. Probably not. - Seavir
That RT relates to overriding the built-in rule 'ident'. It certainly causes confusion if you accidentally declare a token with the same name as a built-in rule. - Overdevelop

The rule seems to be "if the grammar engine itself calls a method, you cannot redefine it as a regex/token".

Sadly, there is no documentation about this, and most likely it is very implementation dependent.

Traylor answered 27/5, 2018 at 8:7 Comment(1)
Yeah, I've run into that myself with modules.perl6.org/dist/P5__DATA__:cpan:ELIZABETH - Inharmonious
