Difference in capturing and non-capturing regex scope in Perl 6 / Raku

Although the docs state that calling a token/rule/regex as <.foo> instead of <foo> makes them non-capturing, it seems there is a difference in scope, but I'm not sure if it's intended.

Here is a simplified test. In a module file:

unit module Foo;
my token y           {     y  }
my token a is export { x  <y> }
my token b is export { x <.y> }

Inside of another script file:

grammar A {
  use Foo;
  token TOP { <a> }
}

grammar B {
  use Foo;
  token TOP { <b> }
}

Calling A.parse("xy") works as expected. However, calling B.parse("xy") results in the error No such method 'y' for invocant of type 'B'. Is this expected behavior or a potential bug?

Eulogia answered 29/9, 2019 at 18:25 Comment(0)

The intention per S05

The intention according to the relevant speculation/design doc includes:

<foo ...>

This form always gives preference to a lexically scoped regex declaration, dispatching directly to it as if it were a function. If there is no such lexical regex (or lexical method) in scope, the call is dispatched to the current grammar, assuming there is one.

...

A leading . explicitly calls a method as a subrule; the fact that the initial character is not alphanumeric also causes the named assertion to not capture what it matches.

...

A call to <foo> will fail if there is neither any lexically scoped routine of that name it can call, nor any method of that name that can be reached via method dispatch. (The decision of which dispatcher to use is made at compile time, not at run time; the method call is not a fallback mechanism.)

Examples of forms

  • <bar> is as explained above. It preferentially resolves to an early-bound lexical (my/our) routine/rule named &bar. Otherwise it resolves to a late-bound attempt to call a has-scoped method/rule named bar. If it succeeds, it stores the match under a capture named bar.

  • <.bar> calls a has-scoped method/rule named bar if it finds one. It does not capture.

  • <bar=.bar> calls a has-scoped method/rule named bar if it finds one. If it succeeds, it stores the match under a capture named bar. In other words, it's the same as <bar> except that it only attempts to call a has-scoped method named bar; it doesn't first attempt to resolve to a lexical &bar.

  • <&bar> and <.&bar> mean the same thing. They call a lexical routine named &bar and do not capture. To do the same thing, but capture, use <bar=&bar> or <bar=.&bar>.
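As a rough illustration of the forms listed above, here is a minimal sketch that puts several of them side by side. The names Forms, lex and meth are hypothetical and chosen only for this example; the behavior noted in the comments follows the descriptions in the list and has not been checked against the spec tests.

```raku
# Illustrative sketch; the names Forms, lex and meth are hypothetical.
grammar Forms {
    my token lex { <[a..z]>+ }   # lexical routine: callable as &lex, not a method
    token meth   { \d+ }         # ordinary (has-scoped) token: a method of Forms

    token TOP {
        <meth>      '-'   # no lexical &meth exists, so this dispatches to the method; captures as "meth"
        <.meth>     '-'   # method call, no capture
        <m2=.meth>  '-'   # method call, captured under the alias "m2"
        <&lex>      '-'   # lexical call, no capture
        <l2=&lex>         # lexical call, captured under the alias "l2"
    }
}

my $m = Forms.parse('1-2-3-a-b');
say $m.hash.keys.sort;    # lists the capture names that were stored
```

If the forms behave as described, only the three capturing forms (meth, m2 and l2) should show up as named captures in the resulting match.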

(If you read the speculation/design doc linked above and try things, you'll find most of the design details that doc mentions have already been implemented in Rakudo even if they're not officially supported/roasted/documented.)

Scope examples

First the common case:

grammar c {
  has rule TOP { <bar> }
  has rule bar { . { say 'has rule' } }
}
say c.parse: 'a';

displays:

has rule
「a」
 bar => 「a」

(The has declarators are optional and it's idiomatic to exclude them.)

Now introducing a rule lexically scoped to the grammar block:

grammar c {
  my  rule bar { . { say 'inner my rule' } }
  has rule TOP { <bar> }
  has rule bar { . { say 'has rule' } }
}
say c.parse: 'a';

displays:

inner my rule
「a」
 bar => 「a」

Even a lexical rule declared outside the grammar block has precedence over has rules:

my rule bar { . { say 'outer my rule' } }
grammar c {
  has rule TOP { <bar> }
  has rule bar { . { say 'has rule' } }
}
say c.parse: 'a';

displays:

outer my rule
「a」
 bar => 「a」
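
Applied to the question: <.y> inside token b forces a method lookup on grammar B, which has no method y, whereas <y> inside token a resolved to the lexical &y. A possible fix, sketched here as a single file rather than the question's separate module file (an assumption made only to keep the example self-contained), is to call the lexical token explicitly with the non-capturing <&y> form:

```raku
# Single-file stand-in for the question's Foo module (assumption for illustration).
module Foo {
    my token y           { y }
    my token b is export { x <&y> }   # lexical call to &y, no capture
}
import Foo;

grammar B {
    token TOP { <b> }                 # resolves to the imported lexical &b
}

my $match = B.parse('xy');
say $match;
```

With this change the match for b should contain no y capture, which is what <.y> was meant to achieve.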
Damalis answered 29/9, 2019 at 20:24 Comment(5)
Fantastic answer. (It could almost be slightly reworked and put on perldoc, because capturing (and non-regex methods that return matches) isn't well documented and this would fill in a huge gap.) - Eulogia
(Also, I find the preference for lexical scope over the grammar's method a bit... unexpected. I would think it ought to go the other way around.) - Eulogia
The phrasing "the grammar's method" is problematic. In grammar g { my method foo {} }, foo could reasonably be referred to as "the grammar's method". I know what you mean tho. I too wasn't expecting it. That said, I'm still at the first could stage in my could/would/should journey from surprise to new surmise. Clearly, it could go either way and Larry has ensured that one can go either way. But what if it were the other way? One thing I noticed is that, as it stands, one can shift a rule call from late to early binding just by adding a my to an existing rule declaration. - Damalis
raiph: True, although I meant it in the sense that my rule bar is created outside of the grammar's inner scope, but takes precedence over things in that inner scope. The good thing is that what I'm doing probably isn't the most common use right now and is likely to be isolated to module devs: I'm basically creating a module that imports a token, for unambiguous encapsulation (role mixin was jnthn's original suggested approach, but I don't like the potential ambiguity of method names there [even though I know it can be cleared up], and it doesn't allow the token to be used outside grammars). - Eulogia
github.com/alabamenhu/Intl-CLDR/blob/master/lib/Intl/CLDR/… Here's the result, thanks to your help. - Eulogia
