What's the best way to be lax on whitespace in a perl6 grammar?

Asked 8/6, 2019 at 13:58 Answered 8/6, 2019 at 14:49

I want a grammar that is lax about whether whitespace is present or not. I want to match both of these:

this                '   <foo>    <bar>    <baz>    '
and also this       '<foo><bar><baz>'

This works:

token TOP     { \s* <foo> \s* <bar> \s* <baz> \s* }

But after reading all about :sigspace, <.ws> and rule, I imagine there is a way to do this without the repeated \s*. (See How do I match a hex array in per6 grammar.)

Can someone please tell me if there is a nicer way to do this in a perl6 grammar?

NB. This is not solved by simply changing the token declarator to rule: when I try that approach I end up matching either space or no space (but not both) in the parse string.
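For reference, the working token approach from the question can be made self-contained as below. The question never shows how <foo>, <bar> and <baz> are defined, so the grammar name LaxTokens and the literal definitions of the three tokens are assumptions for illustration only.

```raku
# Hypothetical grammar: foo, bar and baz are ASSUMED to match the
# literal strings '<foo>', '<bar>' and '<baz>'.
grammar LaxTokens {
    # The question's approach: an explicit \s* wherever whitespace is optional.
    token TOP { \s* <foo> \s* <bar> \s* <baz> \s* }
    token foo { '<foo>' }
    token bar { '<bar>' }
    token baz { '<baz>' }
}

# .parse anchors TOP to the whole string, so each input must be consumed fully.
say so LaxTokens.parse('   <foo>    <bar>    <baz>    ');  # True
say so LaxTokens.parse('<foo><bar><baz>');                 # True
```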

Graminivorous asked 8/6, 2019 at 13:58

Perhaps your problem is one of these three rule "gotchas":

1. If you want white space / token boundary matching at the start of a rule, before the first atom, you must explicitly provide it (typically with an explicit <.ws>).
2. If you want white space / token boundary matching between each of the matches of a quantified atom (e.g. <foo>*), you must include space between the atom and the quantifier (e.g. <foo> *).
3. The default <ws> is defined as regex ws { <!ww> \s* }. If you want rules in a particular grammar to use a different pattern, define your own in that grammar. (timotimo++)
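The quantifier gotcha can be sketched with two tiny grammars that differ only in the space before `*`. The grammar names Spaced and Unspaced, and the item token that matches any `<word>` tag, are hypothetical and chosen just for this illustration.

```raku
# Hypothetical grammars; the ONLY difference is the space before '*'.
grammar Spaced {
    # '<item> *': under sigspace, <.ws> is tried after each repetition.
    rule TOP { <.ws> <item> * }
    token item { '<' \w+ '>' }
}

grammar Unspaced {
    # '<item>*': whitespace is only tried after the whole repetition.
    rule TOP { <.ws> <item>* }
    token item { '<' \w+ '>' }
}

my $input = '   <foo>    <bar>    <baz>    ';
say so Spaced.parse($input);    # True
say so Unspaced.parse($input);  # False: the spaces between items are never consumed
```

Both grammars still accept the compact '<foo><bar><baz>', because nothing has to be skipped between the items.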

For further discussion of the above, see my updated answer to How do I match a hex array in per6 grammar.

The following four regexes match both your sample strings:

my \test-strings := '   <foo>    <bar>    <baz>    ', '<foo><bar><baz>';

my \test-regexes := token { \s*   '<foo>' \s* '<bar>' \s* '<baz>' \s* },
                    rule  { \s*   '<foo>' \s* '<bar>' \s* '<baz>' \s* },
                    rule  { \s*   '<foo>'     '<bar>'     '<baz>'     },
                    rule  { <.ws> '<foo>'     '<bar>'     '<baz>'     }

say (test-strings X~~ test-regexes).all ~~ Match # True
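The first gotcha shows up when the fourth regex above is recast as a named grammar. In this sketch the grammar names and the literal definitions of foo, bar and baz are assumptions for illustration; the subclass differs only by dropping the leading <.ws>. Whitespace straight after a rule's opening brace is not significant, so it cannot stand in for that <.ws>.

```raku
# Hypothetical grammar mirroring the fourth regex: explicit leading <.ws>.
grammar WithLeadingWs {
    rule TOP { <.ws> <foo> <bar> <baz> }
    token foo { '<foo>' }
    token bar { '<bar>' }
    token baz { '<baz>' }
}

# Same tokens, but TOP has no leading <.ws>.
grammar WithoutLeadingWs is WithLeadingWs {
    rule TOP { <foo> <bar> <baz> }
}

my $spaced  = '   <foo>    <bar>    <baz>    ';
my $compact = '<foo><bar><baz>';

say so WithLeadingWs.parse($spaced);      # True
say so WithLeadingWs.parse($compact);     # True
say so WithoutLeadingWs.parse($spaced);   # False: the leading spaces are never consumed
say so WithoutLeadingWs.parse($compact);  # True
```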

Bestiary answered 8/6, 2019 at 14:49

It could also be the case that the default <ws> rule won't match between two word characters (it starts with <!ww>). If what is to be matched is not literally "<foo>" but a word such as "something", adjacent items with no whitespace between them won't parse. Defining your own ws to be just \s* would do the trick in that case. – Allstar 8/6, 2019 at 23:18
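The situation this comment describes can be sketched as follows, assuming hypothetical word-like tokens foo, bar and baz (bare words rather than the `<foo>`-style tags from the question). The grammar names DefaultWs and LaxWs are illustrative. Because rules call ws as a method, a subclass can override it.

```raku
# Hypothetical grammar whose tokens are bare words; uses the default ws.
grammar DefaultWs {
    rule TOP { <.ws> <foo> <bar> <baz> }
    token foo { 'foo' }
    token bar { 'bar' }
    token baz { 'baz' }
}

# Same grammar with ws redefined as plain optional whitespace.
# The inherited rule TOP now calls this ws instead of the default one.
grammar LaxWs is DefaultWs {
    token ws { \s* }
}

say so DefaultWs.parse('  foo  bar  baz  ');  # True
say so DefaultWs.parse('foobarbaz');          # False: <!ww> fails between 'o' and 'b'
say so LaxWs.parse('foobarbaz');              # True
```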

Thank you @Bestiary and timotimo for your prompt answers. After taking some time to try the variations you propose when in a suitable state of mind, I conclude I was unable to articulate my problem clearly. [I have marked this as 'answered' because I think you did a very good job in the situation.] For now, since it works for me, I will stick to \s*. – Graminivorous 15/7, 2019 at 21:06

PS. I love the X~~ "cross product" & junction. – Graminivorous 15/7, 2019 at 21:08

o/ @p6steve. Thanks for the follow-up. I would love to understand what you were driving at. Please feel free to ask another SO question or three if you wish, and I will gladly have more goes at it. (That said, please first consider reading the "Håkon's solution to another likely problem" section in https://mcmap.net/q/1018546/-how-do-i-access-the-optional-parts-of-a-grammar-in-perl6. I highly doubt it's relevant, but it's the most explicit answer I've yet written about use of :sigspace and is worth getting out of the way first if you do decide to dig into this again.) PS. I loved the X~~ too. I made that up for this answer. :) – Bestiary 15/7, 2019 at 21:23

© 2022 - 2024 — McMap. All rights reserved.