Default definition of white space method ws in a grammar

According to the documentation, the default definition of the ws method in a grammar matches zero or more whitespace characters, as long as the current position is not within a word:

regex ws { <!ww> \s* }

What is the difference between this definition and the following:

regex ws { \s+ }

Why is the zero-width assertion <!ww> used instead of the simpler \s+? I also note that the default definition can match zero whitespace characters, but when would that actually happen? Wouldn't it be clearer to use \s+ instead of \s*?

Elder answered 1/4, 2019 at 19:1 Comment(2)
Ok, so then the default ws will match between consecutive characters that are neither word characters nor whitespace (like emojis). For example: perl6 -e 'my $str="\c[carrot]\c[potato]"; say $str.split(/<!ww>\s*/).elems' gives 4. Is this reasonable behavior? (Algebraic)
Larry considered it reasonable default behavior. The idea is that it works well for most typical grammars, and if you want something else you just define your own ws rule. (Foley)

The ww assertion succeeds only when there are characters matching \w on both sides of the current position. The ! inverts it, meaning <!ww> matches:

  • At the start of the string
  • At the end of the string
  • When there's a non-\w character before the current position (such as between "+" and "a")
  • When there's a non-\w character after the current position (such as between "a" and "+")
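The four cases above can be checked directly. This is a minimal sketch, assuming a recent Rakudo; the test string is made up for illustration. It collects every position at which the zero-width assertion succeeds:

```raku
my $str = 'ab+cd';

# <!ww> consumes nothing, so a global match tries each position in turn.
my @positions = $str.match(/<!ww>/, :g).map(*.from);

say @positions;
```

Positions 1 and 4 (inside ab and cd) should be missing from the result, since both neighbours there are word characters; the start, the end, and both sides of the + should be present.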

Effectively, then, whitespace is never considered to occur between two word characters. Between two non-word characters, or between a word character and a non-word character, ws can match, even if it matches zero characters.
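A small sketch of that consequence, using a hypothetical grammar named TwoWords that calls <.ws> explicitly and does not define its own ws, so the default applies:

```raku
grammar TwoWords {
    # No ws is defined in this grammar, so <.ws> is the default one.
    token TOP { 'ab' <.ws> 'cd' }
}

say so TwoWords.parse('ab cd');  # True:  the space separates the two words
say so TwoWords.parse('abcd');   # False: <!ww> fails between 'b' and 'c'
```

With a plain \s* in place of the default, both inputs would be accepted, which is presumably not what a grammar author wants when two keywords or identifiers sit side by side.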

This is what many languages we might wish to parse need. For example, consider ab+cd. The default ws will match on either side of the +, but will not match within an identifier such as ab or cd.
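To compare this with the \s+ variant from the question, here is a sketch with two hypothetical grammars, Sum and StrictSum. The <.ws> calls are written out explicitly in a token; a rule would insert them implicitly wherever the pattern contains whitespace after an atom.

```raku
grammar Sum {
    token TOP  { <word> <.ws> '+' <.ws> <word> }
    token word { <[a..z]>+ }
}

# Same grammar, but with whitespace made mandatory (the \s+ variant).
grammar StrictSum is Sum {
    token ws { \s+ }
}

for 'ab+cd', 'ab + cd' -> $input {
    say $input, ' default: ', so(Sum.parse($input)),
                ' strict: ',  so(StrictSum.parse($input));
}
```

The default ws should accept both spellings, while the \s+ version rejects ab+cd because it insists on at least one whitespace character on each side of the operator. That is the case where matching zero whitespace characters matters.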

For languages where that isn't suitable, it's simply a matter of overriding the default ws for whatever that language needs.
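As an illustration of such an override, here is a sketch for a made-up line-oriented format in which a newline separates entries. The grammar names and the key=value format are assumptions for the example only. The override restricts ws to horizontal whitespace so that it cannot swallow the newline:

```raku
grammar KeyVals {
    token TOP  { <pair>+ % \n }
    rule  pair { <word> '=' <word> }   # a rule calls <.ws> implicitly
    token word { \w+ }
}

# Hypothetical override: only horizontal whitespace counts as ws.
grammar LineKeyVals is KeyVals {
    token ws { <!ww> \h* }
}

my $input = "a = b\nc=d";
say so KeyVals.parse($input);      # default ws consumes the newline
say so LineKeyVals.parse($input);  # newline is left for the separator
```

With the default ws, the whitespace call at the end of the first pair should consume the newline, leaving nothing for the separator to match, so the first parse is expected to fail while the second succeeds.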

Ossa answered 1/4, 2019 at 20:25 Comment(0)
