Grammar.parse seems to loop forever and use 100% CPU

Reposted from the #perl6 IRC channel by jkramer, with permission.

I'm playing with grammars and trying to parse an INI-style file, but somehow Grammar.parse seems to loop forever and use 100% CPU. Any ideas what's wrong here?

grammar Format {
  token TOP {
    [
      <comment>*
      [
        <section>
        [ <line> | <comment> ]*
      ]*
    ]*
  }

  rule section {
    '[' <identifier> <subsection>? ']'
  }

  rule subsection {
    '"' <identifier> '"'
  }

  rule identifier {
    <[A..Za..z]> <[A..Za..z0..9_-]>+
  }

  rule comment {
    <[";]> .*? $$
  }

  rule line {
    <key> '=' <value>
  }

  rule key {
    <identifier>
  }

  rule value {
    .*? $$
  }
}

Format.parse('lol.conf'.IO.slurp)
Artema answered 12/4, 2018 at 15:44
Maybe you could also post the sample INI file lol.conf? – Chibouk

The token TOP has a * quantifier on a subregex that can match the empty string (because both <comment> and the group that contains <section> carry their own * quantifier).

If the inner subregex matches the empty string, it can do so infinitely many times without advancing the cursor. Currently, Perl 6 has no protection against this kind of error.

It looks to me like you could simplify your code to:

token TOP {
  <comment>*
  [
    <section>
    [ <line> | <comment> ]*
  ]*
}

(There is no need for the outer [...]* group, because the <comment> inside the last group also matches comments that appear before the next section.)
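To make the fix concrete, here is a minimal sketch of a complete grammar built around the simplified TOP. It also follows the suggestion in the comments to use token instead of rule, handling horizontal whitespace (\h*) and newlines explicitly. It assumes that `"` and `;` both start a comment (taken from the question's comment rule), that identifiers may be a single character (the question's `+` would require at least two), and that every line ends in a newline or at the end of the input. The names FormatSketch, eol and $sample are hypothetical, chosen for illustration only.

```raku
# Hypothetical sketch: FormatSketch, eol and $sample are illustrative names.
grammar FormatSketch {
    # section, comment and line each consume at least one character,
    # so none of the * quantifiers below can repeat an empty match forever.
    token TOP {
        \n*                                     # skip leading blank lines
        <comment>*
        [ <section> [ <line> | <comment> ]* ]*
    }

    token section    { '[' <identifier> [ \h+ <subsection> ]? ']' \h* <.eol> }
    token subsection { '"' <identifier> '"' }
    token identifier { <[A..Za..z]> <[A..Za..z0..9_\-]>* }
    token comment    { <[";]> \N* <.eol> }
    token line       { <key> \h* '=' \h* <value> <.eol> }
    token key        { <identifier> }
    token value      { \N* }

    # End of line: one or more newlines (absorbing blank lines) or end of input.
    token eol        { \n+ | $ }
}

my $sample = q:to/END/;
; global comment
[core]
name = demo
" a comment inside a section
[remote "origin"]
url = example
END

say so FormatSketch.parse($sample);           # True
say FormatSketch.parse($sample)<section>.elems; # 2
```

Because every repeated unit has to start with `[`, a comment character or a letter, a malformed line makes the parse return Nil instead of spinning.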

Buchholz answered 12/4, 2018 at 15:59
Shouldn't you also use token instead of rule here? For example, the spaces in rule comment { <[";]> .*? $$ } could gobble up newline characters before we reach the $$, or am I wrong? – Chibouk
If vertical whitespace is significant, as suggested by the use of $$, then it would be sensible to override the ws token as token ws { <!ww> \h* } so that it matches only horizontal whitespace. Much more on that, along with two working grammars for INI files, can be found in smile.amazon.com/Parsing-Perl-Regexes-Grammars-Recursive-ebook/… – Buchholz
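The ws override from that comment can be sketched as follows. This is a minimal illustration, assuming a file that contains only `key = value` lines; the grammar name KeyValueSketch, the pair rule and $input are hypothetical. Because a rule turns whitespace after each atom into an implicit <.ws> call, redefining ws to match only \h* lets the rule keep its readable spacing while the explicit \n still marks the end of each line.

```raku
# Hypothetical sketch: KeyValueSketch, pair and $input are illustrative names.
grammar KeyValueSketch {
    # Replace the default <ws> so the implicit <.ws> calls inserted by
    # `rule` match horizontal whitespace only and never eat a newline.
    token ws { <!ww> \h* }

    token TOP   { <pair>* }
    # Sigspace: the spaces after <key>, '=', <value> and \n become <.ws>.
    rule  pair  { <key> '=' <value> \n }
    token key   { <[A..Za..z]> <[A..Za..z0..9_\-]>* }
    token value { \N* }
}

my $input = q:to/END/;
name = demo
url=example
END

say so KeyValueSketch.parse($input);          # True
say KeyValueSketch.parse($input)<pair>.elems; # 2
```

Without the override, the default <ws> also matches newlines, so the <.ws> after <value> would consume the line break, the literal \n would then fail, and the parse would return Nil.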
