Scala Parser Combinators: Efficiently Parse C-Style Comments

What is the best way to (efficiently) parse C-style multi-line comments (i.e., /* ... */) with Scala parser combinators?

In a project that I'm involved in, we parse a C-like programming language and want to support multi-line comments. We use a subclass of StandardTokenParsers, which already handles such comments (via StdLexical). However, this only works for fairly short multi-line comments; longer ones make the parser run out of stack space.
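The stack usage comes from how StdLexical defines its `comment` parser: in older releases it was roughly `'*' ~ '/' | chrExcept(EofCh) ~ comment`, so every character of a comment adds stack frames. The following is a minimal sketch of one workaround. It assumes scala-parser-combinators 2.x (pulled in with a scala-cli `using dep` directive) and uses a hypothetical class name, `CommentLexical`. It overrides the protected `comment` parser with a tail-recursive scan over the input `Reader`, which scalac compiles into a loop, so stack depth no longer grows with comment length.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
// Assumption: scala-parser-combinators 2.x; the line above is scala-cli syntax.
import scala.annotation.tailrec
import scala.util.parsing.combinator.lexical.StdLexical

// Hypothetical name. A StdLexical whose block-comment parser uses constant
// stack space regardless of how long the comment is.
class CommentLexical extends StdLexical {

  // StdLexical.whitespace calls `comment` right after consuming "/*";
  // it must consume everything up to and including the closing "*/".
  override protected def comment: Parser[Any] = Parser { in =>
    skipComment(in, prevStar = false) match {
      case Some(next) => success(' ')(next)              // resume after "*/"
      case None       => failure("unclosed comment")(in) // reached end of input
    }
  }

  // Tail-recursive, so it runs as a loop: one step per character and no
  // stack growth. `prevStar` records whether the previous character was '*'.
  @tailrec
  private def skipComment(in: Input, prevStar: Boolean): Option[Input] =
    if (in.atEnd) None
    else if (prevStar && in.first == '/') Some(in.rest)
    else skipComment(in.rest, in.first == '*')
}
```

In a StandardTokenParsers subclass this would be wired in with `override val lexical = new CommentLexical`, with reserved words and delimiters configured on it as with a plain StdLexical.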

We have also tried providing our own definition of whitespace to make things more efficient, using a RegexParsers instance (inspired by another question on StackOverflow) as follows:

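import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

class Parser extends StandardTokenParsers {

  override val lexical = new StdLexical {
    // Helper instance, used only for its regex(...) combinator.
    def rp: RegexParsers = new RegexParsers {}
    // Whitespace, // line comments and /* block */ comments. rp's Parser type is
    // unrelated to StdLexical's, hence the cast (both parsers read Char input).
    override val whitespace: Parser[Any] = rp.regex("""(\s|//.*|(?m)/\*(\*(?!/)|[^*])*\*/)*""".r).asInstanceOf[Parser[Any]]
  }

  // ...

}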

This improved the situation slightly, but parsing still overflows the stack once a comment is longer than a few dozen lines. Any ideas how to improve this?
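A likely cause of the remaining overflow is java.util.regex itself. The JDK matches a repeated group that contains an alternation, such as `(\*(?!/)|[^*])*` or the outer `(\s|...)*`, recursively, with roughly one stack frame per iteration: here, one per character. A commonly used workaround is to "unroll" the loop so that each repetition applies to a single character class, which the JDK matches iteratively. Below is a sketch of such a pattern; the object name `CommentPatterns` is hypothetical, and the pattern has not been measured against real project inputs. One repeated group remains, and its stack use grows with the number of `*` runs inside a comment, not with the comment's length.

```scala
import scala.util.matching.Regex

// Hypothetical helper object; the name is illustrative only.
object CommentPatterns {
  // "Unrolled loop" form of a /* ... */ comment: [^*]* and \*+ repeat single
  // character classes. The group (?:[^/*][^*]*\*+)* iterates once per run of
  // '*' characters that is not followed by '/'.
  val blockComment: String = """/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"""

  // Whitespace = runs of blanks, // line comments, or block comments. Each
  // alternative consumes a whole run, so the outer group iterates once per
  // run or comment instead of once per character.
  val whitespace: Regex = ("""(?:\s+|//.*|""" + blockComment + ")*").r
}
```

In the question's code, the `whitespace` pattern string could be passed to `rp.regex` in place of the original one.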
