What is the best way to (efficiently) parse C-style multi-line comments (i.e., /* ... */) with Scala parser combinators?
In a project that I'm involved in, we parse a C-like programming language and want to support multi-line comments. We use a subclass of StandardTokenParsers, which already handles such comments (via StdLexical). However, the class only works for fairly short multi-line comments and runs out of stack space otherwise.
We have also tried providing our own definition of whitespace to make things more efficient. We used RegexParsers (inspired by another question on Stack Overflow) as follows:
class Parser extends StandardTokenParsers {
  override val lexical = new StdLexical {
    def rp: RegexParsers = new RegexParsers {}
    override val whitespace: Parser[Any] = rp.regex("""(\s|//.*|(?m)/\*(\*(?!/)|[^*])*\*/)*""".r).asInstanceOf[Parser[Any]]
  }
  // ...
}
This improved the situation slightly, but still causes a stack overflow if the comment is more than a few dozen lines. Any ideas how to improve this?