How can parsers be used to parse records that spans multiple lines? I need to parse tree data (and eventually transform it to a tree data structure). I'm getting a difficult-to-trace parse error in the code below, but its not clear if this is even the best approach with Scala parsers. The question is really more about the problem solving approach rather than debugging existing code.
The EBNF-ish grammer is:
SP = " "
CRLF = "\r\n"
level = "0" | "1" | "2" | "3"
varName = {alphanum}
varValue = {alphnum}
recordBegin = "0", varName
recordItem = level, varName, [varValue]
record = recordBegin, {recordItem}
file = {record}
An attempt to implement and test the grammer:
import util.parsing.combinator._
val input = """0 fruit
1 id 2
1 name apple
2 type red
3 size large
3 origin Texas, US
2 date 2 aug 2011
0 fruit
1 id 3
1 name apple
2 type green
3 size small
3 origin Florida, US
2 date 3 Aug 2011"""
object TreeParser extends JavaTokenParsers {
override val skipWhitespace = false
def CRLF = "\r\n" | "\n"
def BOF = "\\A".r
def EOF = "\\Z".r
def TXT = "[^\r\n]*".r
def TXTNOSP = "[^ \r\n]*".r
def SP = "\\s".r
def level: Parser[Int] = "[0-3]{1}".r ^^ {v => v.toInt}
def varName: Parser[String] = SP ~> TXTNOSP
def varValue: Parser[String] = SP ~> TXT
def recordBegin: Parser[Any] = "0" ~ SP ~ varName ~ CRLF
def recordItem: Parser[(Int,String,String)] = level ~ varValue ~ opt(varValue) <~ CRLF ^^
{case l ~ f ~ v => (l,f,v.map(_+"").getOrElse(""))}
def record: Parser[List[(Int,String,String)]] = recordBegin ~> rep(recordItem)
def file: Parser[List[List[(Int,String,String)]]] = rep(record) <~ EOF
def parse(input: String) = parseAll(file, input)
}
val result = TreeParser.parse(input).get
result.foreach(println)