The ~ method on Parser combines two parsers into one, which applies the two original parsers in sequence and returns both results. That could simply be (in Parser[T]):
def ~[U](q: => Parser[U]): Parser[(T, U)]
If you never combined more than two parsers, that would be fine. However, if you chain three of them, p1, p2, p3, with return types T1, T2, T3, then p1 ~ p2 ~ p3, which means p1.~(p2).~(p3), is of type Parser[((T1, T2), T3)]. And if you combine five of them as in your example, that would be Parser[((((T1, T2), T3), T4), T5)]. Then when you pattern match on the result, you would have all those parentheses too:
case ((((_, id), _), formals), _) => ...
This is quite uncomfortable.
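A minimal sketch of that nesting, assuming a hypothetical toy Parser (a function from the input String to an optional value plus the remaining input) whose ~ returns a plain tuple. Parser, lit and TupleDesign are illustrative names, and this is not the scala-parser-combinators code:

```scala
object TupleDesign {
  // Hypothetical toy parser: consumes a prefix of the input and returns
  // the parsed value together with the remaining input, or None on failure.
  final case class Parser[T](run: String => Option[(T, String)]) {
    // The naive design: sequencing returns a plain tuple.
    def ~[U](q: => Parser[U]): Parser[(T, U)] = Parser[(T, U)] { in =>
      run(in).flatMap { case (t, rest) =>
        q.run(rest).map { case (u, rest2) => ((t, u), rest2) }
      }
    }
  }

  // Hypothetical helper: matches a literal string and returns it.
  def lit(s: String): Parser[String] =
    Parser(in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  // Chaining three parsers already nests the tuple type to the left.
  val three: Parser[((String, String), String)] = lit("a") ~ lit("b") ~ lit("c")

  // ...and the nesting leaks into every pattern match on the result.
  val matched: Option[String] = three.run("abc").map {
    case (((x, y), z), _) => x + y + z
  }
}
```

With five parsers, the match would need the four levels of parentheses shown above.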
Then comes a clever syntactic trick. When a case class has two parameters, it can appear in infix rather than prefix position in a pattern. That is, if you have
case class X(a: A, b: B)
you can pattern match with case X(a, b), but also with case a X b. (That is what is done with the pattern x :: xs to match a non-empty List: :: is a case class.)
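The infix pattern form can be checked with plain Scala. X here is a hypothetical case class used only for illustration, while :: is the standard library's non-empty List case class:

```scala
object InfixPatterns {
  // Hypothetical two-parameter case class, for illustration only.
  final case class X(a: Int, b: String)

  val v: X = X(1, "one")

  // Prefix form of the pattern.
  val prefix: String = v match { case X(a, b) => s"$a-$b" }

  // Infix form: the class name sits between the two sub-patterns.
  val infix: String = v match { case a X b => s"$a-$b" }

  // :: is a case class, so x :: _ is just the infix form of ::(x, _).
  val viaInfix: Option[Int] = List(1, 2, 3) match {
    case x :: _ => Some(x)
    case Nil    => None
  }
  val viaPrefix: Option[Int] = List(1, 2, 3) match {
    case ::(x, _) => Some(x)
    case Nil      => None
  }
}
```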
When you write case a ~ b ~ c, it means case ~(~(a, b), c), but it is much more pleasant to read, and also more pleasant than case ((a, b), c), which is tricky to get right.
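A sketch of the same idea with a stand-in ~ case class, modeled on (but not imported from) the one defined in scala.util.parsing.combinator.Parsers. Constructing a value needs new, because a bare ~(...) at the start of an expression is parsed as the unary prefix operator ~:

```scala
object TildePatterns {
  // Stand-in modeled on the library's result class; not imported from it.
  final case class ~[+A, +B](_1: A, _2: B)

  // The shape p1 ~ p2 ~ p3 would produce: nested to the left, like ((a, b), c).
  // The infix type Int ~ String ~ Boolean means ~[~[Int, String], Boolean].
  val r: Int ~ String ~ Boolean = new ~(new ~(1, "two"), true)

  // Infix pattern: reads like the grammar rule.
  val infix: String = r match { case a ~ b ~ c => s"$a $b $c" }

  // The same pattern written in prefix form.
  val prefix: String = r match { case ~(~(a, b), c) => s"$a $b $c" }
}
```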
So the ~ method in Parser returns a Parser[~[T, U]] instead of a Parser[(T, U)], so you can easily pattern match on the result of several chained ~. Besides that, ~[T, U] and (T, U) are pretty much the same thing, as isomorphic as you can get.
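To make the isomorphism concrete, here is a sketch of the two conversions, again using a stand-in ~ case class rather than the library's; toTuple and fromTuple are hypothetical names:

```scala
object TildeTupleIso {
  // Stand-in for the library's ~ case class.
  final case class ~[+A, +B](_1: A, _2: B)

  // Hypothetical conversions; neither direction loses or adds information.
  def toTuple[A, B](p: A ~ B): (A, B) = (p._1, p._2)
  def fromTuple[A, B](t: (A, B)): A ~ B = new ~(t._1, t._2)

  val pair: Int ~ String = new ~(42, "answer")
}
```

Round-tripping in either direction gives back an equal value, which is what makes the choice between the two purely a matter of how pleasant the patterns are.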
The same name is chosen for the combining method in Parser and for the result type because the resulting code reads naturally: one sees immediately how each part of the result processing relates to the items of the grammar rule.
parser1 ~ parser2 ~ parser3 ^^ {case part1 ~ part2 ~ part3 => ...}
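Tilde is chosen because its precedence plays nicely with the other operators on Parser. In Scala an infix operator's precedence is set by its first character, and one starting with ~ binds more tightly than ^^ or |, so p1 ~ p2 ^^ f | p3 reads as ((p1 ~ p2) ^^ f) | p3.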
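The precedence argument can be checked with a small hypothetical toy parser that defines ~, ^^ and | (a sketch, not the library's implementation; Parser, lit and rule are illustrative names). The rule below needs no parentheses because ~ binds tightest, then ^^, then |:

```scala
object TildeParsers {
  // Stand-in for the library's result class (an assumption, not the real import).
  final case class ~[+A, +B](_1: A, _2: B)

  // Hypothetical toy parser: returns a value and the remaining input.
  final case class Parser[T](run: String => Option[(T, String)]) {
    // Sequencing: keep both results, wrapped in a ~.
    def ~[U](q: => Parser[U]): Parser[T ~ U] = Parser[T ~ U] { in =>
      run(in).flatMap { case (t, rest) =>
        q.run(rest).map { case (u, rest2) => (new ~(t, u), rest2) }
      }
    }
    // Transform the result of a successful parse.
    def ^^[U](f: T => U): Parser[U] = Parser[U] { in =>
      run(in).map { case (t, rest) => (f(t), rest) }
    }
    // Alternative: try this parser, fall back to q on the same input.
    def |[U >: T](q: => Parser[U]): Parser[U] = Parser[U] { in =>
      run(in).orElse(q.run(in))
    }
  }

  // Hypothetical helper: matches a literal string and returns it.
  def lit(s: String): Parser[String] =
    Parser(in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None)

  // Parsed as ((lit("a") ~ lit("b") ~ lit("c")) ^^ {...}) | (lit("x") ^^ ...).
  val rule: Parser[String] =
    lit("a") ~ lit("b") ~ lit("c") ^^ { case x ~ y ~ z => x + y + z } |
      lit("x") ^^ (_.toUpperCase)
}
```

Had ~ been given a letter-based name, it would bind more loosely than ^^ and |, and every rule would need explicit parentheses.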
One last point: there are auxiliary operators ~> and <~, which discard the result of one of their operands, typically the constant parts of the rule that carry no useful data. So one would rather write
"class" ~> ID ~ ("(" ~> formals <~ ")")
and get only the values of ID and formals in the result. The parentheses matter: <~ starts with <, so it binds less tightly than ~ and ~>, and without them the value of formals would be discarded too.
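A sketch of ~> and <~ on the same kind of hypothetical toy parser. ID and formals are simplified stand-ins (a run of letters each), and lit replaces the implicit String-to-Parser conversion that RegexParsers provides in the real library. The second rule shows what an unparenthesized <~ does to the result:

```scala
object DiscardingParsers {
  // Stand-in for the library's result class.
  final case class ~[+A, +B](_1: A, _2: B)

  // Hypothetical toy parser: returns a value and the remaining input.
  final case class Parser[T](run: String => Option[(T, String)]) {
    def ~[U](q: => Parser[U]): Parser[T ~ U] = Parser[T ~ U] { in =>
      run(in).flatMap { case (t, rest) =>
        q.run(rest).map { case (u, rest2) => (new ~(t, u), rest2) }
      }
    }
    def ^^[U](f: T => U): Parser[U] = Parser[U] { in =>
      run(in).map { case (t, rest) => (f(t), rest) }
    }
    // Run both parsers but keep only the right-hand result.
    def ~>[U](q: => Parser[U]): Parser[U] = (this ~ q) ^^ (_._2)
    // Run both parsers but keep only the left-hand result.
    def <~[U](q: => Parser[U]): Parser[T] = (this ~ q) ^^ (_._1)
  }

  // Skips leading whitespace, then matches the literal s.
  def lit(s: String): Parser[String] = Parser { in =>
    val t = in.dropWhile(_.isWhitespace)
    if (t.startsWith(s)) Some((s, t.drop(s.length))) else None
  }

  // Hypothetical identifier token: a non-empty run of letters.
  val ID: Parser[String] = Parser { in =>
    val t = in.dropWhile(_.isWhitespace)
    val name = t.takeWhile(_.isLetter)
    if (name.nonEmpty) Some((name, t.drop(name.length))) else None
  }

  // Hypothetical formals: a single identifier is enough for this sketch.
  val formals: Parser[String] = ID

  // Keeps ID and formals; the keyword and both parentheses are discarded.
  val classHeader: Parser[String ~ String] =
    lit("class") ~> ID ~ (lit("(") ~> formals <~ lit(")"))

  // Parsed as ((lit("class") ~> ID) <~ (lit("(") ~ formals)) <~ lit(")"),
  // so formals is still consumed but its value is dropped.
  val onlyId: Parser[String] =
    lit("class") ~> ID <~ lit("(") ~ formals <~ lit(")")

  val parsed: Option[(String, String)] =
    classHeader.run("class Foo(bar)").map { case (id ~ fs, _) => (id, fs) }
}
```

Both rules accept the same input; only the shape of the kept result differs, which is why the grouping around <~ deserves a second look when a value seems to go missing.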