PEG and whitespace/comments

Asked 9/4, 2012 at 11:21 Answered 5/5, 2012 at 17:8

I have some experience writing parsers with ANTLR and I am trying (for self-education :) ) to port one of them to PEG (Parsing Expression Grammar).

As I try to get a feel for the idea, one thing strikes me as cumbersome, to the degree that I feel I have missed something: how to deal with whitespace.

In ANTLR, the normal way to deal with whitespace and comments was to put the tokens in a hidden channel, but with PEG grammars there is no tokenization step. Considering languages such as C or Java, where comments are allowed almost everywhere, one would like to "hide" the comments right away, but since the comments may have semantic meaning (for example when generating code documentation, class diagrams, etc.), one would not want to simply discard them.

So, is there a way to deal with this?

Emilyemina answered 9/4, 2012 at 11:21 Comment(0)

Because there is no separate tokenization phase, there is no "time" to discard certain characters (or tokens).

Since you're familiar with ANTLR, think of it like this: let's say ANTLR handles only PEG. So you only have parser rules, no lexer rules. Now how would you discard, say, spaces? (you can't).

So, the answer to your question is: you can't. You'll have to litter your PEG grammar with space rules:

ANTLR

add_expr
 : Num Add Num
 ;

Add   : '+';
Num   : '0'..'9'+;
Space : ' '+ {skip();};

PEG

add_expr
 : num _ '+' _ num
 ;

num : '0'..'9'+;
_   : ' '*;
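To make the "_" rule concrete, here is a minimal hand-written sketch in Python of the PEG version above. It is not the output of any particular PEG tool; the class name AddExprParser and its method names are made up for illustration. As an assumption beyond the grammar shown, the "_" rule (called skip here) also accepts /* ... */ comments and records them in a list instead of throwing them away, which is one way to keep comments available for later use.

```python
import re

# Assumed for illustration: "_" matches whitespace or /* ... */ comments.
SKIP = re.compile(r"[ \t\r\n]+|/\*.*?\*/", re.S)
NUM = re.compile(r"[0-9]+")


class AddExprParser:
    """Hypothetical recogniser for: add_expr <- _ num _ '+' _ num _ EOF"""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.comments = []  # comments are kept here, not discarded

    def skip(self):
        # The "_" rule: zero or more spaces/comments, never fails.
        while True:
            m = SKIP.match(self.text, self.pos)
            if m is None:
                return
            if m.group().startswith("/*"):
                self.comments.append(m.group())
            self.pos = m.end()

    def num(self):
        m = NUM.match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"expected a number at offset {self.pos}")
        self.pos = m.end()
        return int(m.group())

    def literal(self, s):
        if not self.text.startswith(s, self.pos):
            raise SyntaxError(f"expected {s!r} at offset {self.pos}")
        self.pos += len(s)

    def add_expr(self):
        # Every place where "_" appears in the grammar is an explicit call.
        self.skip()
        left = self.num()
        self.skip()
        self.literal("+")
        self.skip()
        right = self.num()
        self.skip()
        if self.pos != len(self.text):
            raise SyntaxError(f"unexpected input at offset {self.pos}")
        return ("add", left, right)


parser = AddExprParser("1 /* lhs */ + 22")
tree = parser.add_expr()
```

After the call, tree holds the parsed expression and parser.comments holds the comments that were skipped. Note that this grammar never backtracks; in a grammar with ordered choice, comments recorded inside a failed alternative would have to be rolled back along with the position.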

Stasiastasis answered 13/4, 2012 at 11:13 Comment(2)

That is what I suspected. Just wanted to confirm I did not fundamentally misunderstand something. Thanks! – Emilyemina 13/4, 2012 at 12:29

Good answer, I almost felt like going for another parser generator until I saw your answer using "_", which makes PEG a lot more readable! – Hideandseek 17/10, 2013 at 18:50

It is possible to nest PEG parsers. The idea is that the first parser consumes characters and feeds tokens to the second parser. The second PEG parser consumes tokens and does the real work.

Of course this means that you give up one advantage of Parsing Expression Grammar compared to other parsing schemes: The simplicity of PEG.
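A rough sketch of the two-stage idea in Python, under some assumptions: the first stage is approximated with a regular expression whose alternatives are tried in order (a stand-in for a character-level PEG, not a real PEG engine), and the function names tokenize and parse_add_expr are hypothetical. Whitespace is dropped in stage one and comments go to a separate list, similar in spirit to ANTLR's hidden channel.

```python
import re

# Stage 1 rules, tried in order at each position (assumed token set).
TOKEN = re.compile(
    r"(?P<ws>\s+)|(?P<comment>/\*.*?\*/)|(?P<num>[0-9]+)|(?P<add>\+)",
    re.S,
)


def tokenize(text):
    """Stage 1: characters -> tokens. Returns (visible, hidden)."""
    pos = 0
    visible = []  # tokens the second parser will see
    hidden = []   # comments, kept aside rather than discarded
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        if kind == "comment":
            hidden.append(m.group())
        elif kind != "ws":
            visible.append((kind, m.group()))
        pos = m.end()
    return visible, hidden


def parse_add_expr(tokens):
    """Stage 2: add_expr <- num add num, matched over tokens."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["num", "add", "num"]:
        raise SyntaxError(f"expected num '+' num, got {kinds}")
    return ("add", int(tokens[0][1]), int(tokens[2][1]))


visible, hidden = tokenize("1 + /* rhs */ 2")
tree = parse_add_expr(visible)
```

The second stage no longer mentions whitespace at all, which is the benefit. The cost is the one named above: there are now two grammars to maintain, and the token rules can no longer depend on parser context.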

Shiverick answered 5/5, 2012 at 17:8 Comment(1)

Why would you do this? If you cannot define the terminals/non-terminals in the top-level parser, a second parser will not help... – Sabotage 21/5, 2012 at 5:33
