Writing a pretty-printer for legacy code in an older language. The plan is for me to learn parsing and unparsing before I write a translator to output C++. I kind of got thrown into the deep end with Java and ANTLR back in June, so I definitely have some knowledge gaps.
I've gotten to the point where I'm comfortable writing methods for my custom listener, and I want to be able to pretty-print the comments as well. My comments are on a separate hidden channel. Here are the grammar rules for the hidden tokens:
/* Comments and whitespace -- Nested comments are allowed, each is redirected to a specific channel */
COMMENT_1 : '(*' (COMMENT_1|COMMENT_2|.)*? '*)' -> channel(1) ;
COMMENT_2 : '{' (COMMENT_1|COMMENT_2|.)*? '}' -> channel(1) ;
NEWLINES : [\r\n]+ -> channel(2) ;
WHITESPACE : [ \t]+ -> skip ;
I've been playing with the Cymbol CommentShifter example on p. 207 of The Definitive ANTLR 4 Reference and I'm trying to figure out how to adapt it to my listener methods.
public void exitVarDecl(ParserRuleContext ctx) {
Token semi = ctx.getStop();
int i = semi.getTokenIndex();
List<Token> cmtChannel = tokens.getHiddenTokensToRight(i, CymbolLexer.COMMENTS);
if (cmtChannel != null) {
Token cmt = cmtChannel.get(0);
if (cmt != null) {
String txt = cmt.getText().substring(2);
String newCmt = "// " + txt.trim(); // printing comments in original format
rewriter.insertAfter(ctx.stop, newCmt); // at end of line
rewriter.replace(cmt, "\n");
}
}
}
I adapted this example by using exitEveryRule
rather than exitVarDecl
and it worked for the Cymbol example but when I adapt it to my own listener I get a null pointer exception whether I use exitEveryRule
or exitSpecificThing
I'm looking at this answer and it seems promising but I think what I really need is an explanation of how the hidden channel data is stored and how to access it. It took me months to really get listener methods and context in the parse tree.
It seems like CommonTokenStream.LT()
, CommonTokenStream.LA()
, and consume()
are what I want to be using, but why is the example in that SO answer using completely different methods from the ANTLR book example? What should I know about the token index or token types?
I'd like to better understand the logic behind this.