ANTLR4: TokenStreamRewriter output doesn't have proper formatting (removes whitespace)

I am using ANTLR4 and the Java 7 grammar (source) to modify an input Java source file. More specifically, I am using the TokenStreamRewriter class to modify some tokens. The following code is a sample that shows how the tokens are modified:

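import org.antlr.v4.runtime.*;

public class TestListener extends JavaBaseListener {
   private final TokenStreamRewriter rewriter;
   public TestListener(TokenStream tokenStream) { rewriter = new TokenStreamRewriter(tokenStream); }
   // Hypothetical helper: call it from an enterXxx/exitXxx callback that receives ctx
   void replaceRule(ParserRuleContext ctx) {
      rewriter.replace(ctx.getStart(), ctx.getStop(), "someText");
   }
}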

When I print the altered source code, all spaces, tabs and line breaks are removed, and the new source file looks like this:

importjava.util.ArrayList;publicclassMain{publicstaticvoidmain(String[]args{MyTimertimer=newMyTimer();}}

I am using extractor.getText() to print it back.

Is this a problem with the grammar, or should I use some other method of the TokenStreamRewriter class?

Animate asked 19/2, 2014 at 18:21

The issue is that the lexer is not sending whitespace to the parser, which means that the rewriter's token stream doesn't contain those tokens either. This is caused by the skip lexer command:

WS : [ \t\r\n\u000C]+ -> skip ;

You have to change all of those (every rule that uses -> skip) to -> channel(HIDDEN), which sends the tokens to the parser on a different channel: they stay in the token stream, where TokenStreamRewriter can emit them, but the parser does not see them.
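To make the difference concrete, here is a toy model in plain Java, with no ANTLR dependency; the class, the method names and the whitespace-run "lexer" are all made up for illustration and are not ANTLR's API. It assumes the changed rule reads WS : [ \t\r\n\u000C]+ -> channel(HIDDEN) ; and mimics the two lexer commands: a skipped token never enters the token stream, while a HIDDEN-channel token stays in the stream but is filtered out of what the parser consumes. Because the rewriter renders the token stream, only the second variant gets the spaces back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy model, NOT the ANTLR API: it only mimics what "-> skip" and
// "-> channel(HIDDEN)" do to the token stream.
class ChannelDemo {
    static final int DEFAULT = 0; // plays the role of Token.DEFAULT_CHANNEL
    static final int HIDDEN = 1;  // plays the role of Token.HIDDEN_CHANNEL

    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Splits input into runs of whitespace / non-whitespace characters.
    // skipWs == true mimics "-> skip"; false mimics "-> channel(HIDDEN)".
    static List<Tok> lex(String input, boolean skipWs) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            boolean ws = Character.isWhitespace(input.charAt(i));
            int j = i;
            while (j < input.length() && Character.isWhitespace(input.charAt(j)) == ws) {
                j++;
            }
            String text = input.substring(i, j);
            if (!ws) {
                tokens.add(new Tok(text, DEFAULT));
            } else if (!skipWs) {
                tokens.add(new Tok(text, HIDDEN)); // kept, but off the parser's channel
            } // else: a skipped token never enters the stream at all
            i = j;
        }
        return tokens;
    }

    // Roughly what rendering the whole stream does: joins every token, on any channel.
    static String streamText(List<Tok> tokens) {
        return tokens.stream().map(t -> t.text).collect(Collectors.joining());
    }

    // What the parser consumes: default-channel tokens only.
    static List<String> parserView(List<Tok> tokens) {
        return tokens.stream()
                .filter(t -> t.channel == DEFAULT)
                .map(t -> t.text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String src = "public class Main { }";
        System.out.println(streamText(lex(src, true)));  // publicclassMain{}
        System.out.println(streamText(lex(src, false))); // public class Main { }
        System.out.println(parserView(lex(src, false))); // [public, class, Main, {, }]
    }
}
```

Note that the parser view is identical in both variants; only the rendered stream differs, which is why moving whitespace to the hidden channel does not disturb parsing.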

Gaia answered 19/2, 2014 at 19:43
Thank you very much for your quick reply. The proposed change in the grammar file (Java.g4) worked well. (Animate)
Within a context, the interval boundaries are stored, and there is a way to access the entire input stream, which retrieves all of the text regardless of skip or the HIDDEN channel. TokenStreamRewriter is fundamentally broken, as it gives access neither to the original stream's start/stop indexes nor to overloads of the node GetText() that would let us obtain the actual text. GetText() on TokenStreamRewriter serves no purpose. That is why you have to massively hack your grammar. (Acidulant)
It may be possible to call TokenStreamRewriter.GetText() token by token, keeping track of all the context intervals and adding back the whitespace retrieved by walking the context... (Acidulant)
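The workaround the two comments above describe (recovering skipped whitespace from the original input) can be sketched with another plain-Java toy model; it has no ANTLR dependency and every name in it is illustrative. It assumes each kept token records the inclusive character range it came from, as ANTLR's Token.getStartIndex()/getStopIndex() do, so the skipped text is exactly the gap between consecutive ranges. In real ANTLR code that gap would presumably be read from the character stream, e.g. token.getInputStream().getText(Interval.of(a, b)); this sketch does not show that wiring, and it models only one-to-one token replacements, not inserts or multi-token ranges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the ANTLR API. Each kept token records the inclusive [start, stop]
// character range it came from; skipped whitespace leaves gaps between those ranges,
// and the gaps are copied back from the original input while rendering.
class GapFillDemo {
    static final class Tok {
        final String text;
        final int start; // index of the first character in the input
        final int stop;  // index of the last character (inclusive)
        Tok(String text, int start, int stop) { this.text = text; this.start = start; this.stop = stop; }
    }

    // Emits non-whitespace runs only; whitespace is "skipped", as with "-> skip".
    static List<Tok> lexSkippingWs(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isWhitespace(input.charAt(i))) { i++; continue; }
            int j = i;
            while (j < input.length() && !Character.isWhitespace(input.charAt(j))) j++;
            tokens.add(new Tok(input.substring(i, j), i, j - 1));
            i = j;
        }
        return tokens;
    }

    // replacements maps a token index to its new text (a stand-in for rewrite operations).
    static String render(String input, List<Tok> tokens, Map<Integer, String> replacements) {
        StringBuilder out = new StringBuilder();
        int prevStop = -1;
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            out.append(input, prevStop + 1, t.start);         // skipped text before this token
            out.append(replacements.getOrDefault(i, t.text)); // original or rewritten token
            prevStop = t.stop;
        }
        out.append(input, prevStop + 1, input.length());      // trailing skipped text
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "MyTimer timer = new MyTimer();";
        Map<Integer, String> edits = new HashMap<>();
        edits.put(1, "t"); // rewrite the second token, "timer"
        System.out.println(render(src, lexSkippingWs(src), edits)); // MyTimer t = new MyTimer();
    }
}
```

The trade-off is that the grammar stays untouched, but the caller has to keep the original input around and do the index bookkeeping that a hidden channel would otherwise provide for free.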
@Acidulant Calling TokenStream ts = rewriter.getTokenStream(); and then for (int i = 0; i < ts.size(); i++) { Token token = ts.get(i); } doesn't help, as this just gives the original (non-rewritten) token stream. (Gotha)
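That observation matches how TokenStreamRewriter is designed: rewrite operations are queued and only applied when getText() (or getText(Interval) for a sub-range of token indexes) is called, while the underlying token stream is left unmodified. The following plain-Java toy model illustrates that lazy design; it is not ANTLR's implementation, its names are made up, and it assumes replaced ranges never overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT ANTLR's TokenStreamRewriter: replace() records an instruction
// and leaves the token list untouched; getText() applies the instructions.
class LazyRewriterDemo {
    static final class Replace {
        final int stop; final String text;
        Replace(int stop, String text) { this.stop = stop; this.text = text; }
    }

    private final List<String> tokens;                       // the underlying "token stream"
    private final Map<Integer, Replace> ops = new HashMap<>(); // keyed by start index

    LazyRewriterDemo(List<String> tokens) { this.tokens = new ArrayList<>(tokens); }

    // Records "replace tokens[start..stop] with text"; assumes ranges do not overlap.
    void replace(int start, int stop, String text) { ops.put(start, new Replace(stop, text)); }

    // Like getTokenStream(): hands back the original, un-rewritten tokens.
    List<String> getTokens() { return Collections.unmodifiableList(tokens); }

    // Like getText(): applies the recorded instructions while rendering.
    String getText() {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.size()) {
            Replace op = ops.get(i);
            if (op != null) {
                out.append(op.text);
                i = op.stop + 1; // jump past the replaced range
            } else {
                out.append(tokens.get(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LazyRewriterDemo rw = new LazyRewriterDemo(Arrays.asList("int", " ", "x", " ", "=", " ", "1", ";"));
        rw.replace(2, 2, "count");
        System.out.println(String.join("", rw.getTokens())); // int x = 1;
        System.out.println(rw.getText());                     // int count = 1;
    }
}
```

In this design, iterating the stream can never show the edits; the rewritten text only exists as the output of getText().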
Mysteriously, in the book The Definitive ANTLR 4 Reference, the example given on p. 52 indicates that TokenStreamRewriter.getText() inserts appropriate whitespace, but it does not. The book needs a second edition (and should switch to using JUnit 5 tests instead of the command line; they are so much more flexible, but that's just by the way). Update: OK, it talks about channels in the subsequent chapter. That's confusing. 🤔 (Gotha)