How to preserve whitespace when we use text attribute in Antlr4

Asked 23/10, 2014 at 8:41 Answered 15/1, 2021 at 12:57

I want to keep the whitespace when I read the text attribute of a token. Is there a way to do that? Here is the situation. We have the following code:

IF L > 40 THEN;

ELSE

  IF A = 20 THEN
      PUT "HELLO";

In this case, I want to transform it into:

if (!(L>40)){

      if (A=20)
          put "hello";
}

The rule in ANTLR is:

stmt_if_block: IF expr
               THEN x=stmt
               (ELSE y=stmt)?
               {
                 if ($x.text.equalsIgnoreCase(";"))
                 {
                   WriteLn("if(!(" + $expr.text +")){");
                   WriteLn($stmt.text);
                   WriteLn("}");
                 }
               }

But the result looks like:

if(!(L>40))
{
   ifA=20put"hello";
}

The reason is that the whitespace in $stmt was removed. Is there any way to keep this whitespace?
Thank you so much.

Update: If I add

SPACE: [ ] -> channel(HIDDEN);

the spaces are preserved, but the result looks like this, with many spaces between tokens:

 IF SUBSTR(WNAME3,M-1,1) = ')'             THEN                                        M = L;                                  ELSE                                        M = L - 1;

Rumen answered 23/10, 2014 at 8:41 Comment(2)

You are trying to do on-the-fly (completely local) code translation between languages (this is your PL/1 to Java example again) using pure string hacking. You may generate working results, but it will be terrible code, because it doesn't take into account context from other parts of the program. (In your particular case, it entirely fails to translate the PL/1 PUT clause; this is typical of the kind of trouble such on-the-fly-generators have). – Whit 1/5, 2015 at 5:49

Ignoring the big issues, one way to handle the spacing problem is to not try to preserve the spacing of the original program in the translated one; there's no good reason to believe these are compatible, let alone that they will produce readable results. You might be better off translating (PL/1) ASTs to (Java) ASTs, and then prettyprinting the result with spacing appropriate for the target. See #5832912 – Whit 1/5, 2015 at 5:49
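A rough sketch of the "build a target tree, then prettyprint it" idea from the comment above. Everything here is hypothetical: the `Node`, `If`, and `Stmt` types are invented for illustration, and rendering PUT as `System.out.println` is only an assumption about what the Java target might look like. The point is that indentation comes from the tree depth, not from the spacing of the original source.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical target-side AST with a tiny pretty-printer (not ANTLR types).
class PrettyPrintDemo {
    interface Node {
        void print(StringBuilder out, int depth);
    }

    // A simple statement, printed on its own line with a trailing semicolon.
    static final class Stmt implements Node {
        final String code;
        Stmt(String code) { this.code = code; }
        public void print(StringBuilder out, int depth) {
            indent(out, depth).append(code).append(";\n");
        }
    }

    // An if-block whose body is printed one indentation level deeper.
    static final class If implements Node {
        final String condition;
        final List<Node> body;
        If(String condition, Node... body) {
            this.condition = condition;
            this.body = Arrays.asList(body);
        }
        public void print(StringBuilder out, int depth) {
            indent(out, depth).append("if (").append(condition).append(") {\n");
            for (Node n : body) n.print(out, depth + 1);
            indent(out, depth).append("}\n");
        }
    }

    // Four spaces per nesting level.
    static StringBuilder indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) out.append("    ");
        return out;
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        root.print(out, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        Node tree = new If("!(L > 40)",
                new If("A == 20", new Stmt("System.out.println(\"HELLO\")")));
        System.out.print(render(tree));
    }
}
```

With this approach the translator only decides structure; layout is decided once, in the printer.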

This is the C# extension method I use for exactly this purpose:

public static string GetFullText(this ParserRuleContext context)
{
    if (context.Start == null || context.Stop == null || context.Start.StartIndex < 0 || context.Stop.StopIndex < 0)
        return context.GetText(); // Fallback

    return context.Start.InputStream.GetText(Interval.Of(context.Start.StartIndex, context.Stop.StopIndex));
}

Since you're using Java, you'll have to translate it, but that should be straightforward: the API is the same.

Explanation: Get the first token, get the last token, and get the text from the input stream between the first char of the first token and the last char of the last token.
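To see why this keeps the whitespace, here is a stand-alone Java sketch that does not use ANTLR at all. The `Tok` class and the whitespace-splitting `tokenize` helper are hypothetical stand-ins for real tokens and a real lexer; only the idea (slice the original input between the first token's start offset and the last token's stop offset) is taken from the answer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: Tok mimics a token that remembers its offsets in the source.
class FullTextDemo {
    static final class Tok {
        final String text;
        final int startIndex; // offset of the first character in the source
        final int stopIndex;  // offset of the last character (inclusive)
        Tok(String text, int startIndex, int stopIndex) {
            this.text = text;
            this.startIndex = startIndex;
            this.stopIndex = stopIndex;
        }
    }

    // Naive lexer: skips whitespace and records where each token sits.
    static List<Tok> tokenize(String src) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            if (Character.isWhitespace(src.charAt(i))) { i++; continue; }
            int start = i;
            while (i < src.length() && !Character.isWhitespace(src.charAt(i))) i++;
            toks.add(new Tok(src.substring(start, i), start, i - 1));
        }
        return toks;
    }

    // Concatenating token texts: the skipped whitespace is gone.
    static String joinedText(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) sb.append(t.text);
        return sb.toString();
    }

    // Slicing the source between first start and last stop: whitespace survives.
    static String fullText(String src, List<Tok> toks) {
        if (toks.isEmpty()) return "";
        int start = toks.get(0).startIndex;
        int stop = toks.get(toks.size() - 1).stopIndex;
        return src.substring(start, stop + 1);
    }

    public static void main(String[] args) {
        String src = "  IF A = 20 THEN\n      PUT \"HELLO\";  ";
        List<Tok> toks = tokenize(src);
        System.out.println(joinedText(toks));
        System.out.println(fullText(src, toks));
    }
}
```

The slice also drops whitespace outside the span (before the first token and after the last one), which is usually what you want for a single rule.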

Knife answered 23/10, 2014 at 16:52 Comment(2)

Thank you Lucas, but it doesn't help. So right now I use SPACE: [ ] -> channel(HIDDEN); then execute trimming the text. – Rumen 24/10, 2014 at 5:57
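The "keep the spaces, then trim" workaround mentioned in the comment above could be sketched like this in Java. The `collapse` helper is hypothetical, not something from the thread, and it has an obvious caveat: it also collapses runs of spaces inside string literals, so it is only safe when that does not matter.

```java
// Hypothetical helper: squeezes runs of spaces and tabs down to one space.
class WhitespaceCollapse {
    static String collapse(String text) {
        // Newlines are left alone; only horizontal whitespace is squeezed,
        // then leading and trailing whitespace is trimmed.
        return text.replaceAll("[ \\t]+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = " IF X = ')'      THEN     M = L;   ELSE    M = L - 1;";
        System.out.println(collapse(raw));
    }
}
```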

Any extension method must be declared in a static, non-generic class such as public static class ANTLRHelper – Hynes 17/8, 2023 at 8:2

@Lucas's solution, but in Java, in case you have trouble translating it:

private String getFullText(ParserRuleContext context) {
    if (context.start == null || context.stop == null || context.start.getStartIndex() < 0 || context.stop.getStopIndex() < 0)
        return context.getText();

    return context.start.getInputStream().getText(Interval.of(context.start.getStartIndex(), context.stop.getStopIndex()));
}

Lillis answered 5/11, 2019 at 21:10 Comment(2)

I know this is old but I logged in to upvote this. Thank you so much. – Berglund 3/7, 2020 at 0:34

It works for me, thanks. That was very hard to find out. – Segregate 1/8, 2023 at 13:25

Looks like InputStream is not always updated after removeLastChild/addChild operations. This solution helped me for one grammar, but it doesn't work for another.

Works for this grammar.

Doesn't work for modern groovy grammar (for some reason inputStream.getText contains old text).

I am trying to implement function name replacement like this:

enterPostfixExpression(ctx: PostfixExpressionContext) {
   // Get identifierContext from ctx
   ...
   const token = CommonTokenFactory.DEFAULT.createSimple(GroovyParser.Identifier, 'someNewFnName');
   const node = new TerminalNode(token);
   identifierContext.removeLastChild();
   identifierContext.addChild(node);

UPD: I used the visitor pattern for the first implementation.
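A likely explanation, offered as an assumption and not something confirmed in this thread: the input stream holds the original characters, and swapping parse-tree nodes does not change it, so any text read from it by index still shows the old source. One alternative is to leave the tree alone and record replacements against source offsets, then render the result at the end. ANTLR's Java runtime has a `TokenStreamRewriter` class built around a similar idea; whether the runtime used above offers the same class is not verified here. The `OffsetRewriter` below is a hypothetical, stdlib-only Java sketch of the principle, not ANTLR code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: queue replacements by character offset, apply them lazily.
class OffsetRewriter {
    private static final class Edit {
        final int start; // first replaced offset
        final int stop;  // last replaced offset (inclusive)
        final String text;
        Edit(int start, int stop, String text) {
            this.start = start;
            this.stop = stop;
            this.text = text;
        }
    }

    private final String source;
    private final List<Edit> edits = new ArrayList<>();

    OffsetRewriter(String source) { this.source = source; }

    // Records a replacement for source[start..stop]; the source itself is untouched.
    void replace(int start, int stop, String text) {
        edits.add(new Edit(start, stop, text));
    }

    // Applies the recorded edits left to right. Assumes the edits do not overlap.
    String getText() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(e -> e.start));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Edit e : sorted) {
            out.append(source, pos, e.start).append(e.text);
            pos = e.stop + 1;
        }
        out.append(source, pos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        OffsetRewriter rw = new OffsetRewriter("foo(1)   +   bar(2)");
        rw.replace(0, 2, "someNewFnName"); // offsets of the identifier "foo"
        System.out.println(rw.getText());
    }
}
```

In a listener or visitor, the offsets would come from the identifier token's start and stop indexes, and all untouched text, including whitespace, passes through unchanged.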

Bricky answered 15/1, 2021 at 12:57 Comment(0)
