TypeScript AST transformation removes all blank lines

I've written a VS Code extension that uses the TypeScript AST API to organize class members. My issue is that after running ts.transform(...) and then converting the transformed syntax tree back to text, all empty lines are missing, so the resulting source code is incorrectly formatted. How do I prevent the AST API from removing blank lines?

Sample of the code I'm using:

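    import * as ts from 'typescript';

    // editor (the active vscode.TextEditor) and organizeTransformer are defined elsewhere in the extension
    let sourceFile: ts.SourceFile;
    let sourceCode: string;
    let transformation: ts.TransformationResult<ts.SourceFile>;

    sourceCode = editor.document.getText();
    sourceFile = ts.createSourceFile(editor.document.fileName, sourceCode, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS);
    transformation = ts.transform(sourceFile, [organizeTransformer]);
    sourceCode = transformation.transformed[0].getFullText();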
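For reference, a minimal self-contained reproduction. It assumes the typescript npm package and assumes the transformed tree is turned back into text with the compiler's printer (ts.createPrinter().printFile), the usual way to serialize a transformed tree. The identity transformer is a hypothetical stand-in for organizeTransformer that changes nothing, so any blank line that disappears is lost at the printing step, not by the transform itself.

```typescript
import * as ts from "typescript";

// Input with a blank line between the two class members.
const input = "class A {\n  b = 1;\n\n  a = 2;\n}\n";

const sourceFile = ts.createSourceFile("a.ts", input, ts.ScriptTarget.Latest, true, ts.ScriptKind.TS);

// Hypothetical identity transformer: returns every source file unchanged.
const identity: ts.TransformerFactory<ts.SourceFile> = () => (file) => file;

const result = ts.transform(sourceFile, [identity]);
const printer = ts.createPrinter({ newLine: ts.NewLineKind.LineFeed });

// The printer separates class members with a single line break,
// so the blank line between b and a does not come back.
const printed = printer.printFile(result.transformed[0]);
result.dispose();

console.log(printed);
```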
Arm asked 16/7, 2018 at 2:53 Comment(3)
Welcome to the problem of "fidelity printing" an AST. If the output logic of the TypeScript engine isn't prepared to preserve the source format, then you'll get this kind of behavior. (Does it preserve comments?) See https://mcmap.net/q/83019/-compiling-an-ast-back-to-source-code - Alcove
Yes, the comments are still there. That's the thing: I don't really understand why some of the trivia gets preserved and some doesn't. - Arm
That's a side effect of parsing to an AST: to print with fidelity, the parser must record enough trivia to allow the original text to be almost perfectly reproduced, including vertical whitespace. (In practice, [almost] nobody cares whether you reproduce the content of horizontal whitespace, unless you have tab characters whose tab settings aren't some universal standard.) - Alcove
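To illustrate the trivia point: in the TypeScript AST, a node's leading trivia (the text between its full start and its real start) does contain both the blank line and the comment, as this sketch shows, assuming the typescript npm package. The compiler's printer re-emits comment ranges it finds in that trivia for original nodes, but, as the question observes, not the empty lines, which would explain why comments survive and blank lines don't.

```typescript
import * as ts from "typescript";

const source = "class A {\n  b = 1;\n\n  // note\n  a = 2;\n}\n";
const sourceFile = ts.createSourceFile("a.ts", source, ts.ScriptTarget.Latest, true, ts.ScriptKind.TS);

const cls = sourceFile.statements[0] as ts.ClassDeclaration;
const member = cls.members[1]; // the `a = 2;` property

// Leading trivia: everything from the node's full start (right after `b = 1;`)
// up to the first character of the node itself.
const trivia = source.slice(member.getFullStart(), member.getStart(sourceFile));

// Comment ranges inside that trivia, extracted as text.
const comments = (ts.getLeadingCommentRanges(source, member.getFullStart()) ?? []).map((r) =>
  source.slice(r.pos, r.end),
);

console.log(JSON.stringify(trivia)); // blank line, comment and indentation are all there
console.log(comments);
```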

A parser is not the best tool for code formatting:

  • It requires the input to be error-free.
  • It usually skips whitespace and comments, as they are not relevant for parsing.
  • The AST/parse tree represents the input structure in a way that is best suited for language processing, not so much for code generation.

In fact, pretty printing doesn't need parsing at all. It's a source-to-source transformation, and all that's needed is a lexer to identify the various types of input elements (as far as they are relevant for formatting, in particular whitespace and comments). You can see one way to implement a code formatter in my VS Code extension vscode-antlr4. The principle is simple: collect source positions (not source text) for each non-whitespace element in a list (including comments). Add the formatting whitespace too. Then generate the new text from this list by copying the original text to the output. That avoids trouble with quoting, number radixes, comment types, etc., which a parser might convert into a form that is easier for it to process but doesn't necessarily represent the original.
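A minimal sketch of that principle, under simplifying assumptions: a toy hand-written lexer (not the ANTLR-based code in vscode-antlr4) records only source ranges, and the output is rebuilt by copying each token's original text while normalizing the whitespace between tokens. The names lex and format, the two-space indent, and the rule of keeping at most one blank line in a row are illustrative choices.

```typescript
// Kinds of input elements the toy lexer distinguishes.
type TokenKind = "comment" | "string" | "word" | "punct";

// A token records only its source range, never a copy of its text.
interface Token {
  kind: TokenKind;
  start: number;
  end: number;
}

// Toy lexer for C-like syntax: line/block comments, quoted strings,
// identifiers/numbers and single-character punctuation. Whitespace is skipped.
// Template-literal substitutions and regex literals are not handled.
function lex(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (/\s/.test(c)) {
      i++;
      continue;
    }
    const start = i;
    let kind: TokenKind;
    if (src.startsWith("//", i)) {
      kind = "comment";
      while (i < src.length && src[i] !== "\n" && src[i] !== "\r") i++;
    } else if (src.startsWith("/*", i)) {
      kind = "comment";
      const close = src.indexOf("*/", i + 2);
      i = close === -1 ? src.length : close + 2;
    } else if (c === '"' || c === "'" || c === "`") {
      kind = "string";
      i++;
      while (i < src.length && src[i] !== c) i += src[i] === "\\" ? 2 : 1;
      i = Math.min(i + 1, src.length);
    } else if (/[\w$]/.test(c)) {
      kind = "word";
      while (i < src.length && /[\w$]/.test(src[i])) i++;
    } else {
      kind = "punct";
      i++;
    }
    tokens.push({ kind, start, end: i });
  }
  return tokens;
}

// Rebuild the text: copy each token's original slice and replace the gap before it
// with normalized whitespace. Line breaks are kept (at most one blank line in a row),
// indentation follows brace depth, any other non-empty gap becomes one space.
function format(src: string, indent = "  "): string {
  let out = "";
  let depth = 0;
  let prevEnd = 0;
  lex(src).forEach((tok, idx) => {
    const text = src.slice(tok.start, tok.end);
    if (text === "}") depth = Math.max(0, depth - 1);
    const gap = src.slice(prevEnd, tok.start);
    const newlines = (gap.match(/\n/g) ?? []).length;
    if (idx > 0) {
      if (newlines > 0) out += "\n".repeat(Math.min(newlines, 2)) + indent.repeat(depth);
      else if (gap.length > 0) out += " ";
    }
    out += text;
    if (text === "{") depth++;
    prevEnd = tok.end;
  });
  return out + "\n";
}

const input = "class A {\n      b = 1;\n\n\n\n  // keep me\n  a   =   0x1F;\n}\n";
console.log(format(input));
```

Because token text is copied verbatim, the 0x1F literal and the comment come through untouched, and the blank line survives because line breaks are read from the original gaps rather than from a tree.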

Calabria answered 16/7, 2018 at 6:36 Comment(2)
That was my plan B, but I wanted to see if this could be done purely with the AST. I managed to get it to work, except for the newlines. - Arm
In other words: you need a *concrete* syntax tree parser, like ANTLR, tree-sitter, lezer-parser, ... - Thetisa

Workaround:

  • replace empty lines with a comment
  • transform
  • replace those comments with empty lines

    import * as ts from 'typescript';
    import {decodeEmptyLines, encodeEmptyLines} from 'ts-empty-line-encoder';
    
    let sourceCode = editor.document.getText();
    // encode empty lines as comments
    sourceCode = encodeEmptyLines(sourceCode);
    const sourceFile = ts.createSourceFile(editor.document.fileName, sourceCode, ts.ScriptTarget.Latest, false, ts.ScriptKind.TS);
    const transformation = ts.transform(sourceFile, [organizeTransformer]);
    sourceCode = transformation.transformed[0].getFullText();
    // decode the comments back into empty lines
    sourceCode = decodeEmptyLines(sourceCode);
    
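If you'd rather not add a dependency, the same idea can be hand-rolled. This is a sketch, not the actual ts-empty-line-encoder implementation: the marker text and the names encodeBlankLines and decodeBlankLines are hypothetical, it assumes LF line endings, and it would also rewrite blank lines inside multi-line strings or template literals. Depending on where a marker sits (for example just before a closing brace), the printer might not re-emit it, so check the result on real code.

```typescript
// Hypothetical marker; any comment text that never occurs in real code will do.
const BLANK_MARKER = "// __BLANK_LINE__";

// Replace every whitespace-only line with the marker comment, so the blank line
// becomes a comment the printer can carry along. The element after the final "\n"
// is left alone so no marker is appended at end of file. Assumes LF line endings.
function encodeBlankLines(source: string): string {
  return source
    .split("\n")
    .map((line, i, all) => (i < all.length - 1 && /^\s*$/.test(line) ? BLANK_MARKER : line))
    .join("\n");
}

// Turn marker lines back into empty lines, whatever indentation the printer gave them.
function decodeBlankLines(source: string): string {
  return source
    .split("\n")
    .map((line) => (line.trim() === BLANK_MARKER ? "" : line))
    .join("\n");
}

const original = "class A {\n  a = 1;\n\n  b = 2;\n}\n";
const encoded = encodeBlankLines(original);
console.log(encoded);
console.log(decodeBlankLines(encoded) === original);
```

Decoding matches on the trimmed line so that a marker re-indented by the printer (for example to four spaces) is still recognized.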
Seve answered 14/10, 2019 at 18:42 Comment(0)
