Why is the source location end off by two characters for a statement ending in a semicolon?
I'm trying to write a source to source translator using libTooling.

I'm using ASTMatchers to try to find if statements that don't have curly braces and then use a rewriter to add the braces.

The matcher I'm using is:

ifStmt(unless(hasDescendant(compoundStmt()))).bind("ifStmt")

Then I just get the start and end locations, and rewrite the curly braces.

Here's the source code for that:

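if (const IfStmt *IfS = Result.Nodes.getNodeAs<clang::IfStmt>("ifStmt")) {
  const Stmt *Then = IfS->getThen();
  // Insert the opening brace before the first token of the then-branch.
  Rewrite.InsertText(Then->getLocStart(), "{", true, true);
  // Insert the closing brace at the end location reported by the AST.
  Rewrite.InsertText(Then->getLocEnd(), "}", true, true);
}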

Now the problem is that for some reason the end location is always off by 2 characters. Why is this so?

Mcmillon answered 19/5, 2016 at 10:27 Comment(2)
I tried using InsertTextAfterToken instead of InsertText. It always missed the semicolon, so now it's off by only one. Mcmillon
There is a discussion of this problem on the LLVM Discourse at discourse.llvm.org/t/extend-stmt-with-proper-end-location/54745 Transposition

This is a general issue with the Clang AST: it usually does not record the location of the final semicolon of a statement that ends in one. See the discussion "Extend Stmt with proper end location?" on the LLVM Discourse server.

To solve this problem, the usual approach is to start with the end location as stored in the AST, then use the Lexer class to advance forward until the semicolon is found. This is not 100% reliable because there can be intervening macros and preprocessing directives, but fortunately that is uncommon for the final semicolon of a statement.

There is an example of doing this in clang::arcmt::trans::findSemiAfterLocation in the Clang source code. The essence is these lines:

  // Lex from the start of the given location.
  Lexer lexer(SM.getLocForStartOfFile(locInfo.first),
              Ctx.getLangOpts(),
              file.begin(), tokenBegin, file.end());
  Token tok;
  lexer.LexFromRawLexer(tok);
  if (tok.isNot(tok::semi)) {
    if (!IsDecl)
      return SourceLocation();
    // Declaration may be followed with other tokens; such as an __attribute,
    // before ending with a semicolon.
    return findSemiAfterLocation(tok.getLocation(), Ctx, /*IsDecl*/true);
  }
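
The scan-forward idea can also be illustrated without Clang. The following is only a toy sketch on a plain string, assuming C++17: findSemiAfter is a hypothetical helper (not a Clang API) that stands in for the raw lexer, and offsets stand in for SourceLocations. It skips whitespace and comments after the last token and gives up if anything other than a semicolon comes first. Like the real approach, it does not handle macros or preprocessing directives.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of Clang: starting at `pos` (the offset just
// past the last token of a statement), skip whitespace and comments and
// return the offset of the terminating ';'. Returns std::nullopt if any
// other character is found first or the input ends.
std::optional<std::size_t> findSemiAfter(std::string_view src,
                                         std::size_t pos) {
  assert(pos <= src.size());
  while (pos < src.size()) {
    const char c = src[pos];
    if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
      ++pos;  // plain whitespace
    } else if (src.substr(pos, 2) == "//") {
      pos = src.find('\n', pos);  // line comment: jump to the end of line
      if (pos == std::string_view::npos) return std::nullopt;
    } else if (src.substr(pos, 2) == "/*") {
      pos = src.find("*/", pos + 2);  // block comment: jump past "*/"
      if (pos == std::string_view::npos) return std::nullopt;
      pos += 2;
    } else {
      if (c == ';') return pos;  // found the statement's semicolon
      return std::nullopt;       // some other token came first
    }
  }
  return std::nullopt;
}
```

With the offset of the semicolon in hand, the closing brace would go at that offset plus one, so it lands after the semicolon regardless of how much whitespace separates it from the previous token.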
Transposition answered 24/8, 2023 at 17:46 Comment(0)

The SourceLocation I was getting is off by one because it only covers the last token, and the ";" is not part of that token. By the way, if anybody is wondering how to include the ";" in the range, you can use Lexer::MeasureTokenLength, add one to the result, and get the new SourceLocation by offset.

Mcmillon answered 16/6, 2016 at 9:38 Comment(1)
This answer assumes that the semicolon follows immediately after the preceding token, with no intervening whitespace, which is not true in general. Transposition
