In a tree-sitter grammar, how do I match strings except for reserved keywords in identifiers?

This might be related to me not understanding the Keyword Extraction feature, which from the docs seems to be about avoiding an issue where no space exists between a keyword and the following expression. But say I have a fairly standard identifier regex for variable names, function names, etc.:

/\w*[A-Za-z]\w*/

How do I keep this from matching a reserved keyword like IF or ELSE or something like that? So this expression would produce an error:

int IF = 5;

while this would not:

int x = 5;

Garnetgarnett answered 21/3, 2021 at 14:36 Comment(2)
see tree-sitter.github.io/tree-sitter/… and the sections that follow - Jallier
@Jallier That does not address the question. - Garnetgarnett

There has been a pull request pending since 2019 to add an EXCLUDE feature, but it is not implemented as of this writing (April 2021; if some time has passed and you're reading this, please do re-check). And since tree-sitter also does not support negative lookahead in its regular expressions, this has to be handled at the semantic level. One thing you can do to make this check easier is to enumerate all your reserved words, then add them as an alternative to your identifier regex:

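// Reserved words, listed as string literals.
keyword: $ => choice('IF', 'THEN', 'ELSE'),

// Ordinary names: word characters containing at least one letter.
name: $ => /\w*[A-Za-z]\w*/,

// An identifier is either of the two; (identifier keyword) is the illegal case.
identifier: $ => choice($.keyword, $.name),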

According to rule 4 of tree-sitter's conflict-resolution rules (match specificity), in the expression int IF = 5; the IF token would match (identifier keyword) rather than (identifier name), since a string literal is a more specific match than a regex of the same length. This means you can do an easy query for illegal (identifier keyword) nodes and surface the error to the user in your language server, or wherever else you're using the tree-sitter grammar.
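
As a rough illustration of that check, here is a minimal sketch in plain JavaScript. It assumes the parse tree has already been copied into plain objects of the hypothetical shape { type, text, children }, which is not the node API of the real tree-sitter bindings. The helper name findKeywordIdentifiers and the hand-built tree are made up for the example.

```javascript
// Hypothetical node shape: { type: string, text: string, children: Node[] }.
// Collects every `identifier` node that wraps a `keyword` child.
function findKeywordIdentifiers(node, found = []) {
  const children = node.children || [];
  if (node.type === 'identifier' && children.some(c => c.type === 'keyword')) {
    found.push(node);
  }
  for (const child of children) {
    findKeywordIdentifiers(child, found);
  }
  return found;
}

// Hand-built stand-in for the tree of `int IF = 5;` (assumed structure).
const tree = {
  type: 'declaration',
  text: 'int IF = 5;',
  children: [
    {
      type: 'identifier',
      text: 'IF',
      children: [{ type: 'keyword', text: 'IF', children: [] }],
    },
  ],
};

// One message per illegal (identifier keyword) node.
const errors = findKeywordIdentifiers(tree)
  .map(n => `reserved word used as identifier: ${n.text}`);
```

With the real bindings you would more likely express this as a tree-sitter query over (identifier (keyword)) nodes; the walker above only shows the shape of the check.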

Note that this approach does run the risk of creating many conflicts between your (identifier keyword) match and the actual language constructs that use those keywords. If so, you'll have to handle the whole thing at the semantic level: scan all identifiers to check whether they're a reserved word.
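
That fallback scan could look roughly like the following. This is a sketch under assumptions: identifier is a single regex token with no keyword child, nodes are plain objects of the hypothetical shape { type, text, children }, and the reserved-word list is just the three words from the example above, compared case-sensitively.

```javascript
// Assumption: the reserved words from the example, matched case-sensitively.
const RESERVED = new Set(['IF', 'THEN', 'ELSE']);

// Hypothetical node shape: { type: string, text: string, children: Node[] }.
// Returns the text of every `identifier` node that is a reserved word.
function findReservedIdentifiers(root, reserved = RESERVED) {
  const hits = [];
  const stack = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (current.type === 'identifier' && reserved.has(current.text)) {
      hits.push(current.text);
    }
    stack.push(...(current.children || []));
  }
  return hits;
}

// Hand-built stand-in for `int IF = 5; int x = 5;` (assumed structure).
const program = {
  type: 'source_file',
  text: 'int IF = 5; int x = 5;',
  children: [
    { type: 'identifier', text: 'IF', children: [] },
    { type: 'identifier', text: 'x', children: [] },
  ],
};

const reservedHits = findReservedIdentifiers(program);
```

Here the grammar stays free of the extra keyword alternative, so it cannot introduce new conflicts; the cost is that the reserved-word list lives outside the grammar and has to be kept in sync by hand.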

Garnetgarnett answered 5/4, 2021 at 16:32 Comment(0)
