I'm facing the same problem. In my case I need to parse the AST in JSON format, and I'd like to get rid of everything that comes from included headers and other files. I tried to replicate @textshell's answer (https://mcmap.net/q/551382/-how-to-exclude-headers-from-ast-in-clang), but I noticed Clang behaves differently in my case. The Clang version I'm using is:
```
$ clang --version
Debian clang version 13.0.1-+rc1-1~exp4
Target: x86_64-pc-linux-gnu
Thread model: posix
```
To explain my case, let's consider the following example:
Both `my_function` and `main` are functions defined in the same source file (`function_definition_invocation.c`). However, the file name is only specified in the `FunctionDecl` node of `my_function`, not in the one of `main`. I presume this is because both functions belong to the same file, and Clang prints the file name only in the first node located in that file, omitting it from the nodes that follow.
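If that presumption holds (and, as far as I understand Clang's JSON dumper, `file` is written only when it differs from the file of the previously printed location), the missing file names can be recovered by replaying every location in document order and carrying the last seen `file` forward. The sketch below does that; `top_level_files` is a hypothetical helper name, and it relies on `json.loads` preserving the key order of the output.

```python
def top_level_files(json_ast):
    """Pair each top-level node with the file its 'loc' refers to.

    Assumption: Clang writes "file" only when it changes from the previously
    printed location, so we replay all locations in document order and carry
    the last seen file forward.
    """
    current = None

    def walk(value):
        nonlocal current
        if isinstance(value, dict):
            for key, child in value.items():
                if key == "includedFrom":
                    continue  # include stack, not a location of this node
                if key == "file" and isinstance(child, str):
                    current = child
                else:
                    walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    pairs = []
    for node in json_ast.get("inner", []):
        walk(node.get("loc", {}))  # resolve the node's own location first
        pairs.append((current, node))
        walk(node)  # then replay the rest (range, children, ...)
    return pairs
```

With it, the filter becomes `[node for f, node in top_level_files(json_ast) if f == source_file and not node.get("isImplicit", False)]`. Unlike a skip-then-keep approach, this would also drop nodes from a header that is included after the first declaration of the main file, again assuming the dumper behaves as described.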
So my approach is: skip the top-level nodes until the first one located in the main file is found, then add every subsequent node to the filtered JSON. The code I'm using is:
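```python
def filter_ast_only_source_file(source_file, json_ast):
    """Keep the top-level AST nodes starting from the first one located in
    source_file (modifies json_ast in place)."""
    new_inner = []
    first_occurrence_of_main_file = False
    for entry in json_ast['inner']:
        if not first_occurrence_of_main_file:
            # Implicit declarations (e.g. builtin typedefs) have no source location
            if entry.get('isImplicit', False):
                continue
            file_name = None
            loc = entry.get('loc', {})
            if 'file' in loc:
                file_name = loc['file']
            # For nodes coming from a macro expansion, the file is in 'expansionLoc'
            if 'expansionLoc' in loc:
                if 'file' in loc['expansionLoc']:
                    file_name = loc['expansionLoc']['file']
            # Still in a header (or no file reported yet): skip the node
            if file_name != source_file:
                continue
            new_inner.append(entry)
            first_occurrence_of_main_file = True
        else:
            # Later nodes may not repeat 'file', so keep everything from here on
            new_inner.append(entry)
    json_ast['inner'] = new_inner
```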
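The same skip-then-keep logic can be written more compactly with `itertools.dropwhile`, which drops items while a predicate holds and yields everything after the first item for which it fails. This is a sketch of an equivalent rewrite; `node_file` and `drop_header_nodes` are hypothetical names.

```python
from itertools import dropwhile


def node_file(entry):
    """File named in a top-level node's 'loc' (None if Clang omitted it)."""
    loc = entry.get("loc", {})
    # For macro expansions the relevant file sits in 'expansionLoc'
    return loc.get("expansionLoc", {}).get("file", loc.get("file"))


def drop_header_nodes(source_file, json_ast):
    """Drop top-level nodes until the first non-implicit one in source_file,
    then keep everything after it (modifies json_ast in place)."""
    def before_main_file(entry):
        return entry.get("isImplicit", False) or node_file(entry) != source_file

    json_ast["inner"] = list(dropwhile(before_main_file, json_ast["inner"]))
```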
And I call it like this:
```python
import json
import subprocess

# -fsyntax-only stops after parsing, so the driver doesn't also try to compile and link
generated_ast = subprocess.run(["clang", "-Xclang", "-ast-dump=json", "-fsyntax-only", source_file], capture_output=True)
# stdout is bytes; json.loads() accepts bytes directly, no need to decode
json_ast = json.loads(generated_ast.stdout)
filter_ast_only_source_file(source_file, json_ast)
```
So far it seems to be working.