Visualising C struct dependencies

In a large C project there are many structs that have other structs, or pointers to them, as fields. I want to create a directed graph to show the dependencies between the "types". An example would be

typedef struct javaStat {
    int idNo;
    struct idIdentList *className;
    struct typeModifiers *thisType;
    struct symbol thisClass;
} ...

From this I would like to generate a DOT structure, which would look like

digraph {
    javaStat -> idIdentList
    javaStat -> typeModifiers
    javaStat -> symbol
}

or, using a DOT short-hand:

digraph {
    javaStat -> {idIdentList typeModifiers symbol}
}

Of course the first and last lines can be added by hand, so the primary problem is converting the struct references to the graph "pointer" lines.

At this point I'm content with a first level solution, meaning that deeper nesting could be ignored.

I first tried a simple grep struct *.h which produced something workable:

typedef struct javaStat {
    struct idIdentList *className;
    struct typeModifiers *thisType;
    struct symbol thisClass;
typedef struct <next struct> {

This is a simple problem which a few lines of Python would solve, but are there other handy solutions, perhaps using sed, grep, awk and their brethren?
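For reference, the "few lines of Python" could look roughly like the sketch below. It is only a line-based hack under stated assumptions: each struct starts on a line of the form `struct NAME {`, each field sits on its own line, and nested braces, typedef aliases and macros are ignored. The names `struct_edges`, `to_dot` and `SAMPLE` are made up for this illustration.

```python
import re

# Opening line of a definition, e.g. "typedef struct javaStat {".
STRUCT_START = re.compile(r"^\s*(?:typedef\s+)?struct\s+(\w+)\s*\{")
# Field of struct type, e.g. "struct symbol thisClass;" or
# "struct idIdentList *className;". The \b keeps the name from being split.
STRUCT_FIELD = re.compile(r"^\s*struct\s+(\w+)\b\s*\**\s*\w+")


def struct_edges(lines):
    """Map each struct name to the struct types named directly in its fields."""
    edges = {}
    current = None
    for line in lines:
        start = STRUCT_START.match(line)
        if start:
            current = start.group(1)
            edges[current] = []
            continue
        if line.lstrip().startswith("}"):
            current = None  # assumed end of the current definition
            continue
        field = STRUCT_FIELD.match(line)
        if field and current is not None and field.group(1) not in edges[current]:
            edges[current].append(field.group(1))
    return edges


def to_dot(edges):
    """Render the edge map using the DOT shorthand from the question."""
    body = ["    %s -> {%s}" % (name, " ".join(targets))
            for name, targets in edges.items()]
    return "\n".join(["digraph {"] + body + ["}"])


# Hypothetical input, modelled on the struct in the question.
SAMPLE = """
typedef struct javaStat {
    int idNo;
    struct idIdentList *className;
    struct typeModifiers *thisType;
    struct symbol thisClass;
};
"""

print(to_dot(struct_edges(SAMPLE.splitlines())))
```

Working line by line keeps it close to the grep output, at the price of missing anything that spans several lines.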

EDIT: I've realized that the reason I want to do this is because I need to find one or more structures that are at the base of the "struct tree".
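Once the edges are available as a mapping, finding the base structs is a small step. The sketch below assumes a dictionary from struct name to the list of struct types its fields mention; the function name `base_structs` and the example data are hypothetical. It treats a struct as "base" when it refers to no struct other than itself, and it also counts structs that are only referenced but never defined in the mapping, which may or may not be what is wanted.

```python
def base_structs(edges):
    """Return the structs that do not depend on any other struct in the map."""
    known = set(edges)
    for targets in edges.values():
        known.update(targets)
    # A self-reference (e.g. a list node pointing to its own type) is ignored.
    return sorted(name for name in known
                  if not (set(edges.get(name, ())) - {name}))


# Hypothetical edge map for illustration only.
EDGES = {
    "javaStat": ["idIdentList", "typeModifiers", "symbol"],
    "s5": ["s6"],
    "s6": ["s6"],
}

print(base_structs(EDGES))
```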

Knotts answered 11/3, 2020 at 16:20 Comment(6)
That will fail to notice a struct member whose type is an alias (typedef).Devries
@Devries yes, but I don't care about that at this point. But for an industrial strength solution you should probably go for actually parsing things, e.g. with pycparser.Knotts
Exactly my point, although I'd incline to clang python binding, since building a C parser from scratch seems like a lot of work. The questions you need to ask include (1) whether you will eventually want more precise information and (2) how much effort you should invest in an informal solution vs. in learning how to use a real C parsing library. But I suppose you knew that.Devries
I really doubt any solution involving sed, grep or awk could be defined "handy". It would probably be the exact opposite of "handy".Cesspool
@Devries pycparser is already a full C parser in Python with node visitors. Haven't tried Clang's, but pycparser has helped me a couple of times.Knotts
Ok, fair enough. I missed the c. So I guess we're more or less in sync here. (But do check out libclang. The documentation sucks, but the interface is simple enough for simple things.)Devries
K
1

Having tried the doxygen suggestion by @gavinb, augmented by @albert, which required some manipulation of the sources, and @Renats's suggestion to use Clang's Python bindings, which was a bit too complicated for me at this time, I tried pycparser.

Here's a link to that script in the project where I needed it.

Here is the first of two essential parts:

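from pycparser import c_ast, parse_file

# args and cpp_args are set up elsewhere in the full script: the last
# argument is the C file, any preceding ones are passed on to cpp.
ast = parse_file(args[-1], use_cpp=True,
                 cpp_args=cpp_args + args[0:-1])
print("digraph {")
# Only top-level typedefs of structs are turned into graph nodes.
for node in (node for node in ast.ext if isinstance(node, c_ast.Typedef)):
    if isinstance(node.type.type, c_ast.Struct):
        node2dot(node)
print("}")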

This is the main loop: pycparser parses the file into an AST, which is then filtered so that only typedefs of structs are passed to node2dot, defined in the following part:

def node2dot(node):
    if isinstance(node.type, c_ast.TypeDecl) and isinstance(node.type.type, c_ast.Struct):
        print("   ", node.type.type.name, "-> {", end="")
        if node.type.type.decls:  # has fields?
            for field in node.type.type.decls:
                if isstruct(field.type):
                    print("", struct_name_of(field.type), end="")
        print(" }")

def struct_name_of(node):
    # Unwrap declarations and pointers until the struct itself is reached.
    if isinstance(node, c_ast.Struct):
        return node.name
    elif isinstance(node, (c_ast.TypeDecl, c_ast.PtrDecl)):
        return struct_name_of(node.type)
    return None

def isstruct(node):
    # True if the type is a struct, or a declaration of or pointer to one.
    if isinstance(node, c_ast.Struct):
        return True
    elif isinstance(node, (c_ast.TypeDecl, c_ast.PtrDecl)):
        return isstruct(node.type)
    return False
Knotts answered 18/3, 2020 at 18:32 Comment(0)
M
2

Clang 9 can emit a JSON representation of the AST of a C file (found it in this question). The JSON AST can then be processed further to generate the target output.

E.g. this Python script:

# clang_ast_to_dot.py
import json
import sys

from jsonpath_rw_ext import parse

def extract_struct_name(fieldDefinition):
  return fieldDefinition["type"]["qualType"].replace("struct", "").replace("*", "").replace(" ","")

def is_struct_field(fieldDefinition, knownStructs):
  return (fieldDefinition["kind"] == "FieldDecl" and 
          ("struct " in fieldDefinition["type"]["qualType"] or 
           extract_struct_name(fieldDefinition) in knownStructs))


data = json.load(sys.stdin)

allStructs = {}

for structDef in parse('$.inner[?(@.kind=="RecordDecl")]').find(data):
    allStructs[structDef.value["name"]]=structDef.value

print("digraph {")
for name, structDescription in allStructs.items():
    print("    %s -> {%s}"
          % (name, ", ".join(extract_struct_name(field) for field in structDescription["inner"] if is_struct_field(field, allStructs))))
print("}")

called as:

clang -Xclang -ast-dump=json MyCFile.c | python clang_ast_to_dot.py

produces:

digraph {
    javaStat -> {idIdentList, typeModifiers, symbol}
}

Of course this is a toy example; I'm sure it won't work for all cases.
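If the extra jsonpath dependency is unwanted, the same idea can be sketched with only the standard library. This assumes the AST has the shape the script above relies on (nodes with `kind`, `name`, `inner` and `type.qualType`); the function names and the hand-written `SAMPLE_AST` are illustrative and not real clang output. Records without an `inner` list, such as forward declarations, are skipped here, which is a design choice rather than anything clang mandates.

```python
def iter_records(node):
    """Yield every named RecordDecl node in the nested AST dictionary."""
    if node.get("kind") == "RecordDecl" and "name" in node:
        yield node
    for child in node.get("inner", []):
        yield from iter_records(child)


def field_struct(field):
    """Return the struct name a FieldDecl refers to, or None otherwise."""
    if field.get("kind") != "FieldDecl":
        return None
    parts = field["type"]["qualType"].split()
    if len(parts) < 2 or parts[0] != "struct":
        return None  # qualified types such as "const struct x" are not handled
    return parts[1].rstrip("*")


def ast_to_dot(ast):
    """Render one DOT line per struct that has a body."""
    lines = ["digraph {"]
    for record in iter_records(ast):
        if "inner" not in record:
            continue  # no body, e.g. a forward declaration
        targets = [t for t in map(field_struct, record["inner"]) if t]
        lines.append("    %s -> {%s}" % (record["name"], " ".join(targets)))
    lines.append("}")
    return "\n".join(lines)


# Hand-written stand-in for the JSON that would be read from clang.
SAMPLE_AST = {
    "kind": "TranslationUnitDecl",
    "inner": [{
        "kind": "RecordDecl",
        "name": "javaStat",
        "inner": [
            {"kind": "FieldDecl", "name": "idNo", "type": {"qualType": "int"}},
            {"kind": "FieldDecl", "name": "className",
             "type": {"qualType": "struct idIdentList *"}},
            {"kind": "FieldDecl", "name": "thisType",
             "type": {"qualType": "struct typeModifiers *"}},
            {"kind": "FieldDecl", "name": "thisClass",
             "type": {"qualType": "struct symbol"}},
        ],
    }],
}

print(ast_to_dot(SAMPLE_AST))
```

In real use, `SAMPLE_AST` would be replaced by `json.load(sys.stdin)` and the script fed from the clang command shown above.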

Mindoro answered 12/3, 2020 at 0:33 Comment(4)
Is there any documentation on the clang AST somewhere?Knotts
Ok, found some here: releases.llvm.org/9.0.0/tools/clang/docs/…Knotts
Thanks, although a promising answer, as this is my first encounter with the clang AST, I have a hard time figuring out all the situations that need to be taken care of. Actual parsing has the drawback that it gives you all things the compiler sees, including millions of structs of various kinds and shapes in all the system headers. I think my initial Python hack gives me enough at this time.Knotts
Thanks for the feedback, good to know that this approach fails at a large scale. TBH I just made a guess and haven't tried it with a real-world example.Mindoro
L
2

I would start by running Doxygen over the codebase. It can easily be configured to create Dot graphs of your structures. There are enough quirks and corner cases involved in correctly parsing all this information and generating the correct output that you would save much time using an existing solution.

Limited answered 12/3, 2020 at 0:55 Comment(1)
Although this is a very good suggestion, it doesn't really give me what I want. I realize that I didn't state that, so I'll edit it into the question, but my goal is to find a dependency tree between the structures so that I know which of the structs are not dependent on any of the others. And I can't find any view that shows all structs.Knotts
C
0

An extension to the answer of @gavinb with a bit of an example.

Use a doxygen configuration file with EXTRACT_ALL = YES and HAVE_DOT = YES. For more complex situations it might be useful to set DOT_GRAPH_MAX_NODES to an appropriate value and DOT_IMAGE_FORMAT = svg; UML_LOOK = YES might also be interesting.

I used a simple example:

typedef  struct idIdentList {
     int member;
};
typedef  struct typeModifiers {
     int member;
};
typedef  struct symbol {
     int member;
};
typedef  struct s1 {
     struct s2 member;
};
typedef  struct s2 {
     struct s3 member;
};
typedef  struct s3 {
     struct s4 member;
};
typedef  struct s4 {
     struct s5 member;
};
typedef  struct s5 {
     struct s6 member;
};
typedef  struct s6 {
     struct s6 member;
};
typedef struct javaStat {
    int idNo;
    struct idIdentList *className;
    struct typeModifiers *thisType;
    struct symbol thisClass;
    struct s1 member;
};

and from this I got:

enter image description here

Doxygen doesn't have a complete overview diagram, but with a little scripting one could create a "super struct" like the following (I also added the struct not_ref here, which is not referenced anywhere else):

typedef struct super_script
{
  struct idIdentList a1;
  struct typeModifiers a2;
  struct symbol a3;
  struct s1 a4;
  struct s2 a5;
  struct s3 a6;
  struct s4 a7;
  struct s5 a8;
  struct s6 a9;
  struct javaStat a10;
  struct not_ref a11;
};

typedef struct not_ref
{
  int member;
};

Resulting in:

enter image description here

When you set DOT_CLEANUP = NO, the dot file that was used will be available in the html directory.

Cruelty answered 14/3, 2020 at 10:36 Comment(2)
Interesting. What do you mean by "a little scripting"? As in when do I apply that scripting and to what files? The doxygen output? Or extract/create that "superstruct" by extracting all the structs I have in my own files?Knotts
With "a little scripting" I mean as you wrote " extract/create that "superstruct" by extracting all the structs I have in my own files" before you run doxygen.Cruelty
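The "little scripting" mentioned here could be sketched as follows: collect the name of every struct defined in the sources and emit one member per struct before running doxygen. This is only an illustration under the assumption that definitions look like `struct NAME {`; the function name `super_struct`, the regular expression and the sample input are made up, and the default name `super_script` simply mirrors the example in the answer.

```python
import re

# A struct definition is assumed to look like "struct NAME {", with the
# brace possibly on the next line.
STRUCT_DEF = re.compile(r"\bstruct\s+(\w+)\s*\{")


def super_struct(source, name="super_script"):
    """Build a struct with one member per struct defined in `source`."""
    names = []
    for match in STRUCT_DEF.finditer(source):
        if match.group(1) not in names:
            names.append(match.group(1))
    members = ["  struct %s a%d;" % (struct, index)
               for index, struct in enumerate(names, start=1)]
    return "\n".join(["typedef struct %s" % name, "{"] + members + ["};"])


# Hypothetical header contents.
SAMPLE = """
typedef struct symbol {
     int member;
};
typedef struct not_ref
{
  int member;
};
"""

print(super_struct(SAMPLE))
```

The generated text would be written to an extra header that doxygen processes together with the real sources.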
