I want to analysis OCaml files (.ml) using OCaml. I want to break the files into Abstract Syntax Trees for analysis. I have attempted to use camlp4 but have had no luck. Has anyone else successfully done this before? Is this the best way to parse an OCaml file?
(I assume you know basic parts of OCaml already: how to write OCaml code, how to link modules and libraries, how to write build scripts and so on. If you do not, learn them first.)
The best way is to use the genuine OCaml code parser used in OCaml compiler itself, since it is 100% compatible by definition.
CamlP4 also implements OCaml parser but it is slightly incompatible with the genuine parser and the parse tree is somewhat specialized for writing syntax extensions: not very good for any other kind of analysis.
You may want to parse .ml
files with syntax extensions using P4. Even in this case, you should stick to the genuine parser: you can desugar the source code by P4 then send the result to your analyzer with the genuine parser.
To use OCaml compiler's parser, the easiest approach is to use compiler-libs.common OCamlFind package. It contains the parser and type checker of OCaml compiler.
Start from modifying driver/compile.ml
of OCaml compiler source, it implements the major compilation phases: calling preprocessor, parse, typing then code generation. To parse .ml
files you should modify (or simplify) Compile.implementation
. For .mli
files Compile.interface
.
Good luck.
Couldn't you use the -dparsetree option to the ocaml compiler?
hello.ml:
let _ = print_endline "Hello AST"
Now compile it:
$ ocamlc -dparsetree hello.ml
Which results in:
[
structure_item (hello.ml[1,0+0]..[1,0+33])
Pstr_eval
expression (hello.ml[1,0+8]..[1,0+33])
Pexp_apply
expression (hello.ml[1,0+8]..[1,0+21])
Pexp_ident "print_endline" (hello.ml[1,0+8]..[1,0+21])
[
<label> ""
expression (hello.ml[1,0+22]..[1,0+33])
Pexp_constant Const_string("Hello AST",None)
]
]
See also this blog post on -ppx extensions which has some info on extension point syntax extensions (the new way of writing syntax extensions in OCaml 4.02). There is info there on various AST manipulation modules.
© 2022 - 2024 — McMap. All rights reserved.