How to convert source code to an XML-based representation of the AST?

5

7

I want to get an XML representation of the AST of Java and C code. I already asked this question three months ago, but the suggested solutions didn't work well for me:

  • srcML seems to be a good solution for this problem, but it does not support line numbers and columns, and I need that feature.
  • About Elsa, quoting its documentation: "There is ongoing effort to export the Elsa AST as an XML document; we expect to be able to advertise this in the next public release."
  • DMS: I didn't understand it.
  • For Java specifically, there is JavaML, which supports line numbers, but its SourceForge page doesn't list any files.

Question: is there software that converts an AST into XML and includes line numbers (and columns), especially for Java and C/C++? Is there an alternative to JavaML and srcML?

PS: I don't want a parser generator. I hope to find a tool that can be run from the console by typing ./my-xml-generator Test.java (or something like that). A Java implementation would be great too.

Cowey answered 12/5, 2010 at 10:26 Comment(2)
What is it that you want to do, that requires you to use XML? - Methodius
srcML now supports line numbers and columns. From the website: "File and directory aware with metadata at the file level, i.e., language, file location, and version information." I have used srcML extensively and can verify it has line numbers and column information. - Nagana
3

A bit late, but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html

I implemented it myself; it converts PHP and Java to XML. It's free, so enjoy!

Oana.

Penknife answered 13/8, 2012 at 2:38 Comment(1)
And for 1000 lines of input Java, how big an XML document do you get? - Methodius
2

What didn't you understand about DMS?

It exists.

It has compiler-accurate parsers/front ends for C, C++, Java, C#, COBOL (and many other languages).

It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with the file/line/column of the token that represents the start of that node, and the final column can be computed by a DMS API call.

It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:

 run DMSDomainParser ++XML  <path_to_your_file>

You can see what such an XML result looks like for Java.

You probably don't really want what you are wishing for. A 1000-line C program may pull in 100K lines of #include file content. Each line produces between 5 and 10 nodes. The DMS XML output is succinct and each node takes only one line, so you are looking at roughly 1 million lines of XML of about 60 characters each, or about 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.
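The back-of-the-envelope estimate above can be reproduced with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and the inputs (100K lines after preprocessing, 5 to 10 nodes per line, about 60 characters per XML line) are the rough figures quoted in this answer, not measurements.

```java
public class XmlSizeEstimate {
    // Estimated XML size in characters, assuming one XML line per AST node.
    static long estimateChars(long sourceLines, long nodesPerLine, long charsPerXmlLine) {
        return sourceLines * nodesPerLine * charsPerXmlLine;
    }

    public static void main(String[] args) {
        long lines = 100_000;      // source lines once #include files are pulled in
        long charsPerXmlLine = 60; // rough length of one XML line

        // Lower and upper bound for 5 and 10 nodes per source line.
        System.out.println("low:  " + estimateChars(lines, 5, charsPerXmlLine));
        System.out.println("high: " + estimateChars(lines, 10, charsPerXmlLine));
    }
}
```

With these inputs the estimate lands between 30 million and 60 million characters, which matches the order of magnitude given above.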

DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.

Moral: much better to use something like DMS to manipulate the AST directly, than to fight with XML.

Full disclosure: I'm the architect behind DMS.

Methodius answered 14/5, 2010 at 1:36 Comment(0)
1

There is GCC-XML at http://www.gccxml.org/HTML/Index.html. Caveat: I haven't actually used it myself.

Leslileslie answered 12/5, 2010 at 10:28 Comment(1)
AFAIK, GCC-XML only dumps type definition data, not the code for the body of functions. - Methodius
1

srcML supports line and column numbers. Here is an example using a Java file called input.java (keep in mind that srcML supports multiple languages, including C/C++) that contains the following:

public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }
}

Then run srcml with the --position option, which makes it record this extra position information:

srcml input.java --position

It produces the following AST in XML format, with line and column numbers embedded:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.srcML.org/srcML/src" xmlns:pos="http://www.srcML.org/srcML/position" revision="0.9.5" language="Java" filename="input.java" pos:tabs="8"><class><specifier pos:line="1" pos:column="1">public<pos:position pos:line="1" pos:column="7"/></specifier> class <name pos:line="1" pos:column="14">HelloWorld<pos:position pos:line="1" pos:column="24"/></name> <block pos:line="1" pos:column="25">{
    <function><specifier pos:line="2" pos:column="5">public<pos:position pos:line="2" pos:column="11"/></specifier> <specifier pos:line="2" pos:column="12">static<pos:position pos:line="2" pos:column="18"/></specifier> <type><name pos:line="2" pos:column="19">void<pos:position pos:line="2" pos:column="23"/></name></type> <name pos:line="2" pos:column="24">main<pos:position pos:line="2" pos:column="28"/></name><parameter_list pos:line="2" pos:column="28">(<parameter><decl><type><name><name pos:line="2" pos:column="29">String<pos:position pos:line="2" pos:column="35"/></name><index pos:line="2" pos:column="35">[]<pos:position pos:line="2" pos:column="37"/></index></name></type> <name pos:line="2" pos:column="38">args<pos:position pos:line="2" pos:column="42"/></name></decl></parameter>)<pos:position pos:line="2" pos:column="43"/></parameter_list> <block pos:line="2" pos:column="44">{
    <comment type="line" pos:line="3" pos:column="9">// Prints "Hello, World" to the terminal window.</comment>
    <expr_stmt><expr><call><name><name pos:line="4" pos:column="9">System<pos:position pos:line="4" pos:column="15"/></name><operator pos:line="4" pos:column="15">.<pos:position pos:line="4" pos:column="16"/></operator><name pos:line="4" pos:column="16">out<pos:position pos:line="4" pos:column="19"/></name><operator pos:line="4" pos:column="19">.<pos:position pos:line="4" pos:column="20"/></operator><name pos:line="4" pos:column="20">println<pos:position pos:line="4" pos:column="27"/></name></name><argument_list pos:line="4" pos:column="27">(<argument><expr><literal type="string" pos:line="4" pos:column="28">"Hello, World"<pos:position pos:line="4" pos:column="42"/></literal></expr></argument>)<pos:position pos:line="4" pos:column="43"/></argument_list></call></expr>;<pos:position pos:line="4" pos:column="44"/></expr_stmt>
    }<pos:position pos:line="5" pos:column="6"/></block></function>
}<pos:position pos:line="6" pos:column="2"/></block></class></unit>

Reference: Documentation for srcml v0.9.5 (see srcml --help). I also use srcml frequently, including this feature to obtain position information.
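As a rough illustration of how the position attributes can be consumed, here is a minimal Java sketch that reads srcML output with the JDK's DOM parser and lists the start position of each element. The class name, the method name, and the trimmed SAMPLE string (cut down from the output above) are assumptions made for this example; the namespace URI and the pos:line / pos:column attribute names are taken from that output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SrcmlPositions {
    // Namespace bound to the "pos" prefix in the srcML output shown above.
    static final String POS_NS = "http://www.srcML.org/srcML/position";

    // Trimmed fragment of the srcML output above, used as sample input.
    static final String SAMPLE =
        "<unit xmlns=\"http://www.srcML.org/srcML/src\""
        + " xmlns:pos=\"http://www.srcML.org/srcML/position\""
        + " language=\"Java\" filename=\"input.java\">"
        + "<class><specifier pos:line=\"1\" pos:column=\"1\">public"
        + "<pos:position pos:line=\"1\" pos:column=\"7\"/></specifier> class "
        + "<name pos:line=\"1\" pos:column=\"14\">HelloWorld"
        + "<pos:position pos:line=\"1\" pos:column=\"24\"/></name></class></unit>";

    // Returns "tag@line:column" for every element that carries a start position.
    static List<String> startPositions(String srcml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to resolve the pos: prefix
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(srcml.getBytes(StandardCharsets.UTF_8)));

            List<String> result = new ArrayList<>();
            NodeList all = doc.getElementsByTagNameNS("*", "*"); // all elements, document order
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                // Skip <pos:position/> markers, which hold end positions.
                if (POS_NS.equals(e.getNamespaceURI())) {
                    continue;
                }
                if (e.hasAttributeNS(POS_NS, "line") && e.hasAttributeNS(POS_NS, "column")) {
                    result.add(e.getLocalName() + "@"
                        + e.getAttributeNS(POS_NS, "line") + ":"
                        + e.getAttributeNS(POS_NS, "column"));
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse srcML XML", ex);
        }
    }

    public static void main(String[] args) {
        for (String p : startPositions(SAMPLE)) {
            System.out.println(p); // prints specifier@1:1 then name@1:14
        }
    }
}
```

For large inputs a streaming parser would probably be a better fit than DOM, since the whole document is held in memory here.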

Ugaritic answered 14/9, 2017 at 0:59 Comment(0)
0

For Java only, you can use BeautyJ.

You can launch it against your file with -xml.* options. For example:

java -cp /your/dir/BeautyJ/lib/beautyj.jar beautyj -xml.out= -xml.doctype your_file.java

...and you get an XML representation of that file (and included ones).

BTW: the "-xml.out=" option specifies an output file. Used this way, with the trailing "=" and no file name, it writes to STDOUT. It's not an error.

Cheke answered 19/9, 2012 at 13:23 Comment(0)