How do I use the Rust parser (libsyntax) myself?

I want to use the Rust parser (libsyntax) to parse a Rust file and extract information like function names out of it. I started digging in the docs and code, so my first goal is a program that prints all function names of freestanding functions in a .rs file.

The program should expand all macros before it prints the function names, so functions declared via macro aren't missed. That's why I can't write some crappy little parser by myself to do the job.

I have to admit that I'm not yet perfectly good at programming Rust, so I apologize in advance for any stupid statements in this question.

As I understand it, I need to do the following steps:

  1. Parse the file via the Parser struct
  2. Expand macros with MacroExpander
  3. ???
  4. Use a Visitor to walk the AST and extract the information I need (e.g. via visit_fn)

So here are my questions:

  1. How do I use MacroExpander?
  2. How do I walk the expanded AST with a custom visitor?

I had the idea of using a custom lint check instead of a fully fledged parser. I'm investigating this option.

If it matters, I'm using rustc 0.13.0-nightly (f168c12c5 2014-10-25 20:57:10 +0000)

Bear answered 26/10, 2014 at 16:51 Comment(1)
Extraction of interesting facts usually requires lots more than "just a parser". See my article on "Life After Parsing" (google or via bio page). – Secretion
A
6

I'm afraid I can't answer your question directly, but I can present an alternative that might help.

If all you need is the AST, you can retrieve it in JSON format using rustc -Z ast-json. Then use your favorite language (Python is great) to process the output.

You can also get pretty-printed source using rustc --pretty=(expanded|normal|typed).

For example, given this hello.rs:

fn main() {
    println!("hello world");
}

We get:

$ rustc -Z ast-json hello.rs
{"module":{"inner":null,"view_items":[{"node":{"va... (etc.)
$ rustc --pretty=normal hello.rs
#![no_std]
#[macro_use]
extern crate "std" as std;
#[prelude_import]
use std::prelude::v1::*;
fn main() { println!("hello world"); }
$ rustc --pretty=expanded hello.rs
#![no_std]
#[macro_use]
extern crate "std" as std;
#[prelude_import]
use std::prelude::v1::*;
fn main() {
    ::std::io::stdio::println_args(::std::fmt::Arguments::new({
                                                                  #[inline]
                                                                  #[allow(dead_code)]
                                                                  static __STATIC_FMTSTR:
                                                                         &'static [&'static str]
                                                                         =
                                                                      &["hello world"];
                                                                  __STATIC_FMTSTR
                                                              },
                                                              &match () {
                                                                   () => [],
                                                               }));
}

If you need more than that though, a lint plugin would be the best option. Properly handling macro expansion, config flags, the module system, and anything else that comes up is quite non-trivial. With a lint plugin, you get the type-checked AST right away without fuss. Cargo supports compiler plugins too, so your tool will fit nicely into other people's projects.

Adscititious answered 13/1, 2015 at 9:59 Comment(1)
Thank you! I too feel that not using unstable compiler internals is the way to go here. – Bear
R
7

You can use syntex to parse Rust, so you don't need to use unstable Rust.

Here's a simple example:

// Tested against syntex_syntax v0.33
extern crate syntex_syntax as syntax;

use std::rc::Rc;
use syntax::codemap::{CodeMap};
use syntax::errors::{Handler};
use syntax::errors::emitter::{ColorConfig};
use syntax::parse::{self, ParseSess};

fn main() {
    // The code map tracks source text and positions; the handler reports
    // parse diagnostics to the terminal.
    let codemap = Rc::new(CodeMap::new());
    let tty_handler =
        Handler::with_tty_emitter(ColorConfig::Auto, None, true, false, codemap.clone());
    let parse_session = ParseSess::with_span_handler(tty_handler, codemap.clone());

    // Source to parse, kept in memory instead of being read from a file.
    let src = "fn foo(x: i64) { let y = x + 1; return y; }".to_owned();

    // Parse with an empty source name and an empty crate config.
    let result = parse::parse_crate_from_source_str(String::new(), src, Vec::new(), &parse_session);
    println!("parse result: {:?}", result);
}

This prints the whole AST:

parse result: Ok(Crate { module: Mod { inner: Span { lo: BytePos(0), hi: BytePos(43), expn_id: ExpnId(4294967295) },
items: [Item { ident: foo#0, attrs: [], id: 4294967295, node: Fn(FnDecl { inputs: [Arg { ty: type(i64), pat:
pat(4294967295: x), id: 4294967295 }], output: Default(Span { lo: BytePos(15), hi: BytePos(15), expn_id: ExpnId(4294967295) }),
variadic: false }, Normal, NotConst, Rust, Generics { lifetimes: [], ty_params: [], where_clause: WhereClause { id:
4294967295, predicates: [] } }, Block { stmts: [stmt(4294967295: let y = x + 1;), stmt(4294967295: return y;)], expr:
None, id: 4294967295, rules: Default, span: Span { lo: BytePos(15), hi: BytePos(43), expn_id: ExpnId(4294967295) } }),
vis: Inherited, span: Span { lo: BytePos(0), hi: BytePos(43), expn_id: ExpnId(4294967295) } }] }, attrs: [], config: [],
span: Span { lo: BytePos(0), hi: BytePos(42), expn_id: ExpnId(4294967295) }, exported_macros: [] })
Rimmer answered 14/8, 2016 at 17:9 Comment(1)
I think syntex won't be continued (see issue). – Bear
W
2

The syn crate does work for this. At first I wrongly thought it was only for writing procedural macros (as its readme suggests), but it can also parse a whole source file. See this page: https://docs.rs/syn/1.0.77/syn/struct.File.html . It even gives an example that reads a .rs file and prints its AST (of course, you can do anything with the AST, not just print it):

use std::env;
use std::fs::File;
use std::io::Read;
use std::process;

fn main() {
    let mut args = env::args();
    let _ = args.next(); // executable name

    let filename = match (args.next(), args.next()) {
        (Some(filename), None) => filename,
        _ => {
            eprintln!("Usage: dump-syntax path/to/filename.rs");
            process::exit(1);
        }
    };

    let mut file = File::open(&filename).expect("Unable to open file");

    let mut src = String::new();
    file.read_to_string(&mut src).expect("Unable to read file");

    let syntax = syn::parse_file(&src).expect("Unable to parse file");

    // Debug impl is available if Syn is built with "extra-traits" feature.
    println!("{:#?}", syntax);
}

Thanks to @poolie for the hint (though it was a bit short on details).

Waziristan answered 1/10, 2021 at 9:42 Comment(0)
A
0

syntex seems to no longer be maintained (last updated 2017), but https://crates.io/crates/syn may do what you need.

Anticipant answered 8/9, 2021 at 14:48 Comment(0)
