How can I create hygienic identifiers in code generated by procedural macros?

When writing a declarative (macro_rules!) macro, we automatically get macro hygiene. In this example, I declare a variable named f in the macro and pass in an identifier f which becomes a local variable:

macro_rules! decl_example {
    ($tname:ident, $mname:ident, ($($fstr:tt),*)) => {
        impl std::fmt::Display for $tname {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                let Self { $mname } = self;
                write!(f, $($fstr),*)
            }
        }
    }
}

struct Foo {
    f: String,
}

decl_example!(Foo, f, ("I am a Foo: {}", f));

fn main() {
    let f = Foo {
        f: "with a member named `f`".into(),
    };
    println!("{}", f);
}

This code compiles, but if you look at the partially-expanded code, you can see that there's an apparent conflict:

impl std::fmt::Display for Foo {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let Self { f } = self;
        write!(f, "I am a Foo: {}", f)
    }
}

I am writing the equivalent of this declarative macro as a procedural macro, but do not know how to avoid potential name conflicts between the user-provided identifiers and identifiers created by my macro. As far as I can see, the generated code has no notion of hygiene and is just a string:

src/main.rs

use my_derive::MyDerive;

#[derive(MyDerive)]
#[my_derive(f)]
struct Foo {
    f: String,
}

fn main() {
    let f = Foo {
        f: "with a member named `f`".into(),
    };
    println!("{}", f);
}

Cargo.toml

[package]
name = "example"
version = "0.1.0"
edition = "2018"

[dependencies]
my_derive = { path = "my_derive" }

my_derive/src/lib.rs

extern crate proc_macro;

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput, Meta, NestedMeta};

#[proc_macro_derive(MyDerive, attributes(my_derive))]
pub fn my_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input as DeriveInput);

    let name = input.ident;

    let attr = input
        .attrs
        .into_iter()
        .find(|a| a.path.is_ident("my_derive"))
        .expect("No name passed");
    let meta = attr.parse_meta().expect("Unknown attribute format");
    let meta = match meta {
        Meta::List(ml) => ml,
        _ => panic!("Invalid attribute format"),
    };
    let meta = meta.nested.first().expect("Must have one path");
    let meta = match meta {
        NestedMeta::Meta(Meta::Path(p)) => p,
        _ => panic!("Invalid nested attribute format"),
    };
    let field_name = meta.get_ident().expect("Not an ident");

    let expanded = quote! {
        impl std::fmt::Display for #name {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                let Self { #field_name } = self;
                write!(f, "I am a Foo: {}", #field_name)
            }
        }
    };

    TokenStream::from(expanded)
}

my_derive/Cargo.toml

[package]
name = "my_derive"
version = "0.1.0"
edition = "2018"

[lib]
proc-macro = true

[dependencies]
syn = "1.0.13"
quote = "1.0.2"
proc-macro2 = "1.0.7"

With Rust 1.40, this produces the compiler error:

error[E0599]: no method named `write_fmt` found for type `&std::string::String` in the current scope
 --> src/main.rs:3:10
  |
3 | #[derive(MyDerive)]
  |          ^^^^^^^^ method not found in `&std::string::String`
  |
  = help: items from traits can only be used if the trait is in scope
  = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
  |
1 | use std::fmt::Write;
  |

What techniques exist to namespace my identifiers from identifiers outside of my control?

Bucharest answered 6/1, 2020 at 19:59 Comment(2)
Obvious idea (don't know if it works): write a proc macro that generates a declarative one, then calls it? - Breena
The Lisp term for this utility is gensym, and apparently there is at least one crate for that. However the implementation is exactly the same as in French's answer. - Delaunay

Summary: you can't yet use hygienic identifiers with proc macros on stable Rust. Your best bet is to use a particularly ugly name such as __your_crate_your_name.


You are creating identifiers (in particular, f) by using quote!. This is certainly convenient, but it's just a helper around the actual proc macro API the compiler offers. So let's take a look at that API to see how we can create identifiers! In the end we need a TokenStream, as that's what our proc macro returns. How can we construct such a token stream?

We can parse it from a string, e.g. "let f = 3;".parse::<TokenStream>(). But this was basically an early solution and is discouraged now. In any case, all identifiers created this way behave in a non-hygienic manner, so this won't solve your problem.

The second way (which quote! uses under the hood) is to create a TokenStream manually by creating a bunch of TokenTrees. One kind of TokenTree is an Ident (identifier). We can create an Ident via new:

fn new(string: &str, span: Span) -> Ident

The string parameter is self-explanatory, but the span parameter is the interesting part! A Span stores the location of something in the source code and is usually used for error reporting (so that rustc can point to a misspelled variable name, for example). But in the Rust compiler, spans carry more than location information: they also carry the kind of hygiene! We can see two constructor functions for Span:

  • fn call_site() -> Span: creates a span with call site hygiene. This is what you call "unhygienic" and is equivalent to copying and pasting. If two identifiers have the same string, they will collide or shadow each other.

  • fn def_site() -> Span: this is what you are after. Technically called definition site hygiene, this is what you call "hygienic". The identifiers you define and the ones of your user live in different universes and won't ever collide. As you can see in the docs, this method is still unstable and thus only usable on a nightly compiler. Bummer!
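
For comparison, the "different universes" behaviour is what macro_rules! already gives you for local variables. A minimal, std-only sketch (the macro name times_two and the variable tmp are made up for illustration, they are not from the question):

```rust
// Hypothetical macro for illustration: it introduces its own local `tmp`.
macro_rules! times_two {
    ($e:expr) => {{
        let tmp = 2; // this `tmp` belongs to the macro's own syntax context
        $e * tmp     // names inside `$e` still resolve at the call site
    }};
}

fn main() {
    let tmp = 10;
    // The caller's `tmp` (10) and the macro's `tmp` (2) do not shadow each other.
    let result = times_two!(tmp);
    println!("{}", result);
}
```

With copy-and-paste (call site) semantics the inner let tmp = 2 would shadow the caller's tmp, which is the kind of clash the proc macro in the question runs into.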

There are no really great workarounds. The obvious one is to use a really ugly name like __your_crate_some_variable. To make it a bit easier for you, you can create that identifier once and use it within quote! (slightly better solution here):

let ugly_name = quote! { __your_crate_some_variable };
quote! {
    let #ugly_name = 3;
    println!("{}", #ugly_name);
}

Sometimes you can even search through all of the user's identifiers that could collide with yours and then algorithmically choose an identifier that does not collide. This is actually what we did for auto_impl, with a super ugly name as the fallback. The main goal was to keep super ugly names out of the generated documentation.
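
A rough sketch of that "pick a name that does not collide" idea, using only std. The helper pick_fresh_name and the counter-suffix scheme are assumptions for illustration, not how auto_impl actually does it; in a real proc macro the taken set would have to be collected from the parsed input first.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns `base` if it is unused, otherwise the first
/// `base_N` (N = 0, 1, 2, ...) that is not in `taken`.
fn pick_fresh_name(base: &str, taken: &HashSet<String>) -> String {
    if !taken.contains(base) {
        return base.to_string();
    }
    (0u64..)
        .map(|n| format!("{}_{}", base, n))
        .find(|candidate| !taken.contains(candidate))
        .expect("a finite set cannot use up every suffix")
}

fn main() {
    // Pretend the user already has identifiers named `f` and `f_0`.
    let taken: HashSet<String> = ["f", "f_0"].iter().map(|s| s.to_string()).collect();
    println!("{}", pick_fresh_name("f", &taken));
    println!("{}", pick_fresh_name("g", &taken));
}
```

This only helps if you can actually see every identifier that might clash, which is why a fallback ugly name is still worth having.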

Apart from that, I'm afraid you cannot really do anything.

Carniola answered 6/1, 2020 at 21:10 Comment(1)
You mention that building the result TokenStream by parsing a string is now discouraged. Can you link to a document that outlines the preferred technique (either in a comment, or in an edit to your answer)? The few articles I have read either use .parse(), or they don't explain how to generate the TokenStream at all. - Jamboree

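You can, thanks to a UUID:

// Assumes the proc_macro2 types, as used together with quote!.
use proc_macro2::{Ident, Span};

// Requires the `uuid` crate with its `v4` feature enabled.
fn generate_unique_ident(prefix: &str) -> Ident {
    // Hyphens are not valid in identifiers, so they are replaced with underscores.
    let uuid = uuid::Uuid::new_v4();
    let ident = format!("{}_{}", prefix, uuid).replace('-', "_");

    // call_site() is unhygienic; uniqueness rests on the random name alone.
    Ident::new(&ident, Span::call_site())
}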
Talapoin answered 6/1, 2020 at 20:56 Comment(5)
Does anything prevent the user from passing in an identifier that (un)luckily matches the identifier that I've generated? - Bucharest
@Bucharest The laws of probability, I guess. - Talapoin
@Bucharest It's an astronomically improbable event, since a v4 UUID consists of 128 random bits. With a correctly seeded PRNG, it should be akin to asking whether your git repo could be broken by two commits unluckily hashing to the same SHA1. - Delaunay
Could that break incremental recompilation sometimes? The idea of a random event in compilation scares me a bit. - Adelinaadelind
@LouisGarczynski You're right, I didn't think about that. Maybe you can hash some information instead, like the file name, line, column, etc… It should be better regarding that point. - Talapoin
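
A possible shape for that hashing idea, as a std-only sketch. The helper deterministic_name and its parameters are hypothetical, and where the file, line and column come from inside a proc macro is left open here; hashing the type name or other input tokens would work the same way.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: derives a name from location data instead of randomness,
/// so repeated builds of the same source produce the same identifier.
fn deterministic_name(prefix: &str, file: &str, line: u32, column: u32) -> String {
    // DefaultHasher::new() starts from fixed keys, so equal inputs hash equally
    // within one Rust release (the algorithm itself is not guaranteed across releases).
    let mut hasher = DefaultHasher::new();
    (file, line, column).hash(&mut hasher);
    format!("{}_{:016x}", prefix, hasher.finish())
}

fn main() {
    let a = deterministic_name("__my_derive", "src/main.rs", 3, 10);
    let b = deterministic_name("__my_derive", "src/main.rs", 3, 10);
    assert_eq!(a, b); // same inputs, same name
    println!("{}", a);
}
```

A 64-bit hash makes a clash with a user identifier very unlikely rather than impossible, so this trades the randomness concern for determinism without giving real hygiene.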
