Taming Lifetime Contagion with Rust Structs

I am attempting to define a struct, in Rust, that contains a member of type async_executor::LocalExecutor, a type that, itself, is generic over a lifetime, 'a:

pub struct LocalExecutor<'a> {
    inner: ......<Executor<'a>>,
    ...
    ...
}

My own struct must now, apparently, also be generic over a lifetime, 'a, which means nothing to it, itself -- that lifetime is a detail of async_executor::LocalExecutor.

#[cfg(all(test, not(target_arch = "wasm32")))]
struct MockThing<'a> {
    executor: async_executor::LocalExecutor<'a>,
}

My struct exists only when building unit-tests, where I need a mock, single-threaded executor for running async code. Herein lies the problem: the only consumer of my struct uses #[cfg(...)] conditional compilation, internally, to

  1. use my mock when compiling for unit-tests (and not WebAssembly)
  2. or one real implementation when compiling for WebAssembly
  3. or another real implementation, otherwise.

This is done with conditional compilation to ensure that the consumer, itself, is not unnecessarily generic, which would pollute its public API and just push the problem of contagious generics to everything that consumes it -- a large number of things. Conditional compilation provides compile-time duck-typing, of a sort, and because that conditional compilation only exists in one place, knowledge of the implementation details is unnecessary for everyone else -- as it should be.

Neither implementation 2 nor 3 requires generic lifetimes but, because the mock one (1) must be generic over 'a, I now have to make everything, throughout my code-base, generic over some lifetime, 'a! (and faff about with PhantomData to stop the compiler from complaining that 'a is meaningless, which it is, most of the time.)

Is there some way that I could define my mock struct in a way that does not strike this problem? It would be really convenient if I could use '_ in the member definition, like...

#[cfg(test)]
struct MockThing {
    executor: async_executor::LocalExecutor<'_>,
}

... to indicate that the lifetime of executor should be deduced from the lifetime of the MockThing. (Of course, this does not work.)

I suppose I could also just use another async runtime, with ambient executor, for my unit tests, and bypass the problem but that would not help me to understand what is going on, here, and, in general, how one should encapsulate lifetimes as implementation details, in structs that have members that are generic over some lifetime.

There is something I am not understanding, though: why Executor (inside LocalExecutor) must be generic over 'a -- it does not contain a reference with lifetime 'a -- and why they use PhantomData to ensure that it is generic over an invariant lifetime, 'a, and, even, what lifetime invariance means in this case. I've been reading about it, in the nomicon and elsewhere, but it will be many days of learning before I can say I understand lifetime variance and all I want to do is "put one of those in my struct".

Surely there must be some way to tame lifetime contagion and prevent it polluting one's entire code-base just because one type is generic over a lifetime? Help, please!

Immunize answered 4/11, 2021 at 13:20 Comment(5)
I think you can simply set this lifetime to 'static in this case. As you noted, it's not really the lifetime of any reference inside the executor. Rather, it is the "scope" of the executor, which allows tasks with a non-static lifetime to be submitted to it. Since it looks like you don't want to do that, simply using 'static should solve your problem. - Subsidize
These phantom lifetimes are sometimes used to guarantee that one object outlives the other, even though they do not reference one another directly. Or it can be used to avoid the Executor being modified while the LocalExecutor exists, as it is kept borrowed. - Mooncalf
@Mooncalf In this case, the phantom lifetime is used to allow running futures that are not 'static but only 'a on the executor. This makes sure the future lives at least as long as the executor. - Subsidize
I hereby nominate "lifetime contagion" to be the term of art for this phenomenon. (I believe @Immunize coined the term just now, as this question is the only result if you google it.) - Shellbark
@Shellbark: I'll happily take the credit as the creator of the term "lifetime contagion" but I'm pretty damn sure that it was only the logical extrapolation of things I'd read from other projects' authors. I think the seed of the term came from talks in gfx-rs / wgpu-rs land. But it does feature prominently in my Rust notes, Trello board and code-base comments: LIFETIME CONTAGION. Also, perhaps "generic contagion" is more accurate. - Immunize

TL;DR If you don't need the executor's futures to refer to local data in the executor's surroundings, you should just use 'static:

#[cfg(test)]
struct MockThing {
    executor: LocalExecutor<'static>
}

Both Executor and LocalExecutor have a lifetime in order to allow the futures they run to borrow data from an outside environment. For example, this compiles and runs as expected:

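// imports this snippet relies on (block_on comes from the futures_lite crate)
use async_executor::LocalExecutor;
use futures_lite::future;

// local data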
let greeting = "foo".to_owned();

// executor
let local_ex = LocalExecutor::new();

// spawn a future that references local data
let handle = local_ex.spawn(async {
    println!("Hello {}", greeting);
});
future::block_on(local_ex.run(handle));

// data still alive
println!("done {}", greeting);

LocalExecutor (like its cousin Executor) tracks the lifetimes of the values its futures borrow from, and statically proves that no borrow will outlive the value it refers to. This is what the lifetime 'a on its struct means: it represents an intersection of the scopes of the values borrowed by the futures submitted to the executor.

With the exception of the 'static lifetime, you cannot explicitly specify a lifetime 'foo and name it when constructing LocalExecutor::<'foo>::new(). Instead, the lifetime gets deduced automatically, in this case to the scope of greeting, and only gets named in the types and functions that receive it. This is like the type of a closure, which is unnamed when you declare a closure, but gets a name when a generic function accepts it as a T: Fn(). Analogously, the caller can't specify a lifetime on LocalExecutor, but LocalExecutor<'a> sees it as 'a.
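To make the closure analogy concrete, here is a minimal std-only sketch. The names call_twice, first_word and demo are made up for illustration and have nothing to do with async_executor; the point is only that neither the closure's type nor the borrow's lifetime is written at the call site, while the receiving function gives each one a name.

```rust
// The closure's type cannot be written by the caller, but the generic
// parameter `F` names it on the receiving side.
fn call_twice<F: Fn() -> usize>(f: F) -> usize {
    f() + f()
}

// Likewise, the caller never writes the borrow's lifetime, but the
// receiving function sees it as 'a.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

fn demo() -> (usize, String) {
    let greeting = "foo bar".to_owned();
    // Closure type is inferred; there is no syntax to spell it here.
    let twice_len = call_twice(|| greeting.len());
    // Lifetime is inferred from the borrow of `greeting`.
    let word = first_word(&greeting).to_owned();
    (twice_len, word)
}

fn main() {
    println!("{:?}", demo());
}
```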

Now let's try the same with tokio:

let greeting = "foo".to_owned();

let runtime = tokio::runtime::Runtime::new().unwrap();
let handle = runtime.spawn(async {
    println!("Hello {}", greeting);
});
runtime.block_on(handle);
drop(runtime);  // greeting outlives runtime

println!("done {}", greeting);

The above code is obviously sound because greeting outlives the runtime, and yet it fails to compile:

error[E0373]: async block may outlive the current function, but it borrows `greeting`, which is owned by the current function
 --> src/main.rs:6:38
  |
6 |       let handle = runtime.spawn(async {
  |  ______________________________________^
7 | |         println!("Hello {}", greeting);
  | |                              -------- `greeting` is borrowed here
8 | |     });
  | |_____^ may outlive borrowed value `greeting`
  |
  = note: async blocks are not executed immediately and must either take a reference or ownership of outside variables they use
help: to force the async block to take ownership of `greeting` (and any other referenced variables), use the `move` keyword
  |
6 |     let handle = runtime.spawn(async move {
  |                                      ++++

tokio does not allow any outside borrows in any of its futures.

They must satisfy the 'static bound, meaning that a future must not contain references to anything from its outside environment, except for 'static data. (They may also own any data they choose, which is why the compiler suggests a move - except in that case the last println!() would fail to compile because greeting would be gone.)
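The same split exists in the standard library, which may be a useful analogy (it is only an analogy, not a description of how either executor is implemented). std::thread::spawn demands 'static closures, much like tokio's spawn, while std::thread::scope (Rust 1.63 or later) ties the spawned work to a scope and therefore accepts borrows of locals. The function names below are made up for the sketch.

```rust
use std::thread;

// thread::spawn has a 'static bound: the closure must own everything it
// uses, so `greeting` is moved in and handed back through the join handle.
fn with_static_bound() -> String {
    let greeting = "foo".to_owned();
    let handle = thread::spawn(move || {
        let line = format!("Hello {}", greeting);
        (line, greeting) // return ownership so the caller can keep using it
    });
    let (line, greeting) = handle.join().unwrap();
    format!("{line}; done {greeting}")
}

// thread::scope guarantees the spawned thread is joined before the scope
// ends, so borrowing a local is accepted and `greeting` stays usable.
fn with_scoped_lifetime() -> String {
    let greeting = "foo".to_owned();
    let line = thread::scope(|s| {
        s.spawn(|| format!("Hello {}", greeting)).join().unwrap()
    });
    format!("{line}; done {greeting}")
}

fn main() {
    println!("{}", with_static_bound());
    println!("{}", with_scoped_lifetime());
}
```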

If you don't need to borrow from local context, just use 'static as lifetime:

#[cfg(test)]
struct MockThing {
    executor: LocalExecutor<'static>
}

...and you'll be no worse off than with tokio. No lifetime contagion, at the "cost" of futures only being allowed to own values (which is fine for a large number of use cases, as witnessed by tokio accepting the limitation).
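Here is a rough, std-only sketch of how that keeps the consumer free of lifetime parameters. ScopedExecutor is a hypothetical stand-in for a lifetime-generic type like LocalExecutor (it runs closures, not futures, and its PhantomData is simpler than the real one); Backend, Consumer and answer are likewise invented for the example. In the question's setup the Backend alias would be picked by #[cfg(...)], which is only hinted at in a comment here.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for a type that is generic over a lifetime,
// such as async_executor::LocalExecutor<'a>. Not the real type.
struct ScopedExecutor<'a> {
    _scope: PhantomData<&'a ()>,
}

impl<'a> ScopedExecutor<'a> {
    fn new() -> Self {
        ScopedExecutor { _scope: PhantomData }
    }

    // Accepts any task that is valid for at least 'a.
    fn run(&self, task: impl FnOnce() -> u32 + 'a) -> u32 {
        task()
    }
}

// Pinning the lifetime to 'static means the wrapper needs no parameter.
struct MockThing {
    executor: ScopedExecutor<'static>,
}

impl MockThing {
    fn new() -> Self {
        MockThing { executor: ScopedExecutor::new() }
    }

    // The price: tasks must own their data instead of borrowing locals.
    fn run(&self, task: impl FnOnce() -> u32 + 'static) -> u32 {
        self.executor.run(task)
    }
}

// In the question's code base a #[cfg(...)] attribute would choose which
// implementation this alias points to; the consumer stays non-generic.
type Backend = MockThing;

struct Consumer {
    backend: Backend,
}

fn answer() -> u32 {
    let parts: Vec<u32> = vec![40, 2];
    let consumer = Consumer { backend: MockThing::new() };
    // `move` transfers ownership of `parts` into the task.
    consumer.backend.run(move || parts.iter().sum())
}

fn main() {
    println!("{}", answer());
}
```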

It would be really convenient if I could use '_ in the member definition, [...] to indicate that the lifetime of executor should be deduced from the lifetime of the MockThing.

That's not how lifetimes work. A lifetime 'a is a scope, a set of source code lines in the caller's environment, and that's not something the parent struct can provide (yet). A non-static lifetime has to be connected to a local object in the surrounding environment.

In the first code snippet above, the lifetime of the LocalExecutor is automatically deduced to the lifetime of the greeting local variable. If we had borrowed multiple variables, the lifetime would be the lifetime of the shortest-lived one. If we had borrowed ones whose scopes don't overlap, we'd get a compilation error.
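A small std-only sketch of the "shortest-lived borrow wins" rule follows. Tasks is a hypothetical stand-in that queues closures rather than futures, so it only imitates how a lifetime-generic executor constrains its surroundings; the names are invented for the example.

```rust
// Hypothetical stand-in for an executor that is generic over a lifetime;
// it queues closures rather than futures.
struct Tasks<'a> {
    queue: Vec<Box<dyn FnOnce() -> usize + 'a>>,
}

impl<'a> Tasks<'a> {
    fn new() -> Self {
        Tasks { queue: Vec::new() }
    }

    // Every queued task must be valid for the single lifetime 'a.
    fn spawn(&mut self, task: impl FnOnce() -> usize + 'a) {
        self.queue.push(Box::new(task));
    }

    // Runs all queued tasks and adds up their results.
    fn run_all(&mut self) -> usize {
        self.queue.drain(..).map(|task| task()).sum()
    }
}

fn total_len() -> usize {
    let long_lived = "hello".to_owned();
    let short_lived = "hi".to_owned(); // declared later, so dropped earlier
    // `tasks` is declared last and dropped first, so one lifetime that
    // fits inside the scope of `short_lived` satisfies both borrows.
    let mut tasks = Tasks::new();
    tasks.spawn(|| long_lived.len());
    tasks.spawn(|| short_lived.len());
    let total = tasks.run_all();
    total
}

fn main() {
    println!("{}", total_len());
}
```

Moving the `let mut tasks` line above `short_lived` should be rejected by the borrow checker, since the queue would then outlive a value it borrows from.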

Going answered 4/11, 2021 at 20:19 Comment(2)
Fantastic reply; thank you. I'll delve into it in more detail and try to digest Matsakis' talk on Polonius and then accept your reply as the answer, if I'm satisfied. - Immunize
@Immunize Note that the talk on Polonius is largely unrelated to the issue here, but the "yet" link was just too cool to pass up. This recent question is a much better example of perfectly valid code that the current borrow checker rejects, and which Polonius will accept (without any modifications to the code). - Going
