Why are explicit lifetimes needed in Rust?
M

11

282

I was reading the lifetimes chapter of the Rust book, and I came across this example for a named/explicit lifetime:

struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let x;                    // -+ x goes into scope
                              //  |
    {                         //  |
        let y = &5;           // ---+ y goes into scope
        let f = Foo { x: y }; // ---+ f goes into scope
        x = &f.x;             //  | | error here
    }                         // ---+ f and y go out of scope
                              //  |
    println!("{}", x);        //  |
}                             // -+ x goes out of scope

It's quite clear to me that the error being prevented by the compiler is the use-after-free of the reference assigned to x: after the inner scope is done, f and therefore &f.x become invalid, and should not have been assigned to x.

My issue is that the problem could have easily been analyzed away without using the explicit 'a lifetime, for instance by inferring an illegal assignment of a reference to a wider scope (x = &f.x;).

In which cases are explicit lifetimes actually needed to prevent use-after-free (or some other class?) errors?

Modlin answered 24/7, 2015 at 11:15 Comment(2)
This was cross posted to RedditAcacia
For future readers of this question, please note it links to the first edition of the book and there's now a second edition :)Saline
A
283

The other answers all have salient points (fjh's concrete example where an explicit lifetime is needed), but are missing one key thing: why are explicit lifetimes needed when the compiler will tell you you've got them wrong?

This is actually the same question as "why are explicit types needed when the compiler can infer them". A hypothetical example:

fn foo() -> _ {  
    ""
}

Of course, the compiler can see that I'm returning a &'static str, so why does the programmer have to type it?

The main reason is that while the compiler can see what your code does, it doesn't know what your intent was.

Functions are a natural boundary to firewall the effects of changing code. If we were to allow lifetimes to be inferred entirely from function bodies, then an innocent-looking change might affect the lifetimes, which could then cause errors in a function far away. This isn't a hypothetical example. As I understand it, Haskell has this problem when you rely on type inference for top-level functions. Rust nipped that particular problem in the bud.

There is also an efficiency benefit to the compiler — only function signatures need to be parsed in order to verify types and lifetimes. More importantly, it has an efficiency benefit for the programmer. If we didn't have explicit lifetimes, what does this function do:

fn foo(a: &u8, b: &u8) -> &u8

It's impossible to tell without inspecting the source, which would go against a huge number of coding best practices.
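To make that concrete, here is a small sketch of three functions with the same shape as foo that promise different things to the caller. The names from_first, from_second and from_either are made up for illustration; the point is only what each signature says about where the result borrows from.

```rust
// Hypothetical names; each signature makes a different promise to the caller.

// The result borrows from `a` only, so `b` may be dropped right after the call.
fn from_first<'a, 'b>(a: &'a u8, _b: &'b u8) -> &'a u8 {
    a
}

// The result borrows from `b` only.
fn from_second<'a, 'b>(_a: &'a u8, b: &'b u8) -> &'b u8 {
    b
}

// The result may borrow from either argument, so both must stay alive
// for as long as the result is used.
fn from_either<'a>(a: &'a u8, b: &'a u8) -> &'a u8 {
    if a >= b { a } else { b }
}

fn main() {
    let x = 1u8;
    let kept = {
        let y = 2u8;
        // Accepted: the signature says the result is tied to `x` alone.
        from_first(&x, &y)
    };
    println!("{}", kept);

    let y = 2u8;
    println!("{} {}", from_second(&x, &y), from_either(&x, &y));
}
```

A caller can pick the right one by reading the signature alone, without opening the body.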

"by inferring an illegal assignment of a reference to a wider scope"

Scopes are lifetimes, essentially. A bit more clearly, a lifetime 'a is a generic lifetime parameter that can be specialized with a specific scope at compile time, based on the call site.
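As a rough illustration of "specialized based on the call site", the sketch below calls one function twice, and 'a stands for a different scope each time. first_word is a hypothetical helper, not something from the question.

```rust
// Hypothetical helper: `'a` is a placeholder filled in separately at each call site.
fn first_word<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    // Here 'a is instantiated as 'static, because a string literal
    // lives for the whole program.
    let forever: &'static str = first_word("hello world");

    {
        let local = String::from("short lived");
        // Here 'a is instantiated as a borrow that ends with this block.
        let word = first_word(&local);
        println!("{}", word);
    }

    println!("{}", forever);
}
```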

"are explicit lifetimes actually needed to prevent [...] errors?"

Not at all. Lifetimes are needed to prevent errors, but explicit lifetimes are needed to protect what little sanity programmers have.

Acacia answered 24/7, 2015 at 13:37 Comment(15)
"As I understand it, Haskell has this problem when you rely on type inference for top-level functions." - Very interesting. I have some understanding of Haskell, could you provide an example?Modlin
@jco Imagine you have some top-level function f x = x + 1 without a type signature that you're using in another module. If you later change the definition to f x = sqrt $ x + 1, its type changes from Num a => a -> a to Floating a => a -> a, which will cause type errors at all the call sites where f is called with e.g. an Int argument. Having a type signature ensures that errors occur locally.Devonadevondra
"Scopes are lifetimes, essentially. A bit more clearly, a lifetime 'a is a generic lifetime parameter that can be specialized with a specific scope at call time. " Wow that's a really great, illuminating point. I'd like it if it was included in the book this explicitly.Modlin
@Devonadevondra Thanks. Just to see if I grok it -- the point is that if the type was explicitly stated before adding sqrt $, only a local error would have occurred after the change, and not a lot of errors in other places (which is much better if we didn't want to change the actual type)?Modlin
@jco Exactly. Not specifying a type means that you can accidentally change the interface of a function. That's one of the reasons that it is strongly encouraged to annotate all top-level items in Haskell.Devonadevondra
Also if a function receives two references and returns a reference then it might sometimes return the first reference and sometimes the second one. In this case it is impossible to infer a lifetime for the returned reference. Explicit lifetimes help to avoid/clarify such a situation.Gaudery
About this sentence: "The main reason is that while the compiler can see what your code does, it doesn't know what your intent was." I'd like to add that the following is another "main reason": It is much easier for the compiler to check that a set of user-provided lifetime annotations is correct than it is for it to independently derive the lifetimes that make some code work. The latter problem might be completely impossible in many important cases, if the code is more complex than the simple example in the question.Voletta
“The main reason is that while the compiler can see what your code does, it doesn't know what your intent was.” What does this mean?Pavyer
"then an innocent-looking change might affect the lifetimes, which could then cause errors in a function far away." Is there an example?Pavyer
"As I understand it, Haskell has this problem when you rely on type inference for top-level functions. Rust nipped that particular problem in the bud." This makes it more confusing, as beginners like me do not know Haskell, so it does not help the answer.Pavyer
"This is actually the same question as "why are explicit types needed when the compiler can infer them". " I'd actually say "why are explicit types needed when the compiler can check them" is a better match. Especially in a post-lifetime-elision world where the compiler does "infer" a lot of lifetime. Even in languages with non-local type inference it's common for explicit types to be required because the compiler is not able to infer them, or because you want something different than what it inferred.Malodorous
can you give us an example of an innocent-looking change ?Nostomania
@Nostomania play.rust-lang.org/…Acacia
You could also have "local compiler errors" with inferred lifetime when – for each error message – the compiler would add the source location where the error-causing requirement originated from; of course limiting the origin to the same crate, not going down the APIs and other crates. If you insist on explicit aliasing between inputs and output without IDE help, then the easier and better way would have been, if Rust could treat parameter names as temporary type parameters (indicating lifetime and type of the parameter) within the return type, e.g. fn foo<T> ( x: &T, y: &T) -> Union<x,y>.Weinberg
EDIT: or rather Union<&x,&y> or Union<&'x,&'y> or why not a general "lifeof(varname)" operator, denoted as an apostrophe after the ampersand, that can be with any reference? structs don't have return types but I wonder whether Rust could introduce implicit lifetimes for struct references to be new lifetimes, using explicit ones when reusing existing ones? It could have looked like this struct Foo<T> { x : & T, y : &'x , z : &'x i32 } where y has lifetime and type of x, z has lifetime of x and type i32.Weinberg
D
129

Let's have a look at the following example.

fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'a u32 {
    x
}

fn main() {
    let x = 12;
    let z: &u32 = {
        let y = 42;
        foo(&x, &y)
    };
}

Here, the explicit lifetimes are important. This compiles because the result of foo has the same lifetime as its first argument ('a), so it may outlive its second argument. This is expressed by the lifetime names in the signature of foo. If you switched the arguments in the call to foo the compiler would complain that y does not live long enough:

error[E0597]: `y` does not live long enough
  --> src/main.rs:10:5
   |
9  |         foo(&y, &x)
   |              - borrow occurs here
10 |     };
   |     ^ `y` dropped here while still borrowed
11 | }
   | - borrowed value needs to live until here
Devonadevondra answered 24/7, 2015 at 11:52 Comment(2)
The compiler does not run the function and does not know which one (x or y) is returned, so it cannot figure out the lifetime of the returned value.Pavyer
@Pavyer The borrow checker does branch-based program analysis, so it does know the lifetime of the returned value. And it will raise a compile error if the function signature doesn't match the returned lifetime.Pedicle
L
29

The lifetime annotation in the following structure:

struct Foo<'a> {
    x: &'a i32,
}

specifies that a Foo instance shouldn't outlive the reference it contains (x field).

The example you came across in the Rust book doesn't illustrate this because f and y variables go out of scope at the same time.

A better example would be this:

fn main() {
    let f: Foo;
    {
        let n = 5;  // variable that is invalid outside this block
        let y = &n;
        f = Foo { x: y };
    }
    println!("{}", f.x); // f.x would dangle here, so the compiler rejects this
}

Now, f really outlives the variable pointed to by f.x.
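For contrast, here is a sketch of the accepted counterpart, assuming the only change is that n is declared in the outer scope so that it outlives f:

```rust
struct Foo<'a> {
    x: &'a i32,
}

fn main() {
    let n = 5; // declared in the outer scope, so it outlives `f`
    let f: Foo;
    {
        let y = &n; // `y` itself goes away with the block, the borrow of `n` does not
        f = Foo { x: y };
    }
    println!("{}", f.x);
}
```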

Lenz answered 25/7, 2015 at 10:47 Comment(0)
G
13

Note that there are no explicit lifetimes in that piece of code, except the structure definition. The compiler is perfectly able to infer lifetimes in main().

In type definitions, however, explicit lifetimes are unavoidable. For example, there is an ambiguity here:

struct RefPair(&u32, &u32);

Should these be different lifetimes or should they be the same? It does matter from the usage perspective, struct RefPair<'a, 'b>(&'a u32, &'b u32) is very different from struct RefPair<'a>(&'a u32, &'a u32).

Now, for simple cases like the one you provided, the compiler could theoretically elide lifetimes as it does in other places, but such cases are very limited and are not worth the extra complexity in the compiler, and the gain in clarity would be questionable at best.
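A minimal sketch of how the two definitions differ in use. The names TwoLifetimes, OneLifetime and first are hypothetical, chosen here only to keep both variants in one file.

```rust
// Hypothetical names standing in for the two RefPair variants.
struct TwoLifetimes<'a, 'b>(&'a u32, &'b u32);
struct OneLifetime<'a>(&'a u32, &'a u32);

// With separate parameters, the first reference can be handed back
// without being limited by the second one.
fn first<'a, 'b>(pair: TwoLifetimes<'a, 'b>) -> &'a u32 {
    pair.0
}

fn main() {
    let x = 1;
    let kept: &u32 = {
        let y = 2;
        let pair = TwoLifetimes(&x, &y);
        println!("second = {}", pair.1);
        // Accepted: the result is tied to `x`, not to the shorter-lived `y`.
        first(pair)
    };
    println!("first = {}", kept);

    // With a single parameter, 'a has to fit both borrows, so a reference
    // taken out of this pair could not outlive the shorter of the two.
    let z = 3;
    let same = OneLifetime(&x, &z);
    println!("{} {}", same.0, same.1);
}
```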

Glamorize answered 24/7, 2015 at 11:51 Comment(6)
Can you explain why they are very different?Rushing
@Rushing The second requires that both references share the same lifetime. This means refpair.0 cannot live longer than refpair.1 and vice versa, so both refs need to point to something with the same owner. The first however only requires that the RefPair outlives both its parts.Kawasaki
@A.B., it compiles because both lifetimes are unified - because local lifetimes are smaller than 'static, 'static can be used everywhere where local lifetimes can be used, therefore in your example p will have its lifetime parameter inferred as the local lifetime of y.Glamorize
@Rushing RefPair<'a>(&'a u32, &'a u32) means that 'a will be the intersection of the both input lifetimes, i.e. in this case the lifetime of y.Devonadevondra
@Kawasaki "requires that the RefPair outlives both its parts"? I thought it was the opposite... a &u32 can still make sense without the RefPair, while a RefPair with its refs dead would be strange.Compote
@Kawasaki This would be an example: gist.github.com/kindlychung/0641fd3a380768fe47a515a0f9541815Compote
G
9

If a function receives two references as arguments and returns a reference, then the implementation of the function might sometimes return the first reference and sometimes the second one. It is impossible to predict which reference will be returned for a given call. In this case, it is impossible to infer a lifetime for the returned reference, since each argument reference may refer to a different variable binding with a different lifetime. Explicit lifetimes help to avoid or clarify such a situation.

Likewise, if a structure holds two references (as two member fields) then a member function of the structure may sometimes return the first reference and sometimes the second one. Again explicit lifetimes prevent such ambiguities.

In a few simple situations, there is lifetime elision where the compiler can infer lifetimes.
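The usual illustration of the first paragraph is a function that picks one of its two arguments at run time. This is a sketch with a made-up name, longest; since the body may return either input, a single lifetime ties both inputs to the output.

```rust
// The body may return either argument, so one lifetime covers both
// inputs and the output.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let first = String::from("explicit");
    let second = String::from("lifetime");
    // Both strings must stay alive for as long as `winner` is used.
    let winner = longest(&first, &second);
    println!("{}", winner);
}
```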

Gaudery answered 24/4, 2016 at 11:59 Comment(0)
M
7

I've found another great explanation here: http://doc.rust-lang.org/0.12.0/guide-lifetimes.html#returning-references.

In general, it is only possible to return references if they are derived from a parameter to the procedure. In that case, the pointer result will always have the same lifetime as one of the parameters; named lifetimes indicate which parameter that is.

Modlin answered 2/8, 2015 at 19:41 Comment(0)
K
6

The case from the book is very simple by design. The topic of lifetimes is deemed complex.

The compiler cannot easily infer the lifetime in a function with multiple arguments.

Also, my own optional crate has an OptionBool type with an as_slice method whose signature actually is:

fn as_slice(&self) -> &'static [bool] { ... }

There is absolutely no way the compiler could have figured that one out.
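To show the idea without quoting the crate, here is a hypothetical stand-in (MaybeBool and its statics are invented for this sketch and are not the crate's actual code). The slices live in static storage, so the result does not depend on self at all, which is something the elision rules would not assume.

```rust
// Hypothetical stand-in for a three-state optional bool; not the crate's actual code.
enum MaybeBool {
    Yes,
    No,
    Absent,
}

static YES: [bool; 1] = [true];
static NO: [bool; 1] = [false];
static EMPTY: [bool; 0] = [];

impl MaybeBool {
    // Without the explicit 'static, elision would tie the result to `&self`.
    fn as_slice(&self) -> &'static [bool] {
        match *self {
            MaybeBool::Yes => &YES,
            MaybeBool::No => &NO,
            MaybeBool::Absent => &EMPTY,
        }
    }
}

fn main() {
    let slice = {
        let value = MaybeBool::Yes;
        let s = value.as_slice();
        s // `value` is dropped at the end of this block, the slice stays valid
    };
    println!(
        "{:?} {:?} {:?}",
        slice,
        MaybeBool::No.as_slice(),
        MaybeBool::Absent.as_slice()
    );
}
```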

Kawasaki answered 24/7, 2015 at 11:55 Comment(2)
IINM, inferring the lifetime of the return type of a two-argument function will be equivalent to the halting problem - IOW, not decidable in a finite amount of time.Kelson
"The compiler cannot easily infer the lifetime in a function with multiple arguments." - Unless the first argument is &self or &mut self - then lifetime of this reference is assigned to all elided output lifetimes.Decidua
A
3

As a newcomer to Rust, my understanding is that explicit lifetimes serve two purposes.

  1. Putting an explicit lifetime annotation on a function restricts the type of code that may appear inside that function. Explicit lifetimes allow the compiler to ensure that your program is doing what you intended.

  2. If you (the compiler) want(s) to check if a piece of code is valid, you (the compiler) will not have to iteratively look inside every function called. It suffices to have a look at the annotations of functions that are directly called by that piece of code. This makes your program much easier to reason about for you (the compiler), and makes compile times manageable.

On point 1, consider the following program written in Python:

import pandas as pd
import numpy as np

def second_row(ar):
    return ar[0]

def work(second):
    df = pd.DataFrame(data=second)
    df.loc[0, 0] = 1

def main():
    # .. load data ..
    ar = np.array([[0, 0], [0, 0]])

    # .. do some work on second row ..
    second = second_row(ar)
    work(second)

    # .. much later ..
    print(repr(ar))

if __name__=="__main__":
    main()

which will print

array([[1, 0],
       [0, 0]])

This type of behaviour always surprises me. What is happening is that df is sharing memory with ar, so when some of the content of df changes in work, that change infects ar as well. However, in some cases this may be exactly what you want, for memory efficiency reasons (no copy). The real problem in this code is that the function second_row is returning the first row instead of the second; good luck debugging that.

Consider instead a similar program written in Rust:

#[derive(Debug)]
struct Array<'a, 'b>(&'a mut [i32], &'b mut [i32]);

impl<'a, 'b> Array<'a, 'b> {
    fn second_row(&mut self) -> &mut &'b mut [i32] {
        &mut self.0
    }
}

fn work(second: &mut [i32]) {
    second[0] = 1;
}

fn main() {
    // .. load data ..
    let ar1 = &mut [0, 0][..];
    let ar2 = &mut [0, 0][..];
    let mut ar = Array(ar1, ar2);

    // .. do some work on second row ..
    {
        let second = ar.second_row();
        work(second);
    }

    // .. much later ..
    println!("{:?}", ar);
}

Compiling this, you get

error[E0308]: mismatched types
 --> src/main.rs:6:13
  |
6 |             &mut self.0
  |             ^^^^^^^^^^^ lifetime mismatch
  |
  = note: expected type `&mut &'b mut [i32]`
             found type `&mut &'a mut [i32]`
note: the lifetime 'b as defined on the impl at 4:5...
 --> src/main.rs:4:5
  |
4 |     impl<'a, 'b> Array<'a, 'b> {
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...does not necessarily outlive the lifetime 'a as defined on the impl at 4:5
 --> src/main.rs:4:5
  |
4 |     impl<'a, 'b> Array<'a, 'b> {
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^

In fact you get two errors: there is also one with the roles of 'a and 'b interchanged. Looking at the annotation of second_row, we find that the output should be &mut &'b mut [i32], i.e., the output is supposed to be a reference to a reference with lifetime 'b (the lifetime of the second row of Array). However, because we are returning the first row (which has lifetime 'a), the compiler complains about lifetime mismatch. At the right place. At the right time. Debugging is a breeze.
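For completeness, a sketch of the corrected program, assuming the intended fix is simply to return the second field (self.1) so that the body matches the 'b in the signature:

```rust
#[derive(Debug)]
struct Array<'a, 'b>(&'a mut [i32], &'b mut [i32]);

impl<'a, 'b> Array<'a, 'b> {
    // Now the body agrees with the signature: field 1 carries lifetime 'b.
    fn second_row(&mut self) -> &mut &'b mut [i32] {
        &mut self.1
    }
}

fn work(second: &mut [i32]) {
    second[0] = 1;
}

fn main() {
    let ar1 = &mut [0, 0][..];
    let ar2 = &mut [0, 0][..];
    let mut ar = Array(ar1, ar2);

    {
        let second = ar.second_row();
        work(second);
    }

    // The write landed in the second row, as the name promises.
    println!("{:?}", ar);
}
```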

Agglutinin answered 10/8, 2018 at 14:54 Comment(0)
W
1

The reason why your example does not work is simply because Rust only has local lifetime and type inference. What you are suggesting demands global inference. Whenever you have a reference whose lifetime cannot be elided, it must be annotated.

Whither answered 28/5, 2018 at 21:51 Comment(0)
K
1

I think of a lifetime annotation as a contract stating that a given ref is valid in the receiving scope only while it remains valid in the source scope. Declaring more references with the same lifetime effectively merges the scopes, meaning that all the source refs have to satisfy this contract. Such annotations allow the compiler to check that the contract is fulfilled.

Keown answered 9/6, 2020 at 18:43 Comment(0)
B
1

It boils down to compiler performance.

When checking a call, the Rust compiler looks only at the function signature, not its body. That is why we explicitly state the relation between input lifetime and output lifetime.

fn longest_string<'a>(x: &'a str, y: &str) -> &'a str {
    x
}

fn main() {
    let string1 = "abcdef";
    let string2 = "xyz";
    // Both literals are already &'static str, so no extra borrow is needed.
    let result = longest_string(string1, string2);

    println!("The longest string is {}", result);
}

Detail: In the longest_string function, we are returning a reference from the function, and that reference refers to some data (the data within x). Even though in the implementation of longest_string we always return x, the Rust compiler is looking only at the function signature, not its body, to determine what guarantees are being made about the lifetimes of the references.
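The string literals above are all 'static, so they do not really exercise the signature. Here is a sketch with owned strings instead (first_only is a hypothetical name with the same signature shape), where the caller is accepted only because of what the signature promises:

```rust
// Same signature shape as longest_string: the result is tied to `x` only.
fn first_only<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

fn main() {
    let long_lived = String::from("abcdef");
    let result;
    {
        let short_lived = String::from("xyz");
        // Accepted: the signature promises that `result` does not borrow from `_y`.
        result = first_only(&long_lived, &short_lived);
    }
    // If the signature were `fn first_only<'a>(x: &'a str, _y: &'a str) -> &'a str`,
    // this use would be rejected even though the body still returns only `x`.
    println!("The first string is {}", result);
}
```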

Braden answered 24/6, 2023 at 4:12 Comment(0)
