Borrow checker doesn't realize that `clear` drops reference to local variable
Asked Answered
J

6

17

The following code reads space-delimited records from stdin, and writes comma-delimited records to stdout. Even with optimized builds it's rather slow (about twice as slow as using, say, awk).

use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        let fields: Vec<_> = line.split(' ').collect();
        println!("{}", fields.join(","));
    }
}

One obvious improvement would be to use itertools to join without allocating a vector (the collect call causes an allocation). However, I tried a different approach:

fn main() {
    let stdin = std::io::stdin();
    let mut cache = Vec::<&str>::new();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        cache.extend(line.split(' '));
        println!("{}", cache.join(","));
        cache.clear();
    }
}

This version tries to reuse the same vector over and over. Unfortunately, the compiler complains:

error: `line` does not live long enough
 --> src/main.rs:7:22
  |
7 |         cache.extend(line.split(' '));
  |                      ^^^^
  |
note: reference must be valid for the block suffix following statement 1 at 5:39...
 --> src/main.rs:5:40
  |
5 |     let mut cache = Vec::<&str>::new();
  |                                        ^
note: ...but borrowed value is only valid for the for at 6:4
 --> src/main.rs:6:5
  |
6 |     for line in stdin.lock().lines().map(|x| x.unwrap()) {
  |     ^

error: aborting due to previous error

Which of course makes sense: the line variable is only alive in the body of the for loop, whereas cache keeps a pointer into it across iterations. But that error still looks spurious to me: since the cache is cleared after each iteration, no reference to line can be kept, right?

How can I tell the borrow checker about this?

Jarvis answered 9/10, 2016 at 17:27 Comment(7)
“no reference to line can be kept, right?” → right. But how does the borrowck knows that?Incest
I would note that line is an allocated String: each iteration would still requires a fresh memory allocation even if the Vec caching worked.Steelman
@mcarton: Exactly: that's why I'm asking how I can tell the borrow checker about it :)Thedathedric
@MatthieuM. I'd be curious to know a way to reuse that string's memory as well :) But it might be better as a separate question.Thedathedric
@Jarvis to avoid reallocating line every time, you should use read_line instead of linesIncest
@Incest Ooh, great idea :) Thanks!Thedathedric
Although interestingly, using read_line is measurably slower. I wonder why.Thedathedric
G
10

The only way to do this is to use transmute to change the Vec<&'a str> into a Vec<&'b str>. transmute is unsafe and Rust will not raise an error if you forget the call to clear here. You might want to extend the unsafe block up to after the call to clear to make it clear (no pun intended) where the code returns to "safe land".

use std::io::BufRead;
use std::mem;

fn main() {
    let stdin = std::io::stdin();
    let mut cache = Vec::<&str>::new();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        let cache: &mut Vec<&str> = unsafe { mem::transmute(&mut cache) };
        cache.extend(line.split(' '));
        println!("{}", cache.join(","));
        cache.clear();
    }
}
Go answered 9/10, 2016 at 17:45 Comment(1)
This should have a safe abstraction I think, but I can't think of any existing one. For example recycler probably has the same lifetime restriction.Dardanelles
D
9

In this case Rust doesn't know what you're trying to do. Unfortunately, .clear() does not affect how .extend() is checked.

The cache is a "vector of strings that live as long as the main function", but in extend() calls you're appending "strings that live only as long as one loop iteration", so that's a type mismatch. The call to .clear() doesn't change the types.

Usually such limited-time uses are expressed by making a long-lived opaque object that enables access to its memory by borrowing a temporary object with the right lifetime, like RefCell.borrow() gives a temporary Ref object. Implementation of that would be a bit involved and would require unsafe methods for recycling Vec's internal memory.

In this case an alternative solution could be to avoid any allocations at all (.join() allocates too) and stream the printing thanks to Peekable iterator wrapper:

for line in stdin.lock().lines().map(|x| x.unwrap()) {
    let mut fields = line.split(' ').peekable();
    while let Some(field) = fields.next() {
        print!("{}", field);
        if fields.peek().is_some() {
            print!(",");
        }
    }
    print!("\n");
}

BTW: Francis' answer with transmute is good too. You can use unsafe to say you know what you're doing and override the lifetime check.

Dinothere answered 9/10, 2016 at 18:29 Comment(0)
D
5

Itertools has .format() for the purpose of lazy formatting, which skips allocating a string too.

use std::io::BufRead;
use itertools::Itertools;

fn main() {
    let stdin = std::io::stdin();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        println!("{}", line.split(' ').format(","));
    }
}

A digression, something like this is a “safe abstraction” in the littlest sense of the solution in another answer here:

fn repurpose<'a, T: ?Sized>(mut v: Vec<&T>) -> Vec<&'a T> {
    v.clear();
    unsafe {
        transmute(v)
    }
}
Dardanelles answered 9/10, 2016 at 23:4 Comment(0)
D
2

Another approach is to refrain from storing references altogether, and to store indices instead. This trick can also be useful in other data structure contexts, so this might be a nice opportunity to try it out.

use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let mut cache = Vec::new();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        cache.push(0);
        cache.extend(line.match_indices(' ').map(|x| x.0 + 1));
        // cache now contains the indices where new words start

        // do something with this information
        for i in 0..(cache.len() - 1) {
            print!("{},", &line[cache[i]..(cache[i + 1] - 1)]);
        }
        println!("{}", &line[*cache.last().unwrap()..]);
        cache.clear();
    }
}

Though you made the remark yourself in the question, I feel the need to point out that there are more elegant methods to do this using iterators, that might avoid the allocation of a vector altogether.

The approach above was inspired by a similar question here, and becomes more useful if you need to do something more complicated than printing.

Disyllable answered 21/4, 2020 at 18:1 Comment(0)
O
0

Elaborating on Francis's answer about using transmute(), this could be safely abstracted, I think, with this simple function:

pub fn zombie_vec<'a, 'b, T: ?Sized>(mut data: Vec<&'a T>) -> Vec<&'b T> {
    data.clear();
    unsafe {
        std::mem::transmute(data)
    }
}

Using this, the original code would be:

fn main() {
    let stdin = std::io::stdin();
    let mut cache0 = Vec::<&str>::new();
    for line in stdin.lock().lines().map(|x| x.unwrap()) {
        let mut cache = cache0; // into the loop
        cache.extend(line.split(' '));
        println!("{}", cache.join(","));
        cache0 = zombie_vec(cache); // out of the loop
    }
}

You need to move the outer vector into every loop iteration, and restore it back to before you finish, while safely erasing the local lifetime.

Oft answered 22/4, 2020 at 10:21 Comment(0)
H
-1

The safe solution is to use .drain(..) instead of .clear() where .. is a "full range". It returns an iterator, so drained elements can be processed in a loop. It is also available for other collections (String, HashMap, etc.)

fn main() {
    let mut cache = Vec::<&str>::new();
    for line in ["first line allocates for", "second"].iter() {
        println!("Size and capacity: {}/{}", cache.len(), cache.capacity());
        cache.extend(line.split(' '));
        println!("    {}", cache.join(","));
        cache.drain(..);
    }
}
Histamine answered 28/2, 2020 at 21:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.