TL;DR: This is fine.
There are three potential problems here:
- Creating a reference with lifetime longer than its real lifetime.
- Moving the type may invalidate the references.
- While dropping the data we may use the strings after they're dropped.
And none of them is actually a problem. Let's start from the first.
Creating a reference with a lifetime longer than its actual lifetime is definitely fine. A lot of code in the wild relies of it to be fine, it is not listed under "Behavior considered undefined" in the reference (even though that list is not exhaustive), all current models for code execution in Rust (such as Stacked Borrows or MiniRust) employ this principle (in fact, the very reason for the existence of Stacked Borrows is to not rely on lifetimes for soundness and rather have more fine-grained model), and in UCG#231 it was stated that it is obvious that lifetimes cannot affect optimizations, just not specified currently somewhere.
So we're coming to the second issue. The question is whether moving StringCache
(and therefore, its storage
) will invalidate the references because it will move the Vec
's elements too. Or with more technical terms, whether moving a Vec
retags (a Stacked Borrows term) its elements, asserting their uniqueness.
Your intuition may say that this is fine, but reality is more complicated. Vec
defines its items with Unique
, which means that by the letter of the law moving it does invalidate all existing pointers to elements (this is true about Box
too). However, lots of code in the wild relies on this to be false, so at least for Vec
(and perhaps for Box
too) we probably want this to be false. I think it is okay to rely on that. Miri does not give Vec
any special treatement either, and as far as I'm aware, the compiler does not optimize based on it too.
For (3), your current definition (that declares table
before storage
) is obviously fine because it drops the HashMap
first, but even your previous definition (that declared storage
first) was fine, because HashMap
is declared with #[may_dangle]
, meaning it promises it will not access its elements (aside from dropping them) in its drop. This is also a stability guarantee, because it is observable, even in code very similar to yours:
use std::collections::HashMap;
#[derive(Default)]
struct StringCache<'a> {
storage: Vec<Box<str>>,
table: HashMap<&'a str, usize>,
}
fn main() {
let mut string_cache = StringCache::default();
string_cache.storage.push("hello".into());
string_cache.table.insert(&string_cache.storage[0], 0);
}
The fact that this code compiles successfully is because HashMap
promises to not touch its elements during drop. Otherwise, we could have use-after-free. So HashMap
cannot change this fact suddenly.
HashSet
instead ofVec
+HashMap
, as long as you don't care about order ofstr
s. – Shorthornfn lookup
– NecrophobiaPin<Box<str>>
instead of simpleBox
. There is no overhead, and it's exactly whatPin
was introduced for. – ShorthornPin
is relevant for the posed question here (would still need the unsafe transmute, could still callstorage.clear()
to get a dangling reference), but correct me if I'm wrong. – Necrophobiaindexmap
solves your problem? If you don't mind an external dependency? – Romancestorage
may be dropped beforetable
. Then iftable
drop implementation will use&'static str
somehow, it will cause UB. Even in this particular example there is no guarantee, thatHashMap
does not use keys somehow whe drops, or will not use in the future. – ShorthornHashMap
's drop is defined as#[may_dangle]
and this is observable thus guaranteed. – Alphonsa*mut str
) in theVec
and handle dropping of them yourself. – Alphonsa