Is it legal to store a 'static reference to struct internal data in unsafe Rust?
Asked Answered
N

1

6

I have the following data structure (simplified):

use std::collections::HashMap;
pub struct StringCache {
    // the hashmap keys point to elements stored in `storage`
    // so never outlive this datastructure. the 'static refs are never handed out
    table: HashMap<&'static str, usize>,

    storage: Vec<Box<str>>,
}

Is it legal/defined behaviour to "lie" to Rust about the lifetime of the references here? It feels like violating the type system to me. Is the public API of this data structure sound? For completeness, here is the full implementation:

use std::mem::transmute;
impl StringCache {
    pub fn intern(&mut self, entry: &str) -> usize {
        if let Some(key) = self.table.get(entry) {
            return *key;
        }
        let idx = self.storage.len();
        self.storage.push(entry.to_owned().into_boxed_str());
        // we cast our refs to 'static here.
        let key = unsafe { transmute::<&str, &'static str>(&self.storage[idx]) };
        self.table.insert(key, idx);
        idx
    }
    pub fn lookup(&self, idx: usize) -> &str {
        &self.storage[idx]
    }
}
Necrophobia answered 30/5, 2023 at 12:11 Comment(16)
Related: Why can't I store a value and a reference to that value in the same struct?Dugan
I think the whole your example may be solved just by using HashSet instead of Vec + HashMap, as long as you don't care about order of strs.Shorthorn
@cafce25: I won't argue with related, but it doesn't answer the question. The referenced data here is behind a Box, and will live as long as the struct / reference does.Necrophobia
@Dmitry: I care about the order, see fn lookupNecrophobia
Then I think it's better to use Pin<Box<str>> instead of simple Box. There is no overhead, and it's exactly what Pin was introduced for.Shorthorn
@Dmitry: I don't think Pin is relevant for the posed question here (would still need the unsafe transmute, could still call storage.clear() to get a dangling reference), but correct me if I'm wrong.Necrophobia
Maybe indexmap solves your problem? If you don't mind an external dependency?Romance
@isaactfa: this is tagged language-lawyer. I want to know if the code as posed is legal. But thanks for the cool crate recommendation! :)Necrophobia
Fair enough :) I'm curious too. Transmuting lifetimes is very rarely sound.Romance
Does this answer your question? Is it Ok to have a reference with incorrect (larger than correct) lifetime in scope?Dugan
Also in general I think it is UB, but in this particular example seems to be not. The thing is that drop order is not defined, so storage may be dropped before table. Then if table drop implementation will use &'static str somehow, it will cause UB. Even in this particular example there is no guarantee, that HashMap does not use keys somehow whe drops, or will not use in the future.Shorthorn
@Dmitry: drop order is definedNecrophobia
FWIW, miri doesn't report undefined behaviour: playgroundOrthodontics
@Shorthorn And even if drop order was undefined, HashMap's drop is defined as #[may_dangle] and this is observable thus guaranteed.Alphonsa
If you want to be extra-extra-safe, you can store raw pointers (*mut str) in the Vec and handle dropping of them yourself.Alphonsa
@Romance You could argue that it is unsound in the sense that a future library change (maybe in a fork) can cause undefined behaviour with only safe code by letting a reference with incorrect lifetime escape.Ostrander
A
8

TL;DR: This is fine.


There are three potential problems here:

  1. Creating a reference with lifetime longer than its real lifetime.
  2. Moving the type may invalidate the references.
  3. While dropping the data we may use the strings after they're dropped.

And none of them is actually a problem. Let's start from the first.

Creating a reference with a lifetime longer than its actual lifetime is definitely fine. A lot of code in the wild relies of it to be fine, it is not listed under "Behavior considered undefined" in the reference (even though that list is not exhaustive), all current models for code execution in Rust (such as Stacked Borrows or MiniRust) employ this principle (in fact, the very reason for the existence of Stacked Borrows is to not rely on lifetimes for soundness and rather have more fine-grained model), and in UCG#231 it was stated that it is obvious that lifetimes cannot affect optimizations, just not specified currently somewhere.

So we're coming to the second issue. The question is whether moving StringCache (and therefore, its storage) will invalidate the references because it will move the Vec's elements too. Or with more technical terms, whether moving a Vec retags (a Stacked Borrows term) its elements, asserting their uniqueness.

Your intuition may say that this is fine, but reality is more complicated. Vec defines its items with Unique, which means that by the letter of the law moving it does invalidate all existing pointers to elements (this is true about Box too). However, lots of code in the wild relies on this to be false, so at least for Vec (and perhaps for Box too) we probably want this to be false. I think it is okay to rely on that. Miri does not give Vec any special treatement either, and as far as I'm aware, the compiler does not optimize based on it too.

For (3), your current definition (that declares table before storage) is obviously fine because it drops the HashMap first, but even your previous definition (that declared storage first) was fine, because HashMap is declared with #[may_dangle], meaning it promises it will not access its elements (aside from dropping them) in its drop. This is also a stability guarantee, because it is observable, even in code very similar to yours:

use std::collections::HashMap;

#[derive(Default)]
struct StringCache<'a> {
    storage: Vec<Box<str>>,
    table: HashMap<&'a str, usize>,
}

fn main() {
    let mut string_cache = StringCache::default();
    string_cache.storage.push("hello".into());
    string_cache.table.insert(&string_cache.storage[0], 0);
}

The fact that this code compiles successfully is because HashMap promises to not touch its elements during drop. Otherwise, we could have use-after-free. So HashMap cannot change this fact suddenly.

Alphonsa answered 30/5, 2023 at 16:4 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.