How can I create newtypes for an unsized type and its owned counterpart (like `str` and `String`) in safe Rust?

I want to create a pair of newtypes Tag(str) and TagBuf(String), analogous to how Path and PathBuf wrap OsStr and OsString. My end goal is to have a map keyed by TagBuf and to be able to index into it with just a Tag:

use std::collections::HashMap;

fn main() {
    let mut m: HashMap<TagBuf, i32> = HashMap::new();
    m.insert(TagBuf("x".to_string()), 1);
    assert_eq!(m.get(Tag::new("x")), Some(&1));
}

But I’m running into issues because Tag is dynamically sized.

Specifically, implementing Borrow<Tag> for TagBuf is tricky:

pub struct Tag(str);
pub struct TagBuf(String);

impl std::borrow::Borrow<Tag> for TagBuf {
    fn borrow(&self) -> &Tag {
        let s: &str = self.0.as_str();
        // How can I turn `&str` into `&Tag`? A naive attempt fails:
        &Tag(*s)
    }
}

The compiler rejects this:

error[E0277]: the size for values of type `str` cannot be known at compilation time
 --> src/lib.rs:8:10
  |
8 |         &Tag(*s)
  |          ^^^ doesn't have a size known at compile-time
  |
  = help: the trait `Sized` is not implemented for `str`
  = note: all function arguments must have a statically known size

I could just return unsafe { std::mem::transmute(s) } and add a #[repr(transparent)] annotation to Tag, but I would like to avoid unsafe code.

I’ve looked at the source for Path/PathBuf and come up with the following:

use std::borrow::Borrow;
use std::ops::Deref;

#[repr(transparent)]
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Tag(str);
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
pub struct TagBuf(String);

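impl Tag {
    fn new<S: AsRef<str> + ?Sized>(s: &S) -> &Tag {
        // SAFETY: `Tag` is `#[repr(transparent)]` over `str`, so `&str` and
        // `&Tag` share a layout; the returned reference keeps the input lifetime.
        unsafe { &*(s.as_ref() as *const str as *const Tag) }
    }
}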

impl Deref for TagBuf {
    type Target = Tag;
    fn deref(&self) -> &Tag {
        Tag::new(&self.0)
    }
}

impl Borrow<Tag> for TagBuf {
    fn borrow(&self) -> &Tag {
        self.deref()
    }
}

impl ToOwned for Tag {
    type Owned = TagBuf;
    fn to_owned(&self) -> TagBuf {
        TagBuf(self.0.to_owned())
    }
}

fn main() {
    let mut m = std::collections::HashMap::<TagBuf, i32>::new();
    m.insert(TagBuf("x".to_string()), 1);
    assert_eq!(m.get(Tag::new("x")), Some(&1));
}

…and this works, and I can understand it (good!), but it still uses unsafe for that cast, which I’d like to avoid.

I saw the Rustonomicon section on exotically sized types, which doesn’t use unsafe, but the unsizing coercion seems complicated, and I don’t see how to adapt it from [u8] to str, since there’s no stringy counterpart to [u8; N].
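To make the obstacle concrete, here is a minimal sketch of the safe unsizing coercion for byte slices. The generic `Wrapper<T>` type and the `wrapped_len` helper are hypothetical names used only for illustration. The coercion starts from a sized `Wrapper<[u8; N]>` value, and there is no sized, array-like type that unsizes to `str`, which is why the same trick does not carry over.

```rust
// Hypothetical generic wrapper; its single (last) field may be unsized.
struct Wrapper<T: ?Sized>(T);

// Reads the length through the unsized form of the wrapper.
fn wrapped_len(w: &Wrapper<[u8]>) -> usize {
    w.0.len()
}

fn main() {
    // A sized value has to exist first; `[u8; 3]` is the sized starting point.
    let sized: Wrapper<[u8; 3]> = Wrapper([b'a', b'b', b'c']);
    // Safe unsizing coercion: &Wrapper<[u8; 3]> -> &Wrapper<[u8]>.
    let unsized_ref: &Wrapper<[u8]> = &sized;
    assert_eq!(wrapped_len(unsized_ref), 3);
}
```

Note that this only produces a `&Wrapper<[u8]>` from a wrapper that was built as sized. It does not turn an existing `&[u8]` into a `&Wrapper<[u8]>`, which is the conversion the question actually needs.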

I also read the implementation of Rc<str>, which seems to do some more unsafe conversion via Rc<[u8]> and some specialization magic that I had trouble understanding.

I’ve read some related questions, but I haven’t found an answer.

Does the latest stable Rust have a way to define a newtype pair for str and String in safe code? If not, are there RFCs or tracking issues that I should follow?

Itu answered 23/11, 2020 at 22:31 Comment(9)
I'm not sure you can wrap str directly. &str is the more common form, probably for this reason. - Whitening
AFAIK, you currently do need to use unsafe (or a crate that uses unsafe) to do this. - Inca
Yeah I'm also curious why you're using Tag(str) instead of Tag<'a>(&'a str). - Girasol
@PitaJ: It’s a fair question, but how would you implement Borrow<Tag<?>> for TagBuf, which is required to use it as a map key? Implementing impl<'a> Borrow<Tag<'a>> for TagBuf doesn’t seem right: (a) that would let you get a &'static str from a stack-owned TagBuf, and (b) you’d need to implement fn borrow<'a>(&'s self) -> &'s Tag<'a>, which doesn’t look possible (for good reason). - Itu
Path::new() does use unsafe, so it might be required. - Outpour
@Shepmaster: Thanks for the references! Both answers in “avoid temporary allocations” incur runtime cost (Cow<'_, str>, &(dyn Key + '_)). While the FFI question is related and the best answers may end up being similar, the questions are sufficiently different. str and c_void have quite different meanings and representations (&str is a fat pointer; &c_void is not), and the lack of an FFI boundary simplifies things quite a lot. A canonical answer to this question would be useful to many more people than an answer to that one, so this question should stand. - Itu
Saying that void * is not a fat pointer is not strictly true; per the C standard, it can be whatever the implementation chooses. It's an opaque type that "can allow other pointer to be cast into it and can do the reverse operation". There are some systems where void * is very special. - Twila
@Stargateur: Sure; what I meant is that std::mem::size_of::<&str>() is 16 on my system, whereas std::mem::size_of::<&c_void>() is 8. - Itu
Just another FYI about the new() implementation required to get &unsizeds into Newtype(unsized), not the question itself: For anyone else who hates writing boilerplate (and unsafe boilerplate doubly so), the opaque_typedef crate provides a derive macro to do this part for you (except it's called from_inner instead of new; it also gives you as_inner to get the &unsized out of the newtype struct again). Might save some people some time. - Ulrica
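
The pointer-size comparison from the comments above can be checked with a short program. This is only a sketch: `pointer_sizes` is a hypothetical helper name, and the assertions are written relative to `usize` rather than hard-coding the 16 and 8 bytes reported for a 64-bit system.

```rust
use std::ffi::c_void;
use std::mem::size_of;

// Returns the sizes of a reference to `str` and a reference to `c_void`.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&str>(), size_of::<&c_void>())
}

fn main() {
    let (fat, thin) = pointer_sizes();
    // `&str` carries an address plus a length, so it is two words wide.
    assert_eq!(fat, 2 * size_of::<usize>());
    // `&c_void` is a thin pointer: a single word.
    assert_eq!(thin, size_of::<usize>());
    println!("&str: {fat} bytes, &c_void: {thin} bytes");
}
```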

This cannot be solved in safe Rust without some small overhead.

This is how I'd solve it using unsafe:

use std::{borrow::Borrow, ops::Deref};

#[repr(transparent)]
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Tag(str);

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
pub struct TagBuf(String);

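impl Tag {
    fn new<S: AsRef<str> + ?Sized>(s: &S) -> &Tag {
        // SAFETY: `Tag` is `#[repr(transparent)]` over `str`, so `&str` and
        // `&Tag` share a layout; the returned reference keeps the input lifetime.
        unsafe { &*(s.as_ref() as *const str as *const Tag) }
    }
}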

impl Deref for TagBuf {
    type Target = Tag;
    
    fn deref(&self) -> &Tag {
        Tag::new(&self.0)
    }
}

impl Borrow<Tag> for TagBuf {
    fn borrow(&self) -> &Tag {
        self.deref()
    }
}

impl ToOwned for Tag {
    type Owned = TagBuf;
    
    fn to_owned(&self) -> TagBuf {
        TagBuf(self.0.to_owned())
    }
}

use std::collections::HashMap;

fn main() {
    let mut m = HashMap::new();
    m.insert(TagBuf("x".to_string()), 1);
    assert_eq!(m.get(Tag::new("x")), Some(&1));
}
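
For comparison, here is a sketch of one workaround that stays entirely in safe code. It is not the newtype pair the question asks for. It assumes a hypothetical sized `Tag<'a>(&'a str)` with an `as_str` method, plus a `Borrow<str>` impl on `TagBuf`, so lookups pass through a plain `&str` and the map itself does not enforce the `Tag` type. The `lookup_demo` function is likewise a name made up for this example.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq, Hash, Clone)]
pub struct TagBuf(String);

// Hypothetical borrowed form: a sized wrapper around `&str`.
#[derive(Debug, PartialEq, Eq, Hash, Clone, Copy)]
pub struct Tag<'a>(&'a str);

impl<'a> Tag<'a> {
    pub fn new(s: &'a str) -> Self {
        Tag(s)
    }

    pub fn as_str(&self) -> &'a str {
        self.0
    }
}

// The derived `Hash` and `Eq` on `TagBuf` delegate to `String`, which hashes
// and compares like `str`, so this impl is consistent with `Borrow`'s contract.
impl Borrow<str> for TagBuf {
    fn borrow(&self) -> &str {
        &self.0
    }
}

// Inserts one entry and looks it up through the borrowed tag.
fn lookup_demo() -> Option<i32> {
    let mut m: HashMap<TagBuf, i32> = HashMap::new();
    m.insert(TagBuf("x".to_string()), 1);
    let tag = Tag::new("x");
    m.get(tag.as_str()).copied()
}

fn main() {
    assert_eq!(lookup_demo(), Some(1));
}
```

The trade-off is that any `&str` can be used as a key at the call site, so whatever validation `Tag` is meant to guarantee has to be upheld by convention rather than by the type system.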

Plutonium answered 24/11, 2020 at 16:37 Comment(2)
Thanks for the second pair of eyes! Glad to see that my code is identical to what you came up with. :-) I’ll accept this, but will happily accept another answer if it becomes possible in safe Rust. - Itu
I tried to apply this approach to a more generic case, but failed. Do you have any suggestions for #76873904? - Trimmer
