Is it safe to clone a type-erased Arc via raw pointer?

I'm in a situation where I'm working with data wrapped in an Arc, and I sometimes end up using into_raw to get the raw pointer to the underlying data. My use case also calls for type-erasure, so the raw pointer often gets cast to a *const c_void, then cast back to the appropriate concrete type when re-constructing the Arc.

I've run into a situation where it would be useful to be able to clone the Arc without needing to know the concrete type of the underlying data. As I understand it, it should be safe to reconstruct the Arc with a dummy type solely for the purpose of calling clone, so long as I never actually dereference the data. So, for example, this should be safe:

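use std::{ffi::c_void, mem, sync::Arc};

pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    let original = Arc::from_raw(handle); // rebuilt as Arc<c_void>, a dummy type
    let copy = original.clone();          // intended to bump the strong count
    mem::forget(original);                // avoid decrementing it again on drop
    Arc::into_raw(copy)
}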

Is there anything that I'm missing that would make this actually unsafe? Also, I assume the answer would apply to Rc as well, but if there are any differences please let me know!

Torietorii answered 9/1, 2020 at 20:30 Comment(4)
clone increments the refcount, and from_raw, into_raw and mem::forget do not touch it, so the effect of calling clone_raw is just to increase the refcount by 1. I assume this is just a dummy example (i.e. in your real code you would do something other than mem::forget with original)? - Deliberative
(I think this is safe in the sense of "very unlikely to do anything bad", I'm just trying to make sure I'm not overlooking something weird) - Deliberative
yeah your example is poor, this is safe but unlikely to do what you want. In fact, what do you want? - Constitutional
In the example I gave, I need to do mem::forget(original) so that it isn't dropped at the end of the function, which would decrement the ref count. The other option would be to do Arc::into_raw(original) instead, as shown in this example. - Torietorii
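
The refcount bookkeeping described in these comments can be checked with a small sketch. It keeps the concrete type throughout, so it says nothing about whether the type-erased version is sound. The names `clone_raw_typed` and `strong_counts` are hypothetical helpers, not something from the question.

```rust
use std::mem;
use std::sync::Arc;

/// Typed variant of the question's `clone_raw` (hypothetical helper).
///
/// Safety: `handle` must come from `Arc::<T>::into_raw`, and the strong
/// count of that allocation must still be at least 1.
pub unsafe fn clone_raw_typed<T>(handle: *const T) -> *const T {
    unsafe {
        let original = Arc::from_raw(handle); // does not touch the count
        let copy = Arc::clone(&original);     // strong count += 1
        mem::forget(original);                // skip the decrement on drop
        Arc::into_raw(copy)                   // does not touch the count
    }
}

/// Returns the strong count before and after one `clone_raw_typed` call.
pub fn strong_counts() -> (usize, usize) {
    let first = Arc::new(String::from("hello"));
    let before = Arc::strong_count(&first);
    let raw = Arc::into_raw(first);
    // SAFETY: `raw` was just produced by `Arc::<String>::into_raw`.
    let raw_copy = unsafe { clone_raw_typed(raw) };
    // SAFETY: each raw pointer accounts for exactly one strong reference.
    // Rebuilding both handles lets them drop normally, so nothing leaks.
    let (a, b) = unsafe { (Arc::from_raw(raw), Arc::from_raw(raw_copy)) };
    let after = Arc::strong_count(&a);
    drop(b);
    (before, after)
}
```

Calling `strong_counts()` returns `(1, 2)`: the only operation that changes the count is the `clone` in the middle.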

This is almost always unsafe.

An Arc<T> is just a pointer to a heap-allocated struct which roughly looks like

struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,  // You get a raw pointer to this element
}

into_raw() gives you a pointer to the data element. Arc::from_raw() takes such a pointer, assumes that it points to the data field of an ArcInner<T>, and walks back in memory to where it expects that ArcInner<T> to start. This calculation depends on the memory layout of T, specifically its alignment and therefore its exact placement in ArcInner.

If you call into_raw() on an Arc<U> and then call from_raw() as if it were an Arc<V>, where U and V differ in alignment, the computed offset of the data within ArcInner will be wrong, and the call to .clone() will write to the wrong address and corrupt memory. Dereferencing the data is therefore not required to trigger memory unsafety.

In practice, this might not be a problem: since data is the third field after two usize-sized fields, most T will probably be aligned the same way. However, if the standard library implementation changes or you compile for a platform where this assumption is wrong, calling Arc::<V>::from_raw on a pointer created from an Arc<U>, where the memory layouts of V and U differ, is undefined behavior and may crash.


Update:

Having thought about it some more I downgrade my vote from "might be safe, but cringy" to "most likely unsafe" because I can always do

#[repr(align(32))]
struct Foo;

let foo = Arc::new(Foo);

In this example, Foo is aligned to 32 bytes, which makes ArcInner<Foo> 32 bytes in size on a 64-bit target (8+8+16+0: two counters, 16 bytes of padding, and the zero-sized data at offset 32), while an ArcInner<()> is just 16 bytes (8+8+0+0, with data at offset 16). Since there is no way to tell what the alignment of T is after the type has been erased, there is no way to reconstruct a valid Arc.
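
The arithmetic can be reproduced without touching Arc internals. The sketch below computes where a T would sit after two usize-sized counters in a C-like layout, which is the picture of ArcInner<T> used in this answer. It is an assumption about the layout, not std's actual code, and `data_offset_and_size` is a hypothetical helper.

```rust
use std::alloc::Layout;
use std::sync::atomic::AtomicUsize;

#[repr(align(32))]
pub struct Foo;

/// For a C-like struct of two `AtomicUsize` counters followed by a `T`,
/// returns (offset of the `T` field, total size of the struct).
/// This mirrors the answer's sketch of `ArcInner<T>`; it is not std's code.
pub fn data_offset_and_size<T>() -> (usize, usize) {
    let counters = Layout::new::<[AtomicUsize; 2]>();
    // `extend` appends the field, inserting the padding its alignment needs.
    let (combined, offset) = counters
        .extend(Layout::new::<T>())
        .expect("layout computation overflowed");
    // Round the size up to the struct's alignment, as the compiler would.
    (offset, combined.pad_to_align().size())
}
```

On a 64-bit target, `data_offset_and_size::<Foo>()` gives `(32, 32)` and `data_offset_and_size::<()>()` gives `(16, 16)`. A from_raw that assumes the wrong one of these types would look for the counters 16 bytes away from where they really are.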

There is an escape hatch that might be safe in practice: wrap T in another Box. The Arc then holds a Box<T>, which for a sized T is always a single pointer, so the layout of ArcInner<Box<T>> no longer depends on T. In order to force this upon any user, you can do something like

struct ArcBox<T>(Arc<Box<T>>);

and implement Deref on that. Using ArcBox instead of Arc forces the memory layout of ArcInner to always be the same, because T is behind another pointer. This, however, means that all access to T requires a double dereference, which might badly affect performance.
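
A minimal sketch of the ArcBox idea, assuming T is Sized so that Box<T> is a single thin pointer. The constructor, the raw-handle methods, `strong_count`, the `Placeholder` alias, the `Wide` sample type and this version of `clone_raw` are all illustrative assumptions, not part of the answer. It relies on the documented from_raw requirement that the stand-in type has the same size and alignment as the real one; treat it as something to check against the current std docs, not as a guarantee.

```rust
use std::ffi::c_void;
use std::ops::Deref;
use std::sync::Arc;

/// Wrapper from the answer: the Arc always holds a pointer-sized `Box<T>`.
pub struct ArcBox<T>(Arc<Box<T>>);

/// Stand-in with the same size and alignment as `Box<T>` for sized `T`
/// (assumption of this sketch). It has no drop glue and is never read.
type Placeholder = *const ();

impl<T> ArcBox<T> {
    pub fn new(value: T) -> Self {
        ArcBox(Arc::new(Box::new(value)))
    }

    /// Type-erases the handle; the strong count is unchanged.
    pub fn into_raw(self) -> *const c_void {
        Arc::into_raw(self.0) as *const c_void
    }

    /// Safety: `handle` must come from `ArcBox::<T>::into_raw` with the same
    /// `T`, and must still account for one strong reference.
    pub unsafe fn from_raw(handle: *const c_void) -> Self {
        ArcBox(unsafe { Arc::from_raw(handle as *const Box<T>) })
    }

    pub fn strong_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

impl<T> Deref for ArcBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // Arc<Box<T>> -> Box<T> -> T: the double dereference the answer mentions.
        &**self.0
    }
}

/// Clones a type-erased handle without knowing `T`.
///
/// Safety: `handle` must come from `ArcBox::<T>::into_raw` for some sized `T`,
/// and the allocation must still be alive.
pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    // Only the counters are touched; the Box inside is never read or dropped.
    unsafe { Arc::<Placeholder>::increment_strong_count(handle as *const Placeholder) };
    handle // every handle to the same allocation is the same address
}

/// Over-aligned sample type: its alignment no longer affects the Arc layout.
#[repr(align(32))]
pub struct Wide {
    pub value: u32,
}
```

A caller would erase the type with `ArcBox::new(Wide { value: 7 }).into_raw()`, pass the handle through `clone_raw`, and later turn each handle back into an `ArcBox<Wide>` with `from_raw`. The last handle must be dropped through the real type so that the Box and its contents are freed correctly.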

Sigma answered 10/1, 2020 at 9:18 Comment(8)
Great answer. I added the T: ?Sized bound to your reproduction of ArcInner because that is what ensures that data will always be at the end of the layout of ArcInner -- without it, or repr(C), Rust would be able to place data wherever it wanted (which would mean all bets were off, even for a T less strictly aligned than usize). - Deliberative
Thank you for the thorough answer! I suspected this wouldn't work the way I wanted, but your answer makes it clear why. - Torietorii
How does rust find the arc that it previously mem::forget about? - Vannavannatta
@Vannavannatta Every Arc is essentially just a std::ptr::NonNull<ArcInner<T>>. So as long as there is at least one such pointer left and as long as the reference counter in ArcInner<T> matches the number of Arc<T> that will be dropped, other Arc<T> can be re-constructed and everything will be fine. Only the very last Arc<T> to be dropped also deallocates the ArcInner<T>. - Sigma
@Sigma I don't see how that answers my question. I asked how previously allocated Arc is found in Arc::from_raw. - Vannavannatta
@user1685095, the Arc is not allocated at all, it's just a pointer stored in a register or on the stack. When an Arc is dropped normally, it dereferences its inner pointer to ArcInner (which is allocated on the heap), decrements the reference count stored there, and deallocates the ArcInner if the reference count reaches zero (that is, if it was the last Arc). When you forget() an Arc, the reference count never decrements and the Arc simply vanishes. - Sigma
@Vannavannatta It might be helpful to realize that ALL Arc referencing the same object are actually simply the same pointer. It's the ownership mechanics where one owner (self) equates to one increment/decrement of the counter that makes everything work. There is no need to "find" the "previous" Arc. If you from_raw(), you need to guarantee that the reference count is correct (actually: one too large). Given that, the "new" Arc is just another copy of the same pointer, whose destructor will decrement the reference count in the future. - Sigma
The docs for from_raw align with this response. I was curious if increment_strong_count were different, since its docs do not mention the constraints laid out in from_raw. The implementation does use from_raw_in (essentially from_raw), so increment_strong_count is subject to the same constraints. - Hackathorn
