Is it sound to use raw pointers to an allocated vector to allow multiple threads to write to nonoverlapping parts?
Asked Answered
N

2

5

I have multiple threads doing a computation and want to collect the results into a pre-allocated vector. To turn off the borrow checker, I wrote the function:

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

With that we can fill a vector concurrently (e.g. using Rayon):

let vec = vec![0; 10];
(0..10).into_par_iter().for_each(|i| set_unsync(&vec, i, i));

It compiles, it works, and even Clippy likes it, but is it sound? After reading about things that appear to be sound but actually are Undefined Behavior, I'm unsure. For example, the documentation of the as_ptr method says:

The caller must also ensure that the memory the pointer (non-transitively) points to is never written to (except inside an UnsafeCell) using this pointer or any pointer derived from it.

Strictly speaking, the solution violates this. However, it feels sound to me. If it is not, how can we let multiple threads write to nonoverlapping parts of the same vector without using locks?

Negro answered 10/10, 2022 at 20:33 Comment(1)
Comments are not for extended discussion; this conversation has been moved to chat.Elledge
F
3

Assuming this is your minimal reproducible example:

use rayon::prelude::*;

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

fn main() {

    let input = vec![2, 3, 9];
    let output = vec![0; 100];
    input.par_iter().for_each(|&i| {
        for j in i * 10..(i + 1) * 10 {
            set_unsync(&output, j, i);
        }
    });

    dbg!(output);
}

If you are asking of whether this code works and always will work, then I'd answer with yes.

BUT: it violates many rules on how safe and unsafe code should interact with each other.

If you write a function that is not marked unsafe, you indicate that this method can be abused by users in any way possible without causing undefined behaviour (note that "users" here is not just other people, this also means your own code in safe sections). If you cannot guarantee this, you should mark it unsafe, requiring the caller of the function to mark the invocation as unsafe as well, because the caller then again has to make sure he is using your function correctly. And every point in your code that required a programmer to manually prove that it is free of undefined behaviour must require an unsafe as well. If it's possible to have sections that require a human to prove this, but do not require an unsafe, there is something unsound in your code.

In your case, the set_unsync function is not marked unsafe, but the following code causes undefined behaviour:

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

fn main() {
    let output = vec![0; 5];

    set_unsync(&output, 100000000000000, 42);

    dbg!(output);
}

Note that at no point in your main did you need an unsafe, and yet a segfault is happening here.

Now if you say "but set_unsync is not pub, so no one else can call it. And I, in my par_iter, have ensured that I am using it correctly" - then this is the best indicator that you should mark set_unsync as unsafe. The act of "having to ensure to use the function correctly" is more or less the definition of an unsafe function. unsafe doesn't mean it will break horribly, it just means that the caller has to manually make sure that he is using it correctly, because the compiler can't. It's unsafe from the compiler's point of view.

Here is an example of how your code could be rewritten in a more sound way. I don't claim that it is 100% sound, because I haven't thought about it enough. But I hope this demonstrates how to cleanly interface between safe and unsafe code:

use rayon::prelude::*;

// mark as unsafe, as it's possible to provide parameters that
// cause undefined behaviour
unsafe fn set_unsync(vec: &[usize], idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;

    // No need to use unsafe{} here, as the entire function is already unsafe
    *first_elem.add(idx) = val
}

// Does not need to be marked `unsafe` as no combination of parameters
// could cause undefined behaviour.
// Also note that output is marked `&mut`, which is also crucial.
// Mutating data behind a non-mutable reference is also considered undefined
// behaviour.
fn do_something(input: &[usize], output: &mut [usize]) {
    input.par_iter().for_each(|&i| {
        // This assert is crucial for soundness, otherwise an incorrect value
        // in `input` could cause an out-of-bounds access on `output`
        assert!((i + 1) * 10 <= output.len());

        for j in i * 10..(i + 1) * 10 {
            unsafe {
                // This is the critical point where we interface
                // from safe to unsafe code.
                // This call requires the programmer to manually verify that
                // `set_unsync` never gets called with dangerous parameters.
                set_unsync(&output, j, i);
            }
        }
    });
}

fn main() {
    // note that we now have to declare output `mut`, as it should be
    let input = vec![2, 3, 9];
    let mut output = vec![0; 100];

    do_something(&input, &mut output);

    dbg!(output);
}

EDIT: This code is unsound. Please refer to @ChayimFriedman's answer instead.

Flash answered 10/10, 2022 at 21:38 Comment(9)
A slightly more sound implementation that Miri likes, using SyncUnsafeCell: playground. This never mutates behind an immutable reference or aliases a mutable reference and appears to be compatible with stacked borrows.Stoltz
@Stoltz "Compatible with stacked borrows" - why isn't mine? Not criticizing, just curious :)Flash
Yours triggers a MIRI warning for stacked borrows, I think because you mutate behind a shared reference without an UnsafeCell.Stoltz
Nice, I wasn't aware there is a multithreaded version of UnsafeCell. Unfortunately, it's only available in nightly.Negro
As mentioned in the comment at the top, it's available in stable via crates.io as well.Stoltz
This answer is UB and you should never use it. Please see my answer for details.Jariah
@ChayimFriedman Please don't just state that it is, please elaborate why.Flash
From Behavior considered undefined "Mutating immutable bytes" (such as bytes behind a shared reference) is UB. Also from Vec::as_ptr "The caller must also ensure that the memory the pointer (non-transitively) points to is never written to" makes writing to it UB. Hence every use of your function is UB.Combat
Got it. Thanks :) Does the fact that the programmer ensures that the &Vec comes from a &mut Vec not change anything there? I guess no, because it's part of the as_ptr specification that it shouldn't be done?Flash
J
3

Strictly speaking, the solution violates this. However, it feels sound to me.

"Feels sounds to me" is not a justification for UB. And yes, as it violates as_ptr() conditions, it is UB.

Miri does not flag this, but Miri does not catch all UB. In particular, this is currently a library UB, meaning that Miri won't catch it, and the compiler won't optimize based on it. But I've put an emphasis on "currently" for reason: the standard library is within its right to upgrade this to a language UB at any time. You shouldn't execute UB, not even library UB.

If you will change the &Vec<usize> to &[usize], Miri will also flag it.

A sound approach may look like:

unsafe fn set_unsync(first_elem: *mut usize, idx: usize, val: usize) {
    unsafe { *first_elem.add(idx) = val }
}

#[derive(Clone, Copy)]
pub struct ThreadedRawPtr<T: ?Sized>(pub *mut T);
unsafe impl<T: ?Sized> Send for ThreadedRawPtr<T> {}
unsafe impl<T: ?Sized> Sync for ThreadedRawPtr<T> {}

fn main() {
    let mut vec = vec![0; 10];
    let first_elem = ThreadedRawPtr(vec.as_mut_ptr());
    (0..10).into_par_iter().for_each(move |i| {
        let first_elem = first_elem;
        unsafe { set_unsync(first_elem.0, i, i) }
    });
}

Also note that you may be able to get what you want without unsafe at all.

Jariah answered 30/5, 2024 at 12:57 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.