Writing to a field in a MaybeUninit structure?
I'm doing something with MaybeUninit and FFI in Rust that seems to work, but I suspect may be unsound/relying on undefined behavior.

My aim is to have a struct MoreA extend a struct A, by including A as an initial field. And then to call some C code that writes to the struct A. And then finalize MoreA by filling in its additional fields, based on what's in A.

In my application, the additional fields of MoreA are all integers, so I don't have to worry about assignments to them dropping the (uninitialized) previous values.

Here's a minimal example:

use core::fmt::Debug;
use std::mem::MaybeUninit;

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

unsafe fn mock_ffi(p: *mut A) {
    // write doesn't drop previous (uninitialized) occupant of p
    p.write(A(1, 2));
}

fn main() {
    let mut b = MaybeUninit::<MoreA>::uninit();
    unsafe { mock_ffi(b.as_mut_ptr().cast()); }
    let b = unsafe {
        let mut b = b.assume_init();
        b.more = 3;
        b
    };
    assert_eq!(&b, &MoreA { head: A(1, 2), more: 3 });
}

Is the let b = unsafe { ... } block sound? It runs fine and Miri doesn't complain.

But the MaybeUninit docs say:

Moreover, uninitialized memory is special in that the compiler knows that it does not have a fixed value. This makes it undefined behavior to have uninitialized data in a variable even if that variable has an integer type, which otherwise can hold any fixed bit pattern.

Also, the Rust Reference says that "Behavior considered undefined" includes:

  • Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):

    ... An integer (i*/u*) or ... obtained from uninitialized memory.

On the other hand, it doesn't seem possible to write to the more field before calling assume_init. Later on the same page:

There is currently no supported way to create a raw pointer or reference to a field of a struct inside MaybeUninit<Struct>. That means it is not possible to create a struct by calling MaybeUninit::uninit::<Struct>() and then writing to its fields.

If what I'm doing in the above code example does trigger undefined behavior, what would solutions be?

  1. I'd like to avoid boxing the A value (that is, I'd like to have it be directly included in MoreA).

  2. I'd hope also to avoid having to create one A to pass to mock_ffi and then having to copy the results into MoreA. A in my real application is a large structure.

I guess if there's no sound way to get what I'm after, though, I'd have to choose one of those two fallbacks.

If struct A is of a type that can hold the bit-pattern 0 as a valid value, then I guess a third fallback would be:

  3. Start with MaybeUninit::zeroed() rather than MaybeUninit::uninit().
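
A minimal sketch of that zeroed fallback, assuming every field of MoreA accepts the all-zero bit pattern (true here, since they are all i32). The helper name build_zeroed and the rule "more is the sum of the two head fields" are made up for illustration; mock_ffi stands in for the real C call as in the example above.

```rust
use std::mem::MaybeUninit;

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

// Stand-in for the real C function, as in the example above.
unsafe fn mock_ffi(p: *mut A) {
    unsafe { p.write(A(1, 2)) }
}

// Hypothetical helper: builds a MoreA starting from zeroed memory.
fn build_zeroed() -> MoreA {
    // All-zero bytes are a valid MoreA because every field is an i32.
    let mut b = MaybeUninit::<MoreA>::zeroed();
    // MoreA is repr(C) with `head` first, so the cast pointer addresses `head`.
    unsafe { mock_ffi(b.as_mut_ptr().cast::<A>()) };
    // The struct held valid (zero) values even before the call, so this is fine.
    let mut b = unsafe { b.assume_init() };
    // Illustrative rule only: derive `more` from what the call wrote to `head`.
    b.more = b.head.0 + b.head.1;
    b
}

fn main() {
    assert_eq!(build_zeroed(), MoreA { head: A(1, 2), more: 3 });
}
```

The cost is one extra pass to zero the memory, which may matter if A is large.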
Airspeed answered 20/4, 2020 at 8:57 Comment(6)
Here's one possibility for the minimal example: replace the current let b = unsafe { ... } code with let b = unsafe { (*b.as_mut_ptr()).more = 3; b.assume_init() }. The docs on MaybeUninit::as_mut_ptr only say that reading from the pointer or turning it into a reference is undefined before the struct is initialized. That leaves open that writing to it is OK. (But in my application that won't work, because I need to set the more field based on what's in the other fields, so I'd need to read and write.) - Airspeed
Well, I could start by setting more to 0, then call assume_init, then mutate the more field as needed. As far as I can understand the docs, this should be defined behavior (if my FFI call does indeed initialize the A part of the structure). - Airspeed
You can initialize a struct field by field, as long as each field is also MaybeUninit. Given that your struct is repr(C), you can use an intermediate struct for the purpose and then transmute it. Playground example. - Overgrowth
OK, thanks. I understand your example. I'd like to know, though, why it's not OK to do unsafe { (*b.as_mut_ptr()).more = 3; b.assume_init() }, where the more field has an integer type (so there are no issues about its being dropped). - Airspeed
In practice, I think it would be OK, but only because the field is an i32. It would be UB if it were a bool or most other types. - Overgrowth
@Airspeed ekrem-dinçel's answer seems the most valid in 2024. To keep people from getting outdated info, would you consider marking it as the accepted answer? - Marquettamarquette

It is now possible (since Rust 1.51) to initialize fields of any uninitialized struct using the std::ptr::addr_of_mut macro. This example is from the documentation:

You can use MaybeUninit, and the std::ptr::addr_of_mut macro, to initialize structs field by field:


use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    list: Vec<u8>,
}

let foo = {
    let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();

    // Initializing the `name` field
    unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }

    // Initializing the `list` field
    // If there is a panic here, then the `String` in the `name` field leaks.
    unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }

    // All the fields are initialized, so we call `assume_init` to get an initialized Foo.
    unsafe { uninit.assume_init() }
};

assert_eq!(
    foo,
    Foo {
        name: "Bob".to_string(),
        list: vec![0, 1, 2]
    }
);
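
Applied to the structs from the question, the same technique might look like the sketch below. It assumes Rust 1.51 or later for addr_of! and addr_of_mut!. The helper name build_in_place and the rule "more is the sum of the two head fields" are made up for illustration, and mock_ffi stands in for the real C call.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

// Stand-in for the real C function from the question.
unsafe fn mock_ffi(p: *mut A) {
    unsafe { p.write(A(1, 2)) }
}

// Hypothetical helper: initializes a MoreA field by field, in place.
fn build_in_place() -> MoreA {
    let mut b = MaybeUninit::<MoreA>::uninit();
    let ptr = b.as_mut_ptr();
    unsafe {
        // Hand the (mock) C code a raw pointer to `head`; no reference to
        // uninitialized memory is ever created.
        mock_ffi(addr_of_mut!((*ptr).head));
        // `head` is initialized now, so a shared reference to it is allowed.
        let sum = {
            let head: &A = &*addr_of!((*ptr).head);
            head.0 + head.1 // illustrative rule for computing `more`
        };
        // Write the remaining field through a raw pointer.
        addr_of_mut!((*ptr).more).write(sum);
        // Every field has been written, so the whole struct is initialized.
        b.assume_init()
    }
}

fn main() {
    assert_eq!(build_in_place(), MoreA { head: A(1, 2), more: 3 });
}
```

This covers the read-then-write case from the question: head is read only after the call has initialized it, and more is written before assume_init.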
Heterochromatic answered 3/3, 2021 at 21:13 Comment(0)

Currently, the only sound way to refer to uninitialized memory—of any type—is MaybeUninit. In practice, it is probably safe to read or write to uninitialized integers, but that is not officially documented. It is definitely not safe to read or write to an uninitialized bool or most other types.

In general, as the documentation states, you cannot initialize a struct field by field. However, it is sound to do so as long as:

  1. the struct has repr(C). This is necessary because it prevents Rust from doing clever layout tricks, so that the layout of a field of type MaybeUninit<T> remains identical to the layout of a field of type T, regardless of its adjacent fields.
  2. every field is MaybeUninit. This lets us assume_init() for the entire struct, and then later initialise each field individually.

Given that your struct is already repr(C), you can use an intermediate representation which uses MaybeUninit for every field. The repr(C) also means that we can transmute between the types once it is initialised, provided that the two structs have the same fields in the same order.

use std::mem::{self, MaybeUninit};

#[repr(C)]
struct MoreAConstruct {
    head: MaybeUninit<A>,
    more: MaybeUninit<i32>,
}

let b: MoreA = unsafe {
    // It's OK to assume a struct is initialized when all of its fields are MaybeUninit
    let mut b_construct = MaybeUninit::<MoreAConstruct>::uninit().assume_init();
    mock_ffi(b_construct.head.as_mut_ptr());
    b_construct.more = MaybeUninit::new(3);
    mem::transmute(b_construct)
};
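
For reference, a self-contained variation on the snippet above. It builds the intermediate struct with a plain struct literal of MaybeUninit::uninit() fields instead of uninit().assume_init(), and reads head through assume_init_ref (assumes Rust 1.55 or later). The helper name build_via_construct and the rule "more is the sum of the two head fields" are made up for illustration.

```rust
use std::mem::{self, MaybeUninit};

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

// Same fields in the same order as MoreA, each wrapped in MaybeUninit.
#[repr(C)]
struct MoreAConstruct {
    head: MaybeUninit<A>,
    more: MaybeUninit<i32>,
}

// Stand-in for the real C function from the question.
unsafe fn mock_ffi(p: *mut A) {
    unsafe { p.write(A(1, 2)) }
}

// Hypothetical helper: fills the intermediate struct, then transmutes it.
fn build_via_construct() -> MoreA {
    // A struct whose fields are all MaybeUninit needs no initialization.
    let mut c = MoreAConstruct {
        head: MaybeUninit::uninit(),
        more: MaybeUninit::uninit(),
    };
    unsafe {
        mock_ffi(c.head.as_mut_ptr());
        // `head` is initialized after the call, so it can be borrowed and read.
        let head = c.head.assume_init_ref();
        let more = head.0 + head.1; // illustrative rule for computing `more`
        c.more = MaybeUninit::new(more);
        // Both structs are repr(C) with matching fields, and both are now set.
        mem::transmute::<MoreAConstruct, MoreA>(c)
    }
}

fn main() {
    assert_eq!(build_via_construct(), MoreA { head: A(1, 2), more: 3 });
}
```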
Overgrowth answered 20/4, 2020 at 11:17 Comment(5)
Would it not work, going back to the OP's solution of a single MaybeUninit<MoreA>, to simply write to the more field before calling assume_init()? Then by the time assume_init is called, all fields have been initialized. - Jersey
@MatthieuM. I'm not convinced that can be done soundly. How do you safely access a field of a MaybeUninit? - Overgrowth
By safely, do you mean outside an unsafe block, or soundly? - Jersey
@MatthieuM. Soundly - Overgrowth
@PeterHall, thanks for your help with this. I just want to add for later reference a few relevant links: from the nomicon, RFC 2582 on syntax to directly create *mut <T>s, and the Rust unsafe code guidelines on uninitialized ints. - Airspeed