Does std::ptr::write transfer the "uninitialized-ness" of the bytes it writes?

    use std::mem::{size_of, align_of};

    struct PaddingDemo {
        data: u8,
        force_pad: [usize; 0],
    }

    assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
    assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());

    use std::ptr;
    use std::mem::MaybeUninit;

    let data = PaddingDemo { data: 12, force_pad: [] };

    // In order to ensure all the bytes are initialized,
    // zero-initialize the buffer
    let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
    let ptr = packed.as_mut_ptr() as *mut PaddingDemo;
    let packed_int = unsafe {
        std::ptr::write(ptr, data);
        packed.assume_init()
    };

    // Attempt to trigger UB in Miri by reading the
    // possibly uninitialized bytes
    let copied = unsafe { ptr::read(&packed_int) };

Does that assume_init call trigger undefined behavior?

Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.

But the details are a bit more complicated. There are two ways to copy data around in memory in Rust. Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.

Untyped / byte-wise copy

In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.

This is how memcpy/memmove behave in C (in my interpretation of the standard, which is unfortunately not very clear here). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but this is not precisely specified right now, and some people might argue that it is typed as well. This was discussed a bit in this issue.
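As a minimal sketch of a copy that preserves every byte, including 0xUU: the helper below (copy_raw_bytes is a hypothetical name, not a standard library function) copies at type MaybeUninit<u8>. That type is allowed to hold uninitialized bytes, so the result is the same whether ptr::copy_nonoverlapping turns out to be specified as untyped or as typed.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical helper: copies `N` raw bytes into a fresh buffer without
/// ever claiming that any of them are initialized.
fn copy_raw_bytes<const N: usize>(src: &[MaybeUninit<u8>; N]) -> [MaybeUninit<u8>; N] {
    // The destination starts out fully uninitialized (all bytes 0xUU).
    let mut dst = [MaybeUninit::<u8>::uninit(); N];
    // SAFETY: both pointers are valid for `N` elements, properly aligned,
    // and the two buffers are distinct objects, so they cannot overlap.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), N) };
    dst
}
```

After the call, only the bytes that were initialized in the source may be read with assume_init; the remaining bytes are still uninitialized in the copy.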

Typed copy

The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.

The key difference from a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information. Untyped copies preserve the underlying representation; typed copies just preserve the represented value.

So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).

In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily, in particular they may become 0xUU.
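To make the places where typed copies happen concrete, here is a small sketch. The #[repr(C)] attribute, the Clone/Copy derives, and the function names typed_assign and pass_by_value are my additions for illustration and are not part of the question's code.

```rust
// `#[repr(C)]` pins `data` at offset 0, as mentioned above.
#[repr(C)]
#[derive(Clone, Copy)]
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

/// A plain assignment is a typed copy at type `PaddingDemo`.
fn typed_assign(src: &PaddingDemo) -> PaddingDemo {
    // Only the value (the `data` field) is guaranteed to arrive in `dst`;
    // the padding bytes of `dst` may be anything, including uninitialized.
    let dst = *src;
    dst
}

/// Passing an argument and returning a value are typed copies as well.
fn pass_by_value(demo: PaddingDemo) -> PaddingDemo {
    demo
}
```

In both functions the field data is preserved, so reading it afterwards is fine. What must not be relied upon is the content of the padding bytes, no matter how the original value was created.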

Uninitialized `usize`

Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.

Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.

If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.
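Both options can be sketched as follows. All function names here (pack_raw, unpack_raw, pack_as_integer, unpack_from_integer) are hypothetical, and #[repr(C)] is added so that data is known to sit at offset 0. Treat this as one possible shape under those assumptions, not as the only way to do it.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

#[repr(C)]
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

/// Option 1: move raw bytes around in a `MaybeUninit<usize>` and never
/// call `assume_init` on it, so no `usize` value is ever claimed to exist.
fn pack_raw(value: PaddingDemo) -> MaybeUninit<usize> {
    assert!(size_of::<PaddingDemo>() <= size_of::<usize>());
    assert!(align_of::<PaddingDemo>() <= align_of::<usize>());
    let mut slot = MaybeUninit::<usize>::uninit();
    // SAFETY: the slot is large enough and suitably aligned (checked above).
    unsafe { slot.as_mut_ptr().cast::<PaddingDemo>().write(value) };
    slot
}

fn unpack_raw(slot: MaybeUninit<usize>) -> PaddingDemo {
    // SAFETY: the slot was filled by `pack_raw`, so it holds a valid
    // `PaddingDemo`; its padding bytes are allowed to be uninitialized.
    unsafe { slot.as_ptr().cast::<PaddingDemo>().read() }
}

/// Option 2: transfer an actual number by building the integer from
/// explicitly initialized bytes.
fn pack_as_integer(value: &PaddingDemo) -> usize {
    let mut bytes = [0u8; size_of::<usize>()];
    bytes[0] = value.data; // every other byte stays 0x00
    usize::from_ne_bytes(bytes)
}

fn unpack_from_integer(packed: usize) -> PaddingDemo {
    PaddingDemo { data: packed.to_ne_bytes()[0], force_pad: [] }
}
```

The first variant keeps whatever bytes the write produced and only ever reads them back at type PaddingDemo. The second variant never has padding in play at all, because the integer is assembled from a fully initialized byte array.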
