Does that assume_init call trigger undefined behavior?
Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.
But the details are a bit more complicated.
There are two ways to copy data around in memory in Rust.
Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.
Untyped / byte-wise copy
In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.
This is how memcpy/memmove in C behave (in my interpretation of the standard, which is unfortunately not very clear here). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but this is not precisely specified right now, and some people might want to say it is typed as well. This was discussed a bit in this issue.
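As a minimal sketch of a byte-wise copy, the function below moves possibly-uninitialized bytes around as MaybeUninit<u8>, which is allowed to hold 0xUU. The helper name copy_bytes is made up for this illustration, and the sketch relies on the interpretation above that ptr::copy_nonoverlapping copies the bytes as they are.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical helper: byte-wise copy of `N` possibly-uninitialized bytes
/// into a fresh buffer. `MaybeUninit<u8>` may hold any byte, including 0xUU.
fn copy_bytes<const N: usize>(src: &[MaybeUninit<u8>; N]) -> [MaybeUninit<u8>; N] {
    let mut dst = [MaybeUninit::<u8>::uninit(); N];
    // SAFETY: `src` and `dst` are distinct arrays, each valid for `N` elements.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), N);
    }
    dst
}
```

Given a source of "0x2A 0xUU 0xUU 0xUU", only the first byte of the result may be read with assume_init; the other three are still uninitialized after the copy.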
Typed copy
The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.
The key difference to a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information. Untyped copies preserve the underlying representation, typed copies just preserve the represented value.
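To make this concrete, here is a small sketch of a typed copy. The struct layout is an assumption chosen for illustration (one u8 field, #[repr(C, align(4))], so 1 data byte followed by 3 padding bytes); it is not necessarily the definition from the question. The helper names typed_copy and padding_len are likewise made up.

```rust
use std::mem::size_of;

// Assumed layout for illustration: `data` at offset 0, then 3 padding bytes.
#[repr(C, align(4))]
#[derive(Clone, Copy)]
struct PaddingDemo {
    data: u8,
}

/// A plain dereference-and-return is a typed copy at type `PaddingDemo`:
/// the value of `data` is preserved, the padding bytes are not guaranteed to be.
fn typed_copy(src: &PaddingDemo) -> PaddingDemo {
    *src
}

/// Number of bytes in `PaddingDemo` that carry no value at this type.
fn padding_len() -> usize {
    size_of::<PaddingDemo>() - size_of::<u8>()
}
```

The only thing a caller may rely on after typed_copy is the data field; whatever the padding bytes held before the copy may have become 0xUU.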
So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset its bytes to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).
In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily; in particular, they may become 0xUU.
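The contrast can be sketched as follows. The struct layout (a single u8 field with #[repr(C, align(4))]) and the helper names write_typed and write_untyped are assumptions made for this illustration. The untyped variant relies on the interpretation above that ptr::copy_nonoverlapping at type u8 copies bytes as they are.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Assumed layout for illustration: `data` at offset 0, then 3 padding bytes.
#[repr(C, align(4))]
#[derive(Clone, Copy)]
struct PaddingDemo {
    data: u8,
}

/// `value` reaches `ptr::write` through a typed copy (argument passing),
/// so its padding bytes may already be 0xUU before anything is written.
fn write_typed(dst: &mut MaybeUninit<PaddingDemo>, value: PaddingDemo) {
    // SAFETY: `dst` is valid and properly aligned for a `PaddingDemo`.
    unsafe { ptr::write(dst.as_mut_ptr(), value) }
}

/// Copies raw bytes into `dst`, never forming a `PaddingDemo` value on the way.
fn write_untyped(dst: &mut MaybeUninit<PaddingDemo>, bytes: &[u8; size_of::<PaddingDemo>()]) {
    // SAFETY: both sides are valid for `size_of::<PaddingDemo>()` bytes and do not overlap.
    unsafe {
        ptr::copy_nonoverlapping(bytes.as_ptr(), dst.as_mut_ptr().cast::<u8>(), bytes.len());
    }
}
```

Either way, reading the result back with assume_init is again a typed copy, so only the data field can be relied on afterwards.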
Uninitialized usize
Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.
Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.
If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.
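A rough sketch of that suggestion: instead of transmuting to an integer, transmute to an array of MaybeUninit<u8>, which is allowed to contain 0xUU. The struct layout (one u8 field, #[repr(C, align(4))]) and the function names to_raw_bytes and data_byte are assumptions made for this illustration.

```rust
use std::mem::{self, MaybeUninit};

// Assumed layout for illustration: `data` at offset 0, then 3 padding bytes.
#[repr(C, align(4))]
#[derive(Clone, Copy)]
struct PaddingDemo {
    data: u8,
}

const SIZE: usize = mem::size_of::<PaddingDemo>();

/// Reinterprets the value as raw bytes. The padding may be 0xUU, which is
/// fine because `MaybeUninit<u8>` makes no claim about initialization.
fn to_raw_bytes(value: PaddingDemo) -> [MaybeUninit<u8>; SIZE] {
    // SAFETY: both types have the same size, and every byte pattern
    // (including uninitialized) is valid for `MaybeUninit<u8>`.
    unsafe { mem::transmute(value) }
}

/// Reads back only the byte that is known to be initialized.
fn data_byte(value: PaddingDemo) -> u8 {
    let bytes = to_raw_bytes(value);
    // SAFETY: with `#[repr(C)]`, `data` sits at offset 0 and is initialized.
    unsafe { bytes[0].assume_init() }
}
```

Calling assume_init on any of the other three bytes would be the same mistake as in the question, only one byte at a time.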