Does std::ptr::write transfer the "uninitialized-ness" of the bytes it writes?
I'm working on a library that helps transfer types that fit in a pointer-sized integer across FFI boundaries. Suppose I have a struct like this:

use std::mem::{size_of, align_of};

struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0]
}

assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());

This struct has 1 data byte and 7 padding bytes. I want to pack an instance of this struct into a usize and then unpack it on the other side of an FFI boundary. Because this library is generic, I'm using MaybeUninit and ptr::write:

use std::ptr;
use std::mem::MaybeUninit;

let data = PaddingDemo { data: 12, force_pad: [] };

// In order to ensure all the bytes are initialized,
// zero-initialize the buffer
let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
let ptr = packed.as_mut_ptr() as *mut PaddingDemo;

let packed_int = unsafe {
    std::ptr::write(ptr, data);
    packed.assume_init()
};

// Attempt to trigger UB in Miri by reading the
// possibly uninitialized bytes
let copied = unsafe { ptr::read(&packed_int) };

Does that assume_init call trigger undefined behavior? In other words, when ptr::write copies the struct into the buffer, does it also copy the uninitialized-ness of the padding bytes, overwriting the zero bytes that were previously initialized?

Currently, when this or similar code is run in Miri, it doesn't detect any undefined behavior. However, per the discussion about this issue on GitHub, ptr::write is supposedly allowed to copy those padding bytes, and furthermore to copy their uninitialized-ness. Is that true? The docs for ptr::write don't talk about this at all, nor does the nomicon section on uninitialized memory.

Bespangle answered 9/4, 2020 at 4:31 Comment(1)
Some useful optimizations may be facilitated by having a copy of an indeterminate value leave the destination in an indeterminate state, but there are other times when it's necessary to be able to copy an object with the semantics that any parts of the original that were indeterminate become unspecified in the copy (so any future copies would be guaranteed to match each other). Unfortunately, language designers don't seem to give much consideration to the importance of being able to achieve the latter semantics in security-sensitive code. - Alodee

Does that assume_init call trigger undefined behavior?

Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.

But the details are a bit more complicated. There are two ways to copy data around in memory in Rust. Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.

Untyped / byte-wise copy

In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.

This is how memcpy/memmove behave in C (in my interpretation of the standard, which is unfortunately not very clear here). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but this is not precisely specified right now, and some people might want to say it is typed as well. This was discussed a bit in this issue.
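
As a rough sketch of what a byte-wise copy can look like in Rust code, the hypothetical helper `copy_bytes` below (the name is made up for illustration) copies `size_of::<T>()` bytes at type `MaybeUninit<u8>`, so no byte is ever interpreted at type `T`. It assumes that `ptr::copy_nonoverlapping` behaves as an untyped copy, which is how the current standard library documentation describes it; `#[repr(C)]` is added to the struct from the question so that the layout is fixed.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

#[repr(C)]
#[derive(Clone, Copy)]
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

/// Hypothetical helper: copies the bytes of `src` one by one, at type
/// `MaybeUninit<u8>`, so padding is carried over in whatever state it is in.
fn copy_bytes<T: Copy>(src: &T) -> MaybeUninit<T> {
    let mut dst = MaybeUninit::<T>::uninit();
    // SAFETY: both pointers are valid for `size_of::<T>()` bytes, the two
    // allocations do not overlap, and `MaybeUninit<u8>` has alignment 1.
    unsafe {
        ptr::copy_nonoverlapping(
            (src as *const T).cast::<MaybeUninit<u8>>(),
            dst.as_mut_ptr().cast::<MaybeUninit<u8>>(),
            size_of::<T>(),
        );
    }
    dst
}

fn bytewise_copy_demo() -> u8 {
    let src = PaddingDemo { data: 12, force_pad: [] };
    let dst = copy_bytes(&src);
    // SAFETY: `dst` holds the same bytes as `src`, which is a valid
    // `PaddingDemo`; its padding bytes are allowed to be uninitialized.
    let copy = unsafe { dst.assume_init() };
    copy.data
}
```

Note that the result stays a `MaybeUninit<T>` until the caller decides the bytes really form a valid `T`.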

Typed copy

The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.

The key difference to a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information. Untyped copies preserve the underlying representation, typed copies just preserve the represented value.
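
The difference can be made concrete with a toy model. This is only a sketch of my interpretation, not anything the compiler or Miri actually implements: `AbstractByte`, `Demo`, `decode`, and `encode` are invented names, and the model uses a 4-byte struct with the data byte at offset 0. It also always resets padding to uninitialized, whereas a real typed copy is merely allowed to do so.

```rust
/// One byte of the abstract machine: either 0xUU or a concrete value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Uninit,
    Init(u8),
}

/// Toy stand-in for a 4-byte struct: one data byte, then three padding bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Demo {
    data: u8,
}

type Memory = [AbstractByte; 4];

/// Interpret memory at type `Demo`. Padding is ignored entirely;
/// the data byte must be initialized (`None` models UB here).
fn decode(mem: &Memory) -> Option<Demo> {
    match mem[0] {
        AbstractByte::Init(data) => Some(Demo { data }),
        AbstractByte::Uninit => None,
    }
}

/// Re-serialize a `Demo` value. The value carries no information about
/// padding, so in this model the padding bytes come out uninitialized.
fn encode(value: Demo) -> Memory {
    [
        AbstractByte::Init(value.data),
        AbstractByte::Uninit,
        AbstractByte::Uninit,
        AbstractByte::Uninit,
    ]
}

/// Untyped copy: the target receives exactly the source bytes.
fn untyped_copy(src: &Memory) -> Memory {
    *src
}

/// Typed copy: decode at type `Demo`, then encode the resulting value.
fn typed_copy(src: &Memory) -> Option<Memory> {
    decode(src).map(encode)
}
```

Starting from four zero bytes, `untyped_copy` returns four zero bytes, while `typed_copy` returns "0x00 0xUU 0xUU 0xUU": the value survives, the representation does not.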

So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).

In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily, in particular they may become 0xUU.

Uninitialized usize

Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.

Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.
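
To illustrate the data loss described above, here is a small model of one possible rule, namely that a single uninitialized byte makes the whole integer uninitialized. It is a sketch under that assumption only: `AbstractByte`, `load_u32`, and `store_u32` are invented names, `None` stands for a fully uninitialized integer, and a 4-byte little-endian integer is used to keep the example short.

```rust
/// One byte of the abstract machine: either 0xUU or a concrete value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Uninit,
    Init(u8),
}

/// Load an integer from memory. Under the modeled rule, any
/// uninitialized input byte makes the entire result uninitialized.
fn load_u32(mem: &[AbstractByte; 4]) -> Option<u32> {
    let mut bytes = [0u8; 4];
    for (out, byte) in bytes.iter_mut().zip(mem) {
        match byte {
            AbstractByte::Init(b) => *out = *b,
            AbstractByte::Uninit => return None,
        }
    }
    Some(u32::from_le_bytes(bytes))
}

/// Store an integer to memory. An uninitialized integer has no
/// per-byte structure left, so every byte becomes uninitialized.
fn store_u32(value: Option<u32>) -> [AbstractByte; 4] {
    match value {
        Some(v) => v.to_le_bytes().map(AbstractByte::Init),
        None => [AbstractByte::Uninit; 4],
    }
}

/// Pass "0x0C 0xUU 0xUU 0xUU" through an integer and back.
fn roundtrip_partial() -> [AbstractByte; 4] {
    let mem = [
        AbstractByte::Init(12),
        AbstractByte::Uninit,
        AbstractByte::Uninit,
        AbstractByte::Uninit,
    ];
    store_u32(load_u32(&mem))
}
```

In this model `roundtrip_partial` returns four uninitialized bytes, so the one initialized data byte is gone after the round trip.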

If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.
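
A minimal sketch of that suggestion, applied to the struct from the question: the value travels as a `MaybeUninit<usize>` and `assume_init` is never called, so no `usize` with uninitialized bytes is ever produced. The helpers `pack` and `unpack` are hypothetical names, `#[repr(C)]` is added to pin the layout, and how the other side of the FFI boundary treats such a value is outside what this example shows.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

#[repr(C)]
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

/// Hypothetical helper: store `value` in a pointer-sized slot.
fn pack(value: PaddingDemo) -> MaybeUninit<usize> {
    assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
    assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
    let mut slot = MaybeUninit::<usize>::uninit();
    // SAFETY: the slot has the size and alignment of `PaddingDemo`
    // (checked above) and is valid for writes.
    unsafe { slot.as_mut_ptr().cast::<PaddingDemo>().write(value) };
    slot
}

/// Hypothetical helper: read the value back out of the slot.
///
/// # Safety
/// `slot` must have been produced by `pack`.
unsafe fn unpack(slot: MaybeUninit<usize>) -> PaddingDemo {
    // SAFETY: the caller guarantees the slot holds a valid `PaddingDemo`.
    unsafe { slot.as_ptr().cast::<PaddingDemo>().read() }
}

fn roundtrip_demo() -> u8 {
    let slot = pack(PaddingDemo { data: 12, force_pad: [] });
    // The slot can be moved around freely; `MaybeUninit` does not care
    // whether the padding bytes inside it are initialized.
    let moved = slot;
    // SAFETY: `moved` came from `pack`.
    let back = unsafe { unpack(moved) };
    back.data
}
```

The design choice here is that the integer type appears only as a size and alignment carrier inside `MaybeUninit`, never as an actual integer value.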

Shellashellac answered 11/4, 2020 at 18:4 Comment(9)
So, hypothetically, if the behavior described in your last paragraph gets formalized (not the case right now), a usize could be allowed to have UU bytes so long as no operations are performed on it, and then transmuted back into my original type, which would work because it doesn't matter if the padding bytes are UU. - Bespangle
Thanks for the detailed answer! Would it be possible for Miri to detect this kind of undefined behaviour? - Onward
@Bespangle if we decided that usize represents bags of bytes (and not integers), then yes, usize and MaybeUninit<usize> would be equivalent and both would perfectly preserve the underlying byte-level representation (and this includes "undefined bytes"). - Shellashellac
@SvenMarnach Miri already implements "a single undefined byte makes the entire integer undefined", at least when you are working directly with integers. In fact it does that too often. For larger types containing integer fields, Miri always performs byte-level copies. - Shellashellac
@SvenMarnach If you are asking about Miri detecting (partially) uninitialized usize values and treating them as UB, that would be easy to implement, but I am worried many would rather wait for the situation to settle than change their code. I could imagine having a flag to opt in to the stricter checking; do you think that would be useful? - Shellashellac
The OP stated that Miri did not detect any undefined behaviour in the code in the question, while you called it UB rather unequivocally; this was the only reason I asked. So put another way, why isn't Miri complaining about the code in the question? - Onward
@SvenMarnach Because the current implementation of ptr::write is smart enough to not copy the trailing uninitialized bytes. - Bespangle
@Bespangle no, that is not correct (or you forgot a "not"). @SvenMarnach Miri currently does not lose padding on typed copies. That is non-trivial to implement. As a consequence, it incorrectly considers packed_int fully initialized. Moreover, even if packed_int were partially or completely uninitialized, as it should be, Miri would not consider the uninitialized integer UB until you actually do arithmetic on it or print it. - Shellashellac
@RalfJung That's actually the behavior I want, since I never do arithmetic on, print, branch on, or otherwise "use" the returned int. All that happens to it is that it gets copied around and eventually transformed back into a PaddingDemo. But it sounds like Miri's current behavior is not correct. - Bespangle
