Do aliasing mutable raw pointers (*mut T) cause undefined behaviour?

&mut T and &mut T results in a compilation error; this is great, it's objectively wrong to borrow mutably twice.

Is *mut T and *mut T undefined behaviour, or is this a perfectly valid thing to do? That is, is mutable pointer aliasing valid?

What makes it even worse is that &mut T and *mut T actually compiles and works as intended, I can modify a value through the reference, the pointer, and then the reference again... but I've seen someone say that it's undefined behaviour. Yeah, "someone said so" is the only information I have.

Here's what I tested:

fn main() {
    let mut value: u8 = 42;

    let r: &mut u8 = &mut value;
    let p: *mut u8 = r as *mut _;

    *r += 1;

    unsafe { *p += 1; }

    *r -= 1;

    unsafe { *p -= 1; }

    println!("{}", value);
}

and of course, the main point of question:

Note — Thanks to trentcl for pointing out this example actually causes a copy when creating p2. This can be confirmed by replacing u8 with a non-Copy type. The compiler then complains about a move. Sadly, this does not get me closer to the answer, only reminds me that I can get unintended behaviour without it being undefined behaviour, simply because of Rust's move semantics.

fn main() {
    let mut value: u8 = 42;

    let p1: *mut u8 = &mut value as *mut _;
    // this part was edited, left in so it's easy to spot
    // it's not important how I got this value, what's important is that it points to same variable and allows mutating it
    // I did it this way, hoping that trying to access real value then grab new pointer again, would break something, if it was UB to do this
    //let p2: *mut u8 = &mut unsafe { *p1 } as *mut _;
    let p2: *mut u8 = p1;

    unsafe {
        *p1 += 1;
        *p2 += 1;
        *p1 -= 1;
        *p2 -= 1;
    }

    println!("{}", value);
}

Both yield:

42

Does this imply that two mutable pointers pointing to the same location and being dereferenced at different times is not undefined behaviour?

I don't think testing this with a compiler is a good idea to begin with, as undefined behaviour could make anything happen, even printing 42 as if nothing were wrong. I mention it anyway as this is one of the things I tried, hoping to get an objective answer.

I have no clue how to write a test that would force erratic behaviour and make it dead obvious that this doesn't work because it's not used as intended, if that's even possible.

I'm aware that this is very likely to be undefined behaviour and break in a multithreaded environment no matter what. I would expect a more detailed answer than that, though, especially if mutable pointer aliasing IS NOT undefined behaviour. (This would in fact be awesome, because while I use Rust for reasons like everyone else - memory safety, to say the least... I expect to still retain a shotgun that I could point anywhere, without it being locked onto my feet. I can have aliased "mutable pointers" without blowing my feet off in C.)

This is a question about whether I can, not about whether I should. I want to dive head-on into unsafe Rust, just to learn about it, but it feels like there's not enough information unlike in "horrible" languages like C about what's undefined behaviour and what's not.

Preventer answered 5/8, 2019 at 19:1 Comment(13)
You can create aliased mutable pointers without any unsafe, so just creating them must be safe, by definition. Using them is another matter, of course...Allhallowmas
Your second example doesn't do what you think: p1 and p2 don't alias. proofDesensitize
@trentcl Does that imply that it's indeed UB? Because I just tried the obvious way, which is p2 = p1 like you would do naturally... And then they point to the same memory location, like expected. I don't really understand that low level kind of stuff, but this is completely wrong. It copies the value and then...?Preventer
... It actually copies value when dereferencing, I changed it, and it complains about moving out. Is this a gotcha of unsafe Rust and is intended, or part of UB? Assigning a pointer directly (which does not move/copy value), shows same address. I also almost understand why it behaves like that, but also it is what I fear the most, if this is UB, then p2 = p1 will be too, and playing with raw pointers is impossible.Preventer
It's not UB. It's temporary lvalue-promotion -- see Why is it legal to borrow a temporary? unsafe { *p1 } returns a copied u8 and &mut takes a reference to the copy. You could write the same thing with no unsafe or raw pointers at all, although it's obviously not what you intended.Desensitize
I see, that was just a bad example by me then. I wrote that part without thinking about Rust's move semantics that do not disappear just because code is unsafe. And the safe version makes that mistake perfectly clear, I indeed would expect it to be copied in there. I'll edit my question with this. Sadly, the question remains the same for values that have really aliased raw pointers.Preventer
The first example is still UB because the compiler needs to take a &mut reference to *p in order to perform += on it. And yeah, you can't "just" move a (non-Copy) type out of a *mut pointer, because doing so is even more unsafe than just dereferencing the thing -- you need to use ptr::read for that.Desensitize
Using multiple mutable raw pointers to change the same memory location is not UB in Rust, even if these pointers have different types, which would be UB in C++. In a way, you get even more power in Rust than in C++. However, you need to be very careful to never create multiple mutable references at the same time. (I don't have time to expand this to a proper answer and find references, so leaving this as a comment.)Selfwill
@SvenMarnach when you do have time though, could you expand on it, possibly in form of the answer, so whoever finds this can have a clear answer? And yes, multiple references or references alongside pointers, aren't something that should ever happen, that part is clear now. That's why my question is split into 2 parts, one for &mut and *mut, and one for *mut and *mut. Also, strict aliasing wasn't on my mind at all, but if what you say is true, although it's not something I'd be willing to use, especially after learning not to break it in C++, I'm happy to learn that it's an option.Preventer
@trentcl The surprising bit in the first version of the second example is that unsafe { &mut *p1 } is different from &mut unsafe { *p1 }. The unsafe block turns the place expression into a value expression, thereby triggering a move.Selfwill
See also How to use (unsafe) aliasing?; What are the differences between *const T and *mut T raw pointers?.Orchestral
@SvenManarch I found why it behaves like that. Dereference of a pointer must be contained inside unsafe {}, otherwise the value is attempted to be "moved out" into safe code for safety reasons. This in turn puts all the work on you, which is unsurprisingly the point of using unsafe. sourcePreventer
I laughed at the "unintended behavior" :)Accolade

Author's note: The following is an intuitive explanation, not a rigorous one. I don't believe there is a rigorous definition of "aliasing" in Rust right now, but you may find it helpful to read the Rustonomicon chapters on references and aliasing.

The rules of references (&T and &mut T) are simple:

  • At any given time, you can have either one mutable reference or any number of immutable references.
  • References must always be valid.

There are no "rules of raw pointers". Raw pointers (*const T and *mut T) can alias anything, anywhere, or they can point to nothing at all.

Undefined behavior can happen when you dereference a raw pointer, implicitly or explicitly turning it into a reference. This reference still must obey the rules of references, even when the & isn't explicit in the source code.

In your first example,

unsafe { *p += 1; }

*p += 1; takes a &mut reference to *p in order to use the += operator, as if you had written

unsafe { AddAssign::add_assign(&mut *p, 1); }

(The compiler does not actually use AddAssign to implement += for u8, but the semantics are the same.)

Because &mut *p is aliased by another reference, namely r, the first rule of references is violated, causing undefined behavior.
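For contrast, here is a minimal sketch of a variation that should avoid the conflict, assuming the intuitive model above (and, as far as I understand it, the Stacked Borrows model used by Miri): the raw pointer is derived from r, used while r is left alone, and never touched again once r is used. The function name bump_via_pointer is made up for illustration.

```rust
/// Hypothetical helper: mutate through a raw pointer derived from `r`,
/// then go back to using `r` only.
fn bump_via_pointer(r: &mut u8) -> u8 {
    // `p` is derived from `r`; while `p` is in use, `r` is not touched.
    let p: *mut u8 = r as *mut u8;

    unsafe {
        *p += 1;
        *p += 1;
    }

    // From here on only `r` is used; `p` must not be dereferenced again.
    *r -= 1;
    *r
}

fn main() {
    let mut value: u8 = 42;
    let result = bump_via_pointer(&mut value);
    println!("{}", result); // 42 + 1 + 1 - 1 = 43
}
```

The difference from the first example is ordering, not syntax: the accesses through p and through r are never interleaved.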

Your second example (since editing) is different because there is no reference to alias, only another pointer, and there are no aliasing rules that govern pointers. Therefore, this

let p1: *mut u8 = &mut value;
let p2: *mut u8 = p1;

unsafe {
    *p1 += 1;
    *p2 += 1;
    *p1 -= 1;
    *p2 -= 1;
}

in the absence of any other references to value, is perfectly sound.
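If you want to make it explicit that no reference is ever created, the same sequence can be sketched with the raw pointer read and write methods and ptr::addr_of_mut!. This is only an illustration under the assumptions above; the function name bump_through_aliases is hypothetical.

```rust
use std::ptr;

/// Hypothetical helper: two aliasing raw pointers take turns mutating
/// the same local, without any `&` or `&mut` being formed.
/// `start` must be at most 253 so the two increments cannot overflow.
fn bump_through_aliases(start: u8) -> u8 {
    let mut value: u8 = start;

    // addr_of_mut! yields a raw pointer without an intermediate reference.
    let p1: *mut u8 = ptr::addr_of_mut!(value);
    let p2: *mut u8 = p1; // copies the pointer, not the pointee

    unsafe {
        p1.write(p1.read() + 1);
        p2.write(p2.read() + 1);
        p1.write(p1.read() - 1);
        p2.write(p2.read() - 1);
    }

    // The pointers are not used past this point.
    value
}

fn main() {
    println!("{}", bump_through_aliases(42)); // prints 42
}
```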

Desensitize answered 5/8, 2019 at 20:58 Comment(8)
Thank you. In the last part you mention that in the multithreaded case this does not hold (due to data races). As far as I understand, that is circumvented by using AtomicPtr<T>, since that structure, and I'm quoting, "has the same in-memory representation as a *mut T", so I think it's worth a mention in the answer.Preventer
@Sahsahae AtomicPtr<T> actually solves a different problem: how to synchronize access to the pointer itself (it doesn't synchronize the T). But perhaps I conveyed the wrong impression: it's still fine to use *mut T in multithreaded contexts, as long as the rules of references are preserved. Synchronizing access requires either atomicity (might as well use AtomicU8), or runtime checks (might as well use Mutex<u8> or RwLock<u8>). Both atomics and Mutex/RwLock are implemented using raw pointers.Desensitize
As much as I'd like to upvote, I am afraid that this is a tad simplistic. The rule "At any given time, you can have either one mutable reference or any number of immutable references." does not rule out having multiple mutable references in scope. After all, when borrowing, the original mutable reference is still in scope, you're just not allowed to use it for the time of the borrow.Smithsonite
@Matthieu I agree. I'm not sure I can make it less simplistic, though, without trying to define in exact terms what's meant by "aliasing" and "liveness", which are pretty poorly defined (at least in official sources). The rule you quote is straight from TRPL.Desensitize
@trentcl: I agree, I think we're hitting the limits of Ralf's work here. I don't necessarily see *r += 1; unsafe { *p += 1; } *r += 1; as problematic, as it could be argued that *p borrowed r for the duration of the statement, then released the borrow before the second increment of r. In practice, this seems impractical as the analysis required would be too complicated, but from a temporal point of view it's arguably possible. I think it'd be rejected by Ralf's Stacked Borrows, due to the lack of causality, but...Smithsonite
@MatthieuM.: I don't see why there should be any practical difficulty with having a construct which says "a compiler should behave as though the actions of entering or leaving this block might arbitrarily affect any object whose address has ever escaped". Any loop containing such a construct or calls to a function that contains such a construct would likely perform badly, but if a function performs three loops in sequence and one of them calls a function containing such a construct, that shouldn't affect the performance of the other two. If programmers only use such a construct when...Summerwood
...there would be no other practical way to do what needs to be done, then the only programs where the performance costs of the construct would be relevant would be those which wouldn't be practical at all in its absence.Summerwood
@MatthieuM.: Among other things, when doing systems programming, it may be necessary to do things like change memory mappings, etc. and ensure that operations which were relevant under one memory mapping don't get reordered across operations that change mappings. Actually, an even stronger directive that applies to all static objects could be helpful in systems that allow linker control of static object placement.Summerwood
