How can old-school multi-threading (no wrapping mutex) be achieved in Rust? And why is it undefined behavior?
I have to build a highly concurrent physics simulation. I am supposed to do it in C, but I chose Rust (I really needed its higher-level features).
With Rust I should opt for safe communication between threads; however, I must use a mutable buffer shared between the threads (actually, I have to implement several techniques and benchmark them).
First approach
Use `Arc<Data>` to share non-mutable state. Use `transmute` to promote `&` to `&mut` when needed.
It was straightforward, but the compiler refuses to compile this even inside an `unsafe` block (the `mutable_transmutes` lint is deny-by-default). The reason is that the compiler may apply optimizations assuming data behind a `&` is never mutated (e.g. caching a value and never re-reading it; I am not an expert on the details). These optimizations can be opted out of with the `Cell` family of wrappers.
Second approach
Use `Arc<UnsafeCell<Data>>`, then `data.get()` to access the data.
This does not compile either. The reason is that `UnsafeCell` is not `Sync`, so the `Arc` cannot be shared across threads. The solution would be `SyncUnsafeCell`, but it is still unstable at the moment (1.66), and the program will be compiled and put into production on a machine with only the stable toolchain.
Third approach
Use `Arc<Mutex<Data>>`. At the beginning of each thread:
- Lock the mutex.
- Keep a `*mut` by coercing a `&mut`.
- Release the mutex.
- Use the `*mut` when needed.
I haven't tried this one yet, but even if it compiles, is it as safe (data races aside) as it would be with `SyncUnsafeCell`?
PS: The values mutated concurrently are just `f32`s; there are absolutely no memory allocations or other complex operations happening concurrently. Worst case, I end up with some scrambled `f32`s.
Comments
- You could copy `SyncUnsafeCell` from `std`; there doesn't seem to be anything magic about it. Btw, how about a `Vec<AtomicU32>` (because there's no `AtomicF32`), reading and writing through `f32::from_bits`/`to_bits`? – Pairoar
- `AtomicU32` is zero-overhead and gives you exactly what you need. – Obtund
- Does `AtomicU32` really have zero overhead when fetching and writing back? That could be the best approach indeed. – Mapp
- "Zero-overhead" is a bit misleading, I think: if it does incur synchronization between cores through the cache-coherency mechanisms, it can be quite heavy. For example, several concurrent threads incrementing the same `AtomicU32` will only get a few thousand increments done per second, IIRC. – Pairoar
- … `f32`. Any atomic op has to be... well, atomic. – Mapp
- That slowdown is not specific to atomics, though. For example, if you access the same `uint32_t` in C++ from multiple threads, you also get the cache-coherency slowdown. It's more of an architectural thing than a code overhead. If you look at what it compiles to, there is zero difference between using an atomic and a normal `f32`. – Obtund
- So what does `AtomicU32` actually mean compared to `u32`, if `u32` is already atomic? Two things: volatility (meaning it cannot be optimized into registers) and prevention of instruction reordering (hence the `Ordering` parameters). Apart from that, `load` and `store` are identical between `AtomicU32` and `u32`. – Obtund