Converting a Vec<u32> to Vec<u8> in-place and with minimal overhead
Asked Answered
L

5

14

I'm trying to convert a Vec of u32s to a Vec of u8s, preferably in-place and without too much overhead.

My current solution relies on unsafe code to re-construct the Vec. Is there a better way to do this, and what are the risks associated with my solution?

use std::mem;
use std::vec::Vec;

fn main() {
    let mut vec32 = vec![1u32, 2];
    let vec8;
    unsafe {
        let length = vec32.len() * 4; // size of u8 = 4 * size of u32
        let capacity = vec32.capacity() * 4; // ^
        let mutptr = vec32.as_mut_ptr() as *mut u8;
        mem::forget(vec32); // don't run the destructor for vec32

        // construct new vec
        vec8 = Vec::from_raw_parts(mutptr, length, capacity);
    }

    println!("{:?}", vec8)
}

Rust Playground link

Locomotor answered 6/4, 2018 at 10:13 Comment(3)
This is platform dependent, since you don't take the byteorder into account.Existentialism
docs.rs/byteorder/1.2.2/byteorder/… can be used to account for byte order, without copying and is a no-op when the source byte order is the same as the host byte order. Otherwise, I believe your code is safe. Note that the reverse isn't necessarily safe! (Because of alignment.)Wifeless
Related: codereview.stackexchange.com/questions/187013/base64-string-↔-float-array/187077Invariable
B
23
  1. Whenever writing an unsafe block, I strongly encourage people to include a comment on the block explaining why you think the code is actually safe. That type of information is useful for the people who read the code in the future.

  2. Instead of adding comments about the "magic number" 4, just use mem::size_of::<u32>. I'd even go so far as to use size_of for u8 and perform the division for maximum clarity.

  3. You can return the newly-created Vec from the unsafe block.

  4. As mentioned in the comments, "dumping" a block of data like this makes the data format platform dependent; you will get different answers on little endian and big endian systems. This can lead to massive debugging headaches in the future. File formats either encode the platform endianness into the file (making the reader's job harder) or only write a specific endinanness to the file (making the writer's job harder).

  5. I'd probably move the whole unsafe block to a function and give it a name, just for organization purposes.

  6. You don't need to import Vec, it's in the prelude.

use std::mem;

fn main() {
    let mut vec32 = vec![1u32, 2];

    // I copy-pasted this code from StackOverflow without reading the answer 
    // surrounding it that told me to write a comment explaining why this code 
    // is actually safe for my own use case.
    let vec8 = unsafe {
        let ratio = mem::size_of::<u32>() / mem::size_of::<u8>();

        let length = vec32.len() * ratio;
        let capacity = vec32.capacity() * ratio;
        let ptr = vec32.as_mut_ptr() as *mut u8;

        // Don't run the destructor for vec32
        mem::forget(vec32);

        // Construct new Vec
        Vec::from_raw_parts(ptr, length, capacity)
    };

    println!("{:?}", vec8)
}

Playground

My biggest unknown worry about this code lies in the alignment of the memory associated with the Vec.

Rust's underlying allocator allocates and deallocates memory with a specific Layout. Layout contains such information as the size and alignment of the pointer.

I'd assume that this code needs the Layout to match between paired calls to alloc and dealloc. If that's the case, dropping the Vec<u8> constructed from a Vec<u32> might tell the allocator the wrong alignment since that information is based on the element type.

Without better knowledge, the "best" thing to do would be to leave the Vec<u32> as-is and simply get a &[u8] to it. The slice has no interaction with the allocator, avoiding this problem.

Even without interacting with the allocator, you need to be careful about alignment!

See also:

Bieber answered 6/4, 2018 at 13:56 Comment(8)
And I'm not sure that size_of::<u32>() / size_of::<u8>() is much better than 4. You have to know the sizes of those types in order to know there won't be any rounding errors. It would be better to multiply by size_of::<u32>() and then divide by size_of::<u8>(), which you'd have to do if you wrote this generically.Garnet
@PeterHall You also have to know the sizes of the types to know that 4 is the correct result, this just moves comments to code. If constant evaluation was further along, I'd move the let to a const and add a compile_error! that asserts the value is 4. I'd strongly advocate for not writing this generically, so I wasn't as concerned about that case ;-)Bieber
I think you can write this generically, as long as you assume the output is &[u8], in which case you can dispense with the division, and just multiply by the size_of the source type.Garnet
@PeterHall well, depends on how generic you mean and what you expect the behavior to be. For example, if you allow an input Vec<T>, you allow a Vec<&[(String, BTreeMap<char, f32>, PhantomData)]> as well, which I wouldn't think would be a great idea.Bieber
@PeterHall if you want a more principled generic, something like the safe-transmute crate might be more applicable.Bieber
What's wrong with that? You know.. if you wanted to turn a bunch of pointers to bytes... :P But you can obviously protect from that by using marker traits to opt-in useful types. It would look something like: play.rust-lang.org/…Garnet
I didn't know about that crate! I'm sure there didn't used to be so many crates... ;)Garnet
The issue with Layout is not something I'd have suspected. In low-level code I'm used to worrying about casting to a pointer with greater alignment, but casting to a pointer with lower alignment has always been safe. I wonder if it should be specified that allocators should handle this case graciously.Filiate
C
8

If in-place convert is not so mandatory, something like this manages bytes order control and avoids the unsafe block:

extern crate byteorder;

use byteorder::{WriteBytesExt, BigEndian};

fn main() {
    let vec32: Vec<u32> = vec![0xaabbccdd, 2];
    let mut vec8: Vec<u8> = vec![];

    for elem in vec32 {
        vec8.write_u32::<BigEndian>(elem).unwrap();
    }

    println!("{:?}", vec8);
}
Caseinogen answered 6/4, 2018 at 21:15 Comment(1)
Would the compiler's optimization (with -O3) make the whole for-loop a no-op?Escurial
N
1

Simply transmuting the Vec or using from_raw_parts is undefined behavior because the deallocation API requires that the passed Layout is the same as the allocation was allocated with. To do this kind of conversion soundly you need to go through the Vec's associated allocator and call shrink to convert the layout to the new alignment before calling from_raw_parts. This depends on the allocator being able to perform in-place reallocation.

If you don't need the resulting vector to be resizable then reinterpreting a &mut [u32] borrow of the vec to &mut [u8] would be a simpler option.

Natatorial answered 3/1, 2023 at 11:1 Comment(0)
K
0

This is how I solved the problem using a bitshifting copy.

It works on my x64 machine, but I am unsure if I have made unsafe assumptions about little/big endianism.

Runtime performance would be faster if this cast could be done memory inplace without the need for a copy, but I have not figured out how to do this yet.

/// Cast Vec<u32> to Vec<u8> without modifying underlying byte data
/// ```
/// # use fractals::services::vectors::vec_u32_to_u8;
/// assert_eq!( vec_u32_to_u8(&vec![ 0x12345678 ]), vec![ 0x12u8, 0x34u8, 0x56u8, 0x78u8 ]);
/// ```
#[allow(clippy::identity_op)]
pub fn vec_u32_to_u8(data: &Vec<u32>) -> Vec<u8> {
    // TODO: https://mcmap.net/q/589674/-how-to-convert-a-u32-array-to-a-u8-array-in-place
    // TODO: https://mcmap.net/q/589673/-how-to-slice-a-large-vec-lt-i32-gt-as-amp-u8
    let capacity = 32/8 * data.len() as usize;  // 32/8 == 4
    let mut output = Vec::<u8>::with_capacity(capacity);
    for &value in data {
        output.push((value >> 24) as u8);  // r
        output.push((value >> 16) as u8);  // g
        output.push((value >>  8) as u8);  // b
        output.push((value >>  0) as u8);  // a
    }
    output
}
Krahmer answered 19/7, 2022 at 17:54 Comment(0)
L
-1

Try this

let vec32: Vec<u32> = vec![1u32, 2u32];
let mut vec8: Vec<u8> = vec![];
for v in &vec32{
    for b in v.to_be_bytes(){
        vec8.push(b);
    }
}
println!("{:?}", vec32);
println!("{:?}", vec8);

Playgroud

Lanam answered 5/3, 2023 at 0:14 Comment(1)
Thanks, but this is not in-place.Locomotor

© 2022 - 2024 — McMap. All rights reserved.