Splitting a `Vec`
Asked Answered
G

1

7

I'm trying to write a little buffer-thing for parsing so I can pull records off the front of as I parse them out, ideally without making any copies and just transferring ownership of chunks of the front of the buffer off as I run. Here's my implementation:

struct BufferThing {
    buf: Vec<u8>,
}

impl BufferThing {
    fn extract(&mut self, size: usize) -> Vec<u8> {
        assert!(size <= self.buf.len());
        let remaining: usize = self.buf.len() - size;
        let ptr: *mut u8 = self.buf.as_mut_ptr();

        unsafe {
            self.buf = Vec::from_raw_parts(ptr.offset(size as isize), remaining, remaining);
            Vec::from_raw_parts(ptr, size, size)
        }
    }
}

This compiles, but panics with a signal: 11, SIGSEGV: invalid memory reference as it starts running. This is mostly the same code as the example in the Nomicon, but I'm trying to do it on Vec's and I'm trying to split a field instead of the object itself.

Is it possible to do this without copying out one of the Vecs? And is there some section of the Nomicon or other documentation that explains why I'm blowing everything up in the unsafe block?

Granese answered 10/2, 2017 at 16:18 Comment(4)
It is a very bad idea to use unsafe code if you cannot explain why it is indeed safe.Soissons
You could maybe achieve what you want using splices. split vector into two slices for example elements [0,1,2] and [3...] and you then process 0,1,2. and then in next iteration take [0.1.2] from that 2nd splice. Also VecDeque comes to mind.Knawel
Does your parser really consume a Vec? I would imagine your parser would be happy with a slice.Gomer
The buffer's being read from a very large file so I can't fit it all in memory at once (and the buffer changes as I read the file so I have lifetime issues with slices). Copying is an okay solution, but I was looking to see if there was something a bit more clever I could do to prevent that.Granese
H
5

Unfortunately, that's not how memory allocators work. It might have been possible in the past, when memory was at a premium, but today's allocators are geared for speed rather than memory preservation.

A common implementation of memory allocators is to use slabs. Basically, it's:

struct Allocator {
    less_than_32_bytes: List<[u8; 32]>,
    less_than_64_bytes: List<[u8; 64]>,
    less_than_128_bytes: List<[u8; 128]>,
    less_than_256_bytes: List<[u8; 256]>,
    less_than_512_bytes: List<[u8; 512]>,
    ...
}

When you request 96 bytes, it takes an element from less_than_128_bytes.

When you free that element, it frees all of it, not just the first N bytes, and the whole block is now re-usable. Any pointer inside the block is now dangling and should NOT be dereferenced.

Furthermore, trying to free a pointer in the middle of a block will only confuse the allocator: it won't find it, because the contract is that you address blocks by their first byte.

You violated the contract using unsafe code, BOOM.


The solution I propose is simple:

  • use a single Vec<u8> containing the whole buffer to parse
  • use slices into this Vec for parsing

Rust will check the lifetimes, so your slices cannot outlive the buffer, and slicing a slice further (s[..offset], s[offset..]) does not allocate.


If you don't mind one allocation, there's Vec::split_off which allocates a new Vec big enough for the split part.

Hagood answered 10/2, 2017 at 16:27 Comment(3)
@Shepmaster: it allocates, I don't like allocations :)Hagood
@Shepmaster: Well, the OP asked without allocations ;) We can either edit the title to reflect it or I can edit my answer to include Vec::split_off as an allocating alternative. I'm not sure what's the best, what do you think?Hagood
@Shepmaster: I was looking at Is it possible to do this without copying out one of the Vecs?; I can modify my answer for now, seems easier.Hagood

© 2022 - 2024 — McMap. All rights reserved.