Assign std::vector<std::byte> to std::vector<char> WITHOUT copying memory
A

3

6

I have a function that returns a std::vector<std::byte>.

I am aware that std::byte is not a character type nor an integral type, and that converting it to char is only possible through a typecast. So far so good.

So I would like (in cases where I know that the vector only contains character data) to transfer ownership of the underlying buffer from the std::vector<std::byte> to a std::vector<char> using std::move, so as to avoid copying the entire underlying buffer.

When I try doing this, I get this error:

no suitable user-defined conversion from "std::vector<std::byte, std::allocator<std::byte>>" to "std::vector<char, std::allocator<char>>" exists

Is this possible at all in C++? I think there are real use cases where one would want to do this.
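
For reference, the situation can be reproduced at compile time with a small sketch like the one below. It assumes C++17, and the names byte_vec, char_vec, same_type_move_ok and cross_type_move_ok are hypothetical, chosen only for this illustration. It shows that a vector can be move-constructed from a vector of the same element type, but not from a vector of a different element type.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical aliases, used only for this illustration.
using byte_vec = std::vector<std::byte>;
using char_vec = std::vector<char>;

// Moving between vectors of the same element type is supported.
inline constexpr bool same_type_move_ok =
    std::is_constructible_v<char_vec, char_vec&&>;

// std::vector has no constructor taking a vector of another element type,
// so this is false and `char_vec c = std::move(bytes);` does not compile.
inline constexpr bool cross_type_move_ok =
    std::is_constructible_v<char_vec, byte_vec&&>;

static_assert(same_type_move_ok);
static_assert(!cross_type_move_ok);
```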

Ammonal answered 21/6, 2020 at 22:14 Comment(15)
can you please post a minimal reproducible example?Truly
I think you can't, because once you've defined the elements as std::byte you can't move them into a container of a different type. Moving means transferring ownership, i.e. the elements themselves are moved, so you could only do what you want if you could change the type of an object without copying it.Utas
@Berto99 Sure; Let's say we have std::vector<std::byte> bytevec. This vector only contains character data (i.e. data which can be represented as char). I would like to be able to do something like std::vector<char> charvec = std::move(bytevec) so that the underlying buffer with the data is transferred from bytevec to charvec without actually copying the data. Of course the compiler complains because std::byte cannot be implicitly converted to char, so I was wondering if there is a way to 'cast' it while using std::move?Ammonal
so you would not like to use a for loop to move the bytes, instead you want to swap just the pointers?Truly
I don't think this is possible. Casting may work in practice (or it may not) but it will be undefined technically. I think it may be worth investing in developing a std::vector like type that has the ability to adopt external memory. Or consider std::span.Pironi
@Berto99 Yes I'd like the char vector to take ownership of the data from the byte vector. After copying the pointers from the origin object to the target object, the pointers of the origin object would be set to something like null to indicate that it no longer 'owns' that memory (more info about std::move here: en.cppreference.com/w/cpp/utility/move)Ammonal
@Pironi It would make sense to have a std::vector class that can adopt external memory. Do you know why the STL vector class does not have this feature yet?Ammonal
@Ammonal I doubt the STL will ever have such features as it is a very general purpose library. If you want to do low-level (advanced/dangerous) stuff you're on your own (or find a library that supports that).Pironi
@Pironi Got it. Thanks :)Ammonal
@Pironi plz, can you tell me if my previous comment is true or false and explain it?Utas
@Ammonal Would it be ok to create a class that keeps a reference to the vector<byte> and provides a char (or uint8_t) interface that deals with the casting back and forth? I think that's the route I'd take.Middleton
@matt: "It would make sense to have a std::vector class that can adopt external memory." But vector<T> doesn't store memory; it stores an array of Ts. It is reasonable to allow a vector<T> instance to adopt the storage from another vector<T> instance. It makes far less sense for it to be able to adopt the "memory" of some unrelated type vector<U>, since that is an array of Us, which is not an array of Ts.Heroic
@NicolBolas: That's true in cases where an array of Us is not an array of Ts. But the strict aliasing rule makes that a bit more complicated than simply is_same_type_v<remove_const<T>, remove_const<U>>, because e.g. signed and unsigned variations of the same type are guaranteed to be representation-compatible and valid for aliasing, so any time you have an unsigned[N] you do in fact also have an int[N] and vice versa. Here the question involves compatibility between std::byte and char.Subantarctic
@BenVoigt: Just because you can alias doesn’t mean you can do pointer arithmetic—but morally you perhaps should be able to (to access the underlying storage); there are proposals in flight for the latter (at least for std::byte or so).Andaman
@DavisHerring: According to the pointer arithmetic rules, it is allowed as long as you don't leave the bounds of the parent object. Which in this case is the array.Subantarctic
L
-6

You can achieve this with a cast, as shown below. This is legal because the cast is to a char reference (if casting to any other type it would be UB) but, with gcc at least, you still have to compile it with -fno-strict-aliasing to silence the compiler warning. Anyway, here's the cast:

std::vector <char> char_vector = reinterpret_cast <std::vector <char> &&> (byte_vector);

And here's a live demo

Liquesce answered 21/6, 2020 at 23:2 Comment(2)
flak flak flak.Pironi
Since you're bringing the big guns, why do you even need std::move anymore. Just auto char_vector = reinterpret_cast<std::vector<char> &&>(byte_vector);. To UB and beyond!Poi
M
8

I would probably leave the data in the original vector<byte> and make a small class that keeps a reference to the original vector<byte> and does the necessary casting when you need it.

Example:

#include <cstddef>
#include <iostream>
#include <vector>

template<typename T>
struct char_view {
    explicit char_view(std::vector<T>& bytes) : bv(bytes) {}

    char_view(const char_view&) = default;
    char_view(char_view&&) = delete;
    char_view& operator=(const char_view&) = delete;
    char_view& operator=(char_view&&) = delete;

    // capacity
    size_t element_count() const { return bv.size(); }
    size_t size() const { return element_count() * sizeof(T); }

    // direct access
    auto data() const { return reinterpret_cast<const char*>(bv.data()); }
    auto data() { return reinterpret_cast<char*>(bv.data()); }

    // element access
    char operator[](size_t idx) const { return data()[idx]; }
    char& operator[](size_t idx) { return data()[idx]; }

    // iterators - with possibility to iterate over individual T elements
    using iterator = char*;
    using const_iterator = const char*;

    const_iterator cbegin(size_t elem = 0) const { return data() + elem * sizeof(T); }
    const_iterator cend(size_t elem) const { return data() + (elem + 1) * sizeof(T); }
    const_iterator cend() const { return data() + size(); }

    const_iterator begin(size_t elem = 0) const { return cbegin(elem); }
    const_iterator end(size_t elem) const { return cend(elem); }
    const_iterator end() const { return cend(); }
    
    iterator begin(size_t elem = 0) { return data() + elem * sizeof(T); }
    iterator end(size_t elem) { return data() + (elem + 1) * sizeof(T); }
    iterator end() { return data() + size(); }

private:
    std::vector<T>& bv;
};

int main() {
    using std::byte;

    std::vector<byte> byte_vector{byte{'a'}, byte{'b'}, byte{'c'}};

    char_view cv(byte_vector);

    for(char& ch : cv) {
        std::cout << ch << '\n';
    }
}

Output:

a
b
c

A simpler option if you only need const access could be to create a string_view:

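#include <string_view>
#include <vector>

// Non-owning, read-only view of the vector's storage as characters.
// The view is only valid while v is alive and has not reallocated.
template<typename T>
std::string_view to_string_view(const std::vector<T>& v) {
    return {reinterpret_cast<const char*>(v.data()), v.size() * sizeof(T)};
}
//...
auto strv = to_string_view(byte_vector);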
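
As a rough usage sketch (assuming C++17), the view can then be used with the ordinary std::string_view operations. The helper has_prefix below is hypothetical and only there for illustration. to_string_view is repeated so that the snippet stands on its own.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Same helper as above, repeated so that the sketch is self-contained.
template<typename T>
std::string_view to_string_view(const std::vector<T>& v) {
    return {reinterpret_cast<const char*>(v.data()), v.size() * sizeof(T)};
}

// Hypothetical helper: checks whether the buffer starts with the given text.
// No bytes are copied; the comparison reads the vector's storage directly.
bool has_prefix(const std::vector<std::byte>& buffer, std::string_view prefix) {
    std::string_view text = to_string_view(buffer);
    // substr clamps the length, so a buffer shorter than prefix is handled.
    return text.substr(0, prefix.size()) == prefix;
}
```

Keep in mind that the string_view does not own anything: it dangles as soon as the vector is destroyed or reallocates.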
Middleton answered 21/6, 2020 at 23:35 Comment(8)
Off-topic: user-defined byte literals would probably be neater than C style castsPoi
While we're at it, let's generalize this view thing to help spread UB gcc.godbolt.org/z/TR6rsyPoi
@_Static_assert "user-defined byte literals" I totally agree. I tried to keep it short and I couldn't find any pre-defined user-defined byte literals. "let's generalize this view thing to help spread UB" :-) I think that by staying at char we should be fine?Middleton
Sure, reinterpreting as char array should be fine, but where's the fun in that? :)Poi
Is there some good reason why some move/conversion isn't supported in the language? I run into this frequently and it's a PITA.Auk
@Auk what do you mean by move/conversion?Middleton
@TedLyngmo: std::vector<char> v(std::vector<std::byte>&& other) for example, as a move. vector byte buffers are vector byte buffers, whether the template type is char, unsigned char, uint8_t, std::byte, signed char.Auk
@Auk I see. When it comes to unsigned char and uint8_t, you can use them interchangeably (as long as uint8_t exists at all of course). When it comes to the others, they are distinct types so just letting that move through opens up for mistakes. I rarely find this to be an obstacle though. Perhaps you could ask a question about it to get some more in-depth answers?Middleton
C
2

std::vector does not allow attaching to or detaching from memory allocations, other than by moving from a vector of exactly the same type. This has been proposed, but people raised (valid) objections, for example about how the allocator would be handled when attaching.

The function returning vector<byte> constrains you to work with a vector<byte> as your data container unless you want to copy the data out.
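
If copying is acceptable after all, a minimal sketch of the copy could look like this. It assumes C++17, and copy_to_chars is a hypothetical helper name. It converts element by element with std::to_integer, so no reinterpret_cast is involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a std::vector<char> by copying the bytes out.
std::vector<char> copy_to_chars(const std::vector<std::byte>& bytes) {
    std::vector<char> out(bytes.size());
    // std::to_integer<char> is the standard way to convert a std::byte
    // to an integer type; each element is converted and copied.
    std::transform(bytes.begin(), bytes.end(), out.begin(),
                   [](std::byte b) { return std::to_integer<char>(b); });
    return out;
}
```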

Of course, you can alias the bytes as char in-place for doing character operations.
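
A small sketch of what such in-place aliasing might look like, assuming C++17. The helper uppercase_in_place is hypothetical. It accesses each std::byte through a char reference, which the aliasing rules permit for char, and the data stays in the vector<byte> the whole time.

```cpp
#include <cctype>
#include <cstddef>
#include <vector>

// Hypothetical helper: upper-cases ASCII letters in place by viewing
// each byte as a char. Nothing is copied or moved.
void uppercase_in_place(std::vector<std::byte>& bytes) {
    for (std::byte& b : bytes) {
        // Accessing any object through a char lvalue is allowed.
        char& c = reinterpret_cast<char&>(b);
        // std::toupper expects a value representable as unsigned char.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
}
```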

Confide answered 22/6, 2020 at 2:59 Comment(0)
