What is the no-undefined-behavior way of deserializing an object from a byte array in C++11 (or later)?

To overcome alignment issues, I need to memcpy into a temporary. What type should that temporary be? gcc complains that the following reinterpret_cast will break strict aliasing rules:

template <typename T>
T deserialize(char *ptr) {
    static_assert(std::is_trivially_copyable<T>::value, "must be trivially copyable");
    alignas(T) char raw[sizeof(T)];
    memcpy(raw, ptr, sizeof(T));
    return *reinterpret_cast<T *>(raw);
}

(e.g. when T is "long").

I don't want to define a T, since I don't want to construct a T before overwriting it.

In a union, doesn't writing one member then reading another count as undefined behavior?

template<typename T>
T deserialize(char *ptr) {
    union {
        char arr[sizeof(T)];
        T obj;
    } u;

    memcpy(u.arr, ptr, sizeof(T));   // Write to u.arr
    return u.obj;   // Read from u.obj, even though arr is the active member.
}
Byssinosis answered 2/9, 2016 at 14:12 Comment(4)
Have some cereal and don't worry about it. - Introvert
I have a language lawyer answer on pointers and aliasing, but as a caveat this is a grey area in the Standard. It's supposed to be improved in the future, but I have no idea in what direction. Your program can be tweaked to arguably follow the rules to the letter, but I can't say whether compilers will agree or not. At the very least GCC doesn't complain anymore, but that may just be because we confused its aliasing analysis. Sadly I don't have the time to make a proper answer. - Halfcock
@LucDanton: I don't think even std::launder provides the facility the OP desires. - Heliometer
@KerrekSB yeah, that's for a different set of rules. If or when the Standard decides to be conservative and clamp down on the 'creative' interpretations that the current wording allows, then there is indeed nothing to salvage. - Halfcock

What you want is this:

T result;
char * p = reinterpret_cast<char *>(&result);   // or std::addressof(result) !

std::memcpy(p, ptr, sizeof(T));                 // or std::copy!!

return result;

No aliasing violation. If you want a T, you need to have a T. If your type is trivially copyable, then hopefully it is also trivially default constructible, so constructing result costs nothing. In any event, the return operand has to be copied into the function's return value, and that copy is elided, so there's really no extra cost here.
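For completeness, here is a minimal sketch of the fragment above wrapped into a full function template. The name deserialize and the static_assert checks are carried over from the question rather than being part of this answer, and the extra default-constructibility check is an assumption added so that `T result;` compiles.

```cpp
#include <cstring>
#include <type_traits>

// Sketch only: the function name and the static_asserts are illustrative.
template <typename T>
T deserialize(const char* ptr) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "must be trivially copyable");
    static_assert(std::is_default_constructible<T>::value,
                  "must be default constructible so a T exists to copy into");

    T result;                              // a real T object, so no aliasing problem
    std::memcpy(&result, ptr, sizeof(T));  // overwrite its bytes; ptr may be misaligned
    return result;                         // the copy out is elided
}
```

With this, a call such as deserialize<long>(buf + 1) reads a long from an arbitrary (possibly misaligned) offset in a char buffer, provided at least sizeof(long) bytes are available there.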

Heliometer answered 2/9, 2016 at 14:19 Comment(5)
The question asked explicitly about doing it without constructing a T. - Introvert
Is there any point in casting the address of result to char*? - Clatter
@user2079303: Oh, not for memcpy I suppose. But that way it works with std::copy, too. - Heliometer
I wouldn't say the question is wrong, I'd say the correct answer is "this is not possible". The answers should adapt to the question, not the other way around. - Introvert
Does anyone have a reference to the parts of the standard that disallow both of the options I posted? In particular, I suspect that if one of the union elements is a char array, then you can write to that one and access the other one, because char arrays are handled specially, much like they are for pointer casts. But I can't seem to find it in the spec. - Byssinosis

You want to use the std::aligned_storage class template. It was designed to handle this exact problem. Here's a sample solution that uses SFINAE based on the check in your question.

template<class T>
typename std::enable_if<std::is_trivially_copyable<T>::value, T>::type deserialize(const char *data) {
    typename std::aligned_storage<sizeof(T), alignof(T)>::type destination;
    std::memcpy(&destination, data, sizeof(T));
    return reinterpret_cast<T &>(destination);
}
Bolter answered 2/9, 2016 at 15:40 Comment(9)
That's still UB, right? The purpose of aligned_storage is a different one. - Heliometer
It shouldn't be. open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf - Bolter
Why would one return a reference to a non-const temporary when the return type is not a reference: reinterpret_cast<T &>(destination) -> T? - Fayefayette
This causes undefined behaviour. You have not created any T objects at any point. (memcpy and reinterpret_cast do not create objects). See here, including the accepted answer - Pentarchy
@Fayefayette the intent is to initialize the result object by copying from a T object at the location of destination. (But it causes UB since there is no such object) - Pentarchy
@Pentarchy I guess fortunately we now have a new defense lawyer in C++17 named std::launder. If one modifies the return statement to something like return *std::launder(reinterpret_cast<T*>(&destination) );, I'm positive that our use case would be in good shape, if and only if the address of the argument contains an object of T when it's called. Moreover, the aliasing restriction rules do allow accessing the byte representation of an object, don't they? So std::memcpy cannot be considered UB whatsoever; it blindly accesses the byte representation. - Fayefayette
@Pentarchy With std::launder magic, perhaps we can even do this one-liner: return *std::launder(reinterpret_cast<T*>(data) );. - Fayefayette
@Fayefayette launder does not get around strict aliasing - Pentarchy
std::aligned_storage is deprecated in C++23: see en.cppreference.com/w/cpp/types/aligned_storage and github.com/cplusplus/papers/issues/197 - Equiponderance
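
Following up on the comments above, here is a rough sketch of one way to keep an aligned raw buffer while still making sure a T object exists in it: placement-new a T into the buffer before copying the bytes. It uses a plain alignas array instead of std::aligned_storage. The function name deserialize_via_placement_new is hypothetical, and the trivial-default-constructibility check is an added assumption. Note that this does create a T, which the question hoped to avoid, so it is closer in spirit to the first answer than to the code above.

```cpp
#include <cstring>
#include <new>
#include <type_traits>

// Sketch only: hypothetical name, not taken from the thread.
template <typename T>
T deserialize_via_placement_new(const char* data) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "assumed here so that creating the T does no work");

    alignas(T) unsigned char storage[sizeof(T)];      // raw, suitably aligned bytes
    T* obj = ::new (static_cast<void*>(storage)) T;   // begin the lifetime of a T
    std::memcpy(obj, data, sizeof(T));                // fill in its object representation
    return *obj;                                      // read through a pointer to a live T
}
```

Because T is trivially copyable, its destructor is trivial as well, so no explicit destructor call is needed before storage goes out of scope.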
