Twenty-plus years ago, I wouldn't have thought (and didn't think) anything of doing binary I/O with POD structs:
struct S { std::uint32_t x; std::uint16_t y; };
S s;
read(fd, &s, sizeof(s)); // assume this succeeds and reads sizeof(s) bytes
std::cout << s.x + s.y;
(I'm ignoring padding and byte order issues, because they're not part of what I am asking about.)
"Obviously", we can read into s
and the compiler is required to assume that the contents of s.x and s.y are aliased by read(). So, using s.x after the read() isn't undefined behaviour (even though s was never explicitly initialized).
Likewise in the case of
S s = { 1, 2 };
read(fd, &s, sizeof(s)); // assume this succeeds and reads sizeof(s) bytes
std::cout << s.x + s.y;
the compiler can't presume that s.x is still 1 after the read().
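As a minimal sketch of the second case, assuming a POSIX system: the hypothetical helper round_trip_through_pipe (my own name, not part of any library) sends one S through a pipe() and reads it straight back into another S that already holds values. It only illustrates the behaviour I expect from real compilers; it doesn't prove anything about what the standard guarantees.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>   // POSIX: pipe, read, write, close

struct S { std::uint32_t x; std::uint16_t y; };

// Hypothetical helper (not from any library): writes the bytes of `in` to a
// pipe and reads them back directly into `out`.
// Returns false if a system call fails or transfers fewer bytes than asked.
bool round_trip_through_pipe(const S& in, S& out) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    bool ok = write(fds[1], &in, sizeof in) == static_cast<ssize_t>(sizeof in);

    // read() receives the address of `out` as a void*, so the compiler has to
    // assume that the call may have modified every byte of `out`.
    ok = ok && read(fds[0], &out, sizeof out) == static_cast<ssize_t>(sizeof out);

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Calling it with out initialized to { 1, 2 } and in set to { 7, 3 } should leave out.x == 7 and out.y == 3 on success, which is exactly the "can't presume that s.x is still 1" situation.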
Fast forward to the modern world, where we actually have to follow the aliasing rules and avoid undefined behaviour, and so on, and I have been unable to prove to myself that this is allowed.
In C++14, for example, [basic.types] ¶2 says:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.
If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
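The round trip that this paragraph describes can be sketched as follows. The helper name copy_out_and_back is hypothetical, and using std::memcpy for the copy is my assumption about what "copied" means here; the quoted text itself doesn't name a function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct S { std::uint32_t x; std::uint16_t y; };

// The quoted paragraph only applies to trivially copyable types.
static_assert(std::is_trivially_copyable<S>::value,
              "S must be trivially copyable");

// Hypothetical helper: copies the bytes of `s` into an unsigned char array,
// overwrites `s`, then copies the saved bytes back into it.
S copy_out_and_back(S s) {
    unsigned char bytes[sizeof(S)];
    std::memcpy(bytes, &s, sizeof s);   // object -> array of unsigned char

    s.x = 0;                            // clobber the object in between
    s.y = 0;

    std::memcpy(&s, bytes, sizeof s);   // array -> object
    return s;                           // should hold the original value again
}
```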
¶4 says:
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
[basic.lval] ¶10 says:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined: [footnote 54]
...
— a char or unsigned char type.
Footnote 54: The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Taken together, I think that this is the standard saying that "you can form an unsigned char or char pointer to any trivially copyable (and thus any POD) type and read or write its bytes". In fact, in N2342, which gave us the modern wording, the introductory table says:
Programs can safely apply coding optimizations, particularly std::memcpy.
and later:
Yet the only data member in the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.
With the proposed resolution, the class can be made into a POD by making the default constructor trivial (with N2210 the syntax would be endian()=default), resolving all the issues.
It really sounds like N2342 is trying to say "we need to update the wording to make it so you can do I/O like read() and write() for these types", and it really seems like the updated wording was made standard.
Also, I often hear references to "the std::memcpy() hole" or similar, where you can use std::memcpy() to basically "allow aliasing". But the standard doesn't seem to call out std::memcpy() specifically (and in fact in one footnote mentions it along with std::memmove() and calls it an "example" of a way to do this).
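For concreteness, this is the kind of code I understand people to mean by that phrase: the bytes arrive in an unsigned char buffer first and are then copied into the struct. It is only a sketch; from_buffer is a hypothetical name, and it assumes the buffer holds at least sizeof(S) bytes laid out exactly like S on this platform (same padding and byte order, which I'm ignoring as above).

```cpp
#include <cassert>
#include <cstddef>    // offsetof
#include <cstdint>
#include <cstring>
#include <type_traits>

struct S { std::uint32_t x; std::uint16_t y; };

static_assert(std::is_trivially_copyable<S>::value,
              "S must be trivially copyable");

// Hypothetical helper: builds an S from raw bytes without ever accessing the
// buffer through an S* or S&.
// Assumption: `buf` points to at least sizeof(S) bytes in S's native layout.
S from_buffer(const unsigned char* buf) {
    S s;
    std::memcpy(&s, buf, sizeof s);
    return s;
}
```

The alternative being asked about skips the intermediate buffer and hands &s to the I/O function directly.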
Plus, there's the fact that I/O functions like read() tend to be OS-specific (from POSIX, for example) and thus aren't discussed in the standard.
So, with all this in mind, my questions are:
What actually guarantees that we can do real-world I/O of POD structs (as shown above)?
Do we actually need to std::memcpy() the content into and out of unsigned char buffers (surely not) or can we directly read into the POD types?
Do the OS I/O functions "promise" that they manipulate the underlying memory "as if by reading or writing unsigned char values" or "as if by std::memcpy()"?
What concerns should I have when there are layers (such as Asio) between me and the raw I/O functions?
void* doesn't break any rules since it's impossible to access memory through a void*. All the compiler knows is that "someone" got the address of s and thus the memory s occupies might get modified. That's all it can know and it doesn't seem like it would break any rules. – Methylnaphthalene
void* (or char*) is a legal alias, and then as long as you use the memory via the original S* or char*, you're good. Maybe it's that simple. – Nemhauser
vectors<POD>data() where the object to object packing has been verified. The problem there is that all data has been initialized. If a compiler decided an object in the vector hadn't been changed by the read, there be dragons. – Variation
S - if C++ guarantees that x has offset 0 and a size of 4 bytes, and y has offset 4 and a size of 2 bytes, and sizeof(S) == 8 - you can pass &s to some external foreign-language API (which can be implemented in another language). – Euchre
s and its members via I/O without breaking aliasing rules (and thus without having the compiler ignore the result of the I/O), and 2) read in some bytes and create a valid S. I think the answers are: 1) as per @Nikos C. it simply doesn't break the aliasing rules, and 2) as per the full answers below, such a type has (at least) standard layout and sane compilers use sane standard layouts. – Nemhauser