When and how is conversion to char pointer allowed?
B

7

26

We can look at the representation of an object of type T by converting a T* that points at that object into a char*. At least in practice:

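// Needs <iostream>, <iomanip> and <cstddef>.
int x = 511;
unsigned char* cp = (unsigned char*)&x;
std::cout << std::hex << std::setfill('0');
// Print each byte of x as two hex digits, in memory order.
for (std::size_t i = 0; i < sizeof(int); i++) {
  std::cout << std::setw(2) << (int)cp[i] << ' ';
}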

This outputs the representation of 511 on my system: ff 01 00 00.

There is (surely) some implementation-defined behaviour occurring here. Which of the casts is allowing me to convert an int* to an unsigned char*, and which conversions does that cast entail? Am I invoking undefined behaviour as soon as I cast? Can I cast any T* type like this? What can I rely on when doing this?

Brawn answered 21/12, 2012 at 19:4 Comment(4)
I don't think it's undefined behavior, at least if you don't modify the data. But the result will depend on whether your platform is little or big endian. - Euniceeunuch
Note that this is only safe for char *. Casting pointers to make them read as different types causes problems with aliasing. The C and C++ languages guarantee to the compiler that pointers to different types can never point to the same object, so the optimizer can do things like store the value in a register or hoist a load or write out of a loop. char * is the only exception. A char * has to be assumed to alias with anything, because of serialization to and from disk and network buffers. - Outfit
@ZanLynx - Re "char* is the only exception": Not quite. The standard also allows conversion to unsigned char*. - Joslyn
Also see "What is the strict aliasing rule?" - Concordant
J
20

Which of the casts is allowing me to convert an int* to an unsigned char*?

That C-style cast in this case is the same as reinterpret_cast<unsigned char*>.

Can I cast any T* type like this?

Yes and no. The yes part: You can safely cast any pointer type to a char* or unsigned char* (with the appropriate const and/or volatile qualifiers). The result is implementation-defined, but it is legal.

The no part: The standard explicitly allows char* and unsigned char* as the target type. However, you cannot (for example) safely cast a double* to an int* and then access the double through the resulting pointer. Do this and you've crossed the boundary from implementation-defined behavior to undefined behavior. It violates the strict aliasing rule.

Joslyn answered 21/12, 2012 at 19:26 Comment(1)
Aha, so it looks like (from @GeneBushuyev's and @nobar's answers) the cast from T* to any U* has an unspecified result (but would be fine if I cast back again), and if I were to cast to anything but a char* or unsigned char* and then access the object through that pointer, I would have undefined behaviour (as per strict aliasing). The perfect answer would have both of these points. ;) - Brawn
O
8

Your cast maps to:

unsigned char* cp = reinterpret_cast<unsigned char*>(&x);

The underlying representation of an int is implementation defined, and viewing it as characters allows you to examine that. In your case, it is 32-bit little endian.

There is nothing special here -- this method of examining the internal representation is valid for any data type.

C++03 5.2.10.7: A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.

This suggests that the cast results in unspecified behavior. But pragmatically speaking, casting from any pointer type to char* will always allow you to examine (and modify) the internal representation of the referenced object.

Orris answered 21/12, 2012 at 19:18 Comment(4)
Strictly speaking, though, the standard does not guarantee that a char is smaller than an int. - Orris
The relevant standards for the "strict aliasing rule" are provided here: https://mcmap.net/q/16455/-what-is-the-strict-aliasing-rule. Synopsis: If you access the object via char* or unsigned char*, there is no problem. - Orris
This is highly tangential, but it is interesting to note that the strict aliasing rule suggests that the use of a char* can interfere with optimization. This is where the non-standard restrict keyword can be useful, although it doesn't apply to the question at hand, since aliasing is exactly the point of the given question. - Orris
Related/of interest: Do the strict aliasing rules in C++20 allow reinterpret_cast between the standard c++ unicode chars and the underlining types? - Orris
K
4

The C-style cast in this case is equivalent to reinterpret_cast. The Standard describes the semantics in 5.2.10. Specifically, in paragraph 7:

"A pointer to an object can be explicitly converted to a pointer to a different object type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified."

What this means in your case: the alignment requirements are satisfied, and the result is unspecified.

Kioto answered 21/12, 2012 at 19:26 Comment(1)
Ah, so it's only well-defined when you cast from a T* to a U* and back to a T*? The result of a T* cast to a U* only is unspecified? Aha. - Brawn
L
3

The implementation-defined behaviour in your example is the endianness of your system; in this case your CPU is little endian.
As for the type cast: when you cast an int* to char*, all you are doing is telling the compiler to interpret what cp points to as a char, so dereferencing cp reads only the first byte and interprets it as a character.

Lubbi answered 21/12, 2012 at 19:18 Comment(0)
C
1

Casts between pointers are themselves always possible, since all pointers are nothing more than memory addresses and any type, in memory, can always be thought of as a sequence of bytes.

But, of course, the way the sequence is formed depends on how the decomposed type is represented in memory, and that is outside the scope of the C++ specification.

That said, barring very pathological cases, you can expect that representation to be the same in all the code produced by the same compiler for all the machines of the same platform (or family), and you should not expect the same results on different platforms.

In general, one thing to avoid is to treat the relation between type sizes as "predefined": in your sample you assume sizeof(int) == 4*sizeof(char), which is not necessarily always true.

But it is always true that sizeof(T) == N*sizeof(char), hence any T can always be seen as an integer number of chars.

Chaechaeronea answered 21/12, 2012 at 19:19 Comment(3)
I'm missing where the OP assumed that sizeof(int) == 4*sizeof(char). - Communist
@Communist Emilio may have been answering the original version of my question which did rely on it being 4*sizeof(char). - Brawn
@sftrabbit - Fine. But delete your comment saying "No need to delete..." - Communist
P
0

Unless a conversion operator is defined, a cast simply tells the compiler to "see" that memory area in a different way. Nothing really fancy, I would say.

Then you are reading the memory area byte by byte; as long as you do not change it, that is just fine. Of course, what you see depends a lot on the platform: think about endianness, word size, padding, and so on.

Polivy answered 21/12, 2012 at 19:22 Comment(0)
L
0

Just reverse the byte order and it becomes

00 00 01 ff

which is 1 * 256 (01) + 255 (ff) = 511.

This is because your platform is little endian.

Lail answered 21/12, 2012 at 19:26 Comment(0)
