As a preface, this program will not necessarily behave exactly as it does in the question, because it exhibits implementation-defined behavior. In addition, tweaking the program slightly can cause undefined behavior as well. More information on this at the end.
The first line of the `main` function defines an `unsigned long foo` as `506097522914230528`. This seems confusing at first, but in hexadecimal it looks like this: `0x0706050403020100`. This number consists of the following bytes: `0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00`. By now, you can probably see its relation to the output. If you're still confused about how this translates into the output, take a look at the for loop.
for (int i = 0; i < sizeof(unsigned long); ++i)
printf("%u ", *(((unsigned char *) &foo) + i));
Assuming a `long` is 8 bytes, this loop runs eight times: two hex digits are enough to display every possible value of a byte, and the hex number has 16 digits, so it holds 16 / 2 = 8 bytes. The really confusing part is the second line. Think about it this way: since two hex digits cover one byte, isolating the last two digits of this number (`00`) gives a byte value of zero, the two before them (`01`) give one, and so on up to the first two digits (`07`), which give seven. Now, assume the `long` is actually an array which looks like this (this is the layout on a little-endian machine, where the least significant byte is stored first):
{00, 01, 02, 03, 04, 05, 06, 07}
We get the address of `foo` with `&foo`, cast it to an `unsigned char *` so that each access reads a single byte (two hex digits), then use pointer arithmetic to get what would be `foo[i]` if `foo` were an array of eight bytes. As I mentioned in my question, this probably looks less confusing as `((unsigned char *) &foo)[i]`.
A bit of a warning: this program exhibits implementation-defined behavior, so it will not necessarily give the same output on every C implementation. Not only is a `long` 32 bits in some implementations, but the order in which the bytes of `0x0706050403020100` are stored in the `unsigned long` (AKA endianness) is also implementation-defined. Credit to @philipxy for pointing out the implementation-defined behavior first. This type punning causes another issue, which @Ruslan pointed out: if the address of the `long` is cast to anything other than a `char *`/`unsigned char *` and the result is dereferenced, C's strict aliasing rule comes into play and you get undefined behavior (credit for the link goes to @Ruslan as well). More detail on these two points in the comment section.
[...] `long` like some sort of array by converting its address to a `char *` and dereferencing it? – Courson
[...] `char *` or an `unsigned char *` can access any byte in your address space that you're allowed to access. – Lenient
[...] `char*`. That's how stuff like `memcpy` can copy any object (logically 1 char at a time, in practice with wider loads/stores.) And one way to write code that serializes data into a byte-stream (with native endianness.) – Psittacine
[...] `_mm_loadu_si128( pointer )` - like `char*` accesses, they can safely access anything without violating strict-aliasing rules. Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?) – Psittacine
[...] `char`s and `char *`s are definitely useful. – Courson
[...] `char*` is allowed to read any other type of object without triggering Undefined Behaviour (because of a special exception for it and `unsigned char*` in the strict aliasing part of the ISO C standard). Note that the reverse is not true; using `unsigned long*` to read through a `char buf[]` is still UB. (see Why does glibc's strlen need to be so complicated to run quickly? for a way to get around that with GNU C `__attribute__((may_alias))` on a typedef, or using memcpy) – Psittacine
[...] `char *` to a `long *` is normally undefined behavior though; a `char` is one byte and a `long` is 8 (or sometimes 4), so unless a `char[]` size is a multiple of 8, you would end up getting some bytes that were not originally part of the `char`. And I guess a `char *` is safe from strict-aliasing because ultimately, everything is made out of bytes. I don't think you can store values in nybbles or anything smaller than a byte in modern systems. – Courson
[...] `_Alignas(long) char buf[sizeof(long)];` is guaranteed to be exactly the same size as a `long` (and sufficiently aligned), but it's still not safe to point a `long*` at it and load from it. You can safely do the exact same type-punning in C99 using a union. It's just a quirk of C and C++ that pointer-casting type punning is automatically UB except for the special case of `char`/`unsigned char`; some other languages are different. – Psittacine
[...] `_Alignas` before this. – Courson
[...] `sizeof(char)` is 1 by definition. You could imagine a 4-bit CPU architecture where satisfying the C requirement for the value-range of `unsigned char` might require 2 separate 4-bit registers / memory locations to be grouped together as a `char` by an ISO C implementation... But that's not practical; 8-bit bytes are standard these days, and smaller was rare historically. Related: Can modern x86 hardware not store a single byte to mem? – Psittacine
[...] `-fno-strict-aliasing`. MS even recommends `*(float*)&my_int32` as a way to type-pun an int holding a bit-pattern into a float. (Their compiler optimizes memcpy ok, I think, so writing non-portable crap like that just locks you in to continuing to use MSVC, with no benefit in the resulting asm. Although it is compact, only C++20 `std::bit_cast` is more readable.) Always remember that a specific C implementation can choose to define any behaviour that ISO C leaves undefined. – Psittacine
[...] `strlen` or whatever in C using bithacks. OS kernels are often compiled with `-fno-strict-aliasing` because they tend to want to mess around with the same memory different ways, and often aren't careful to do it only using `memcpy`, `char*`, or GNU C `__attribute__((may_alias))` typedefs. Strict aliasing can let a compiler optimize better sometimes, e.g. knowing that an `int*` store definitely won't change what's read from a `float*`. – Psittacine
[...] `double`, the compiler shouldn't need to worry if lvalue access to that double somehow made changes to the value of some external linkage `int` visible in the same translation unit. A very sound rationale. Then it all went haywire when people started to apply those same rules to integers of different size. And partially accessing an integer through a smaller type is a very common use-case, particularly in hardware-related programming. So these rules remain broken. – Mismate