Is the strict aliasing rule really a "two-way street"?

In these comments user @Deduplicator insists that the strict aliasing rule permits access through an incompatible type if either the aliased or the aliasing pointer is a pointer-to-character type (qualified or unqualified, signed or unsigned char *). So, his assertion is basically that both

long long foo;
char *p = (char *)&foo;
*p; // just in order to dereference 'p'

and

char foo[sizeof(long long)];
long long *p = (long long *)&foo[0];
*p; // just in order to dereference 'p'

are conforming and have defined behavior.

In my reading, however, only the first form is valid, that is, when the aliasing pointer is a pointer-to-char. It does not work in the other direction, i.e. when the aliasing pointer points to an incompatible type (other than a character type) and the aliased pointer is a char *.

So, the second snippet above would have undefined behavior.

What's the case? Is this correct? For the record, I have already read this question and answer, and there the accepted answer explicitly states that

The rules allow an exception for char *. It's always assumed that char * aliases other types. However this won't work the other way, there's no assumption that your struct aliases a buffer of chars.

(emphasis mine)

Downthrow answered 6/7, 2014 at 17:16 Comment(19)
Indeed, only the first version is allowed by the standard.Arezzini
@OliCharlesworth Well, I think so too. Tell that to Deduplicator, who (quite arrogantly) asserts that "I need to learn to read the standard better"... >.<Downthrow
FWIW the relevant C99 standard clause is 6.5 p7.Arezzini
Please add an _Alignas(long long) to the char array, otherwise mis-alignment might cause UB.Reviere
@Kerrek: How would reading characters from a file into a buffer and then using the buffer as a struct whatever (depending on the initial sequence) work then?Reviere
@Deduplicator: If you mean something like char buf[...]; fread(buf, ...); foo(((MyStruct *)buf)->member);, it doesn't work.Arezzini
@Oli: You forgot the alignment specifier. With it, that would work. Also, how would you code an allocator then?Reviere
@Deduplicator: Alignment is another matter. See the standard quote I referenced above for why this isn't valid from an aliasing POV.Arezzini
@Oli: Added all those quotes from the standard I think might be handy for proving your point, and an example usage which is correct (imho) as an answer. Please prove it wrong.Reviere
@Reviere but it is not just about alignment, it is about optimization as well and what assumptions the compiler is allowed to make. I don't see how any type can be the effective type of char ... whereas char has a special exception carved out for it.Waaf
@Deduplicator: Either don't read into a buffer, but into an actually existing type (this only works for fundamental types), or memcopy individual struct elements afterwards one by one.Amenra
@KerrekSB: Seems I must take care not to use a buffer with a declared type, then I'm ok.Reviere
In the end @OliCharlesworth found the necessary additional quote to make me see the additional restrictions when not using dynamic allocation. Thanks a lot!Reviere
The long long *p = (long long *)&foo[0]; *p example certainly has a potential for alignment problems. e.g. foo is on an odd address and p may need an even (or quad) address. But is this the "strict aliasing rule" issue? Thought that had to do with #99150Lordship
@chux that is not the entire strict aliasing issue itself, but alignment is part of why the strict aliasing rule exists. It's not the only reason, though (it's about the compiler's ability to optimize based on assumed program invariants too).Downthrow
@chux aliasing and alignment are separate issues. If the access does not meet alignment requirements it is UB. If the access does meet alignment requirements then we consider other things such as aliasing.Sechrist
@Matt McNabb Agree aliasing and alignment are separate issues. Hence my comment, as the title is "strict aliasing rule" but the examples appear to be more about alignment. Certainly both must be OK for code to work, but I think they can be addressed separately in this post.Lordship
@chux it seems to me that the question is only about aliasing. (The actual code example may also have alignment problems, but the question is just asking about the aliasing issue).Sechrist
@user3477950 note that the accepted answer is actually wrong on the second linked thread. The comment by RMartinhoFernandes sums it up.Sechrist

You are correct to say that this is not valid. As you yourself have quoted (so I shall not re-quote here) the guaranteed valid cast is only from any other type to char*.

The other form is indeed against the standard and causes undefined behaviour. However, as a little bonus, let us discuss some of the reasoning behind this rule.

On every significant architecture, char is the only type that allows completely unaligned access. This is because the byte-read instructions have to work on any byte; otherwise they would be all but useless. This means that an indirect read through a pointer to char will always be valid on every CPU I know of.

The other way around, this does not apply: on most architectures you cannot read a uint64_t unless the pointer is aligned to 8 bytes.
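A minimal sketch of the usual portable workaround, assuming the goal is to read a uint64_t that sits at an arbitrary offset in a byte buffer: copy the bytes out with memcpy rather than dereferencing a cast pointer. The helper name read_u64 is hypothetical and not from the answer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint64_t from a possibly unaligned position
   in a byte buffer. memcpy copies the object representation byte by byte,
   so no misaligned or type-punned access to the buffer takes place. */
uint64_t read_u64(const unsigned char *bytes)
{
    uint64_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Calling read_u64(buf + 1) on a buffer that holds the bytes of a uint64_t starting at offset 1 should give back that value regardless of how buf itself is aligned.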

There is, however, a very common compiler extension that allows you to cast properly aligned pointers from char to other types and access memory through them, but this is non-standard. Also note that if you cast a pointer to any type to a pointer to char and then cast it back, the resulting pointer is guaranteed to compare equal to the original pointer. Therefore this is OK:

struct x *mystruct = MakeAMyStruct();
char * foo = (char *)mystruct;
struct x *mystruct2 = (struct x *)foo;

And mystruct2 will equal mystruct. This also guarantees the struct is properly aligned for its needs.

So basically, if you want a pointer to char and a pointer to another type, always declare the pointer to the other type then cast to char. Or even better use a union, that is what they are basically for...
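As a rough sketch of the union suggestion, assuming C rather than C++ (C99 with TC3 and C11 describe reading a member other than the one last stored as a reinterpretation of the bytes; C++ has different rules): the union type ll_bytes and the function ll_from_bytes are hypothetical names introduced only for illustration.

```c
#include <string.h>

/* Hypothetical union: a long long overlaid with its own bytes. */
union ll_bytes {
    long long value;
    unsigned char bytes[sizeof(long long)];
};

/* Fill the byte view, then read the long long view. The bytes are
   reinterpreted as a long long; which value results depends on the
   implementation's representation of long long. */
long long ll_from_bytes(const unsigned char *src)
{
    union ll_bytes u;
    memcpy(u.bytes, src, sizeof u.bytes);
    return u.value;
}
```

Because the union object really is declared with a long long member, it is suitably aligned for one, which a plain char array is not guaranteed to be.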

Note, there is a notable exception to the rule however. Some old implementations of malloc used to return a char*. This pointer is always guaranteed to be castable to any type successfully without breaking aliasing rules.

Usia answered 12/8, 2014 at 12:19 Comment(1)
There is no aliasing problem with any pointer cast; the potential problem only arises when dereferencing the result of the cast and then using that expression to access memory. And the behaviour depends on the effective type of the memory, and the type of the dereferencing expression; it is completely independent of the types of any other pointers that might exist or have existed. So the case in your last paragraph actually does not need an exception.Sechrist

Deduplicator is correct. The undefined behaviour that allows compilers to implement "strict aliasing" optimizations doesn't apply when character values are being used to produce a representation of an object.

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

However your second example has undefined behaviour because foo is uninitialized. If you initialize foo then it only has implementation defined behaviour. It depends on the implementation defined alignment requirements of long long and whether long long has any implementation defined pad bits.

Consider if you change your second example to this:

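#include <stdlib.h>

long long bar() {
    char *foo = malloc(sizeof(long long));
    char c;
    /* fill the allocated bytes through a character lvalue */
    for(c = 0; c < sizeof(long long); c++)
        foo[c] = c;
    /* reinterpret the malloc'ed storage as a long long */
    long long *p = (long long *) foo;
    return *p;
}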

Now alignment is no longer an issue and this example depends only on the implementation-defined representation of long long. What value is returned depends on the representation of long long, but if that representation is defined as having no pad bits then this function must always return the same value, and it must also always be a valid value. Without pad bits this function can't generate a trap representation, and so the compiler cannot perform any strict aliasing type optimizations on it.

You have to look pretty hard to find a standard conforming implementation of C that has implementation defined pad bits in any of its integer types. I doubt you'll find one that implements any sort of strict aliasing type of optimization. In other words, compilers don't use the undefined behaviour caused by accessing a trap representation to allow strict-aliasing optimizations because no compiler that implements strict-aliasing optimizations has defined any trap representations.

Note also that had foo been initialized with all zeros ('\0' characters) then this function wouldn't have any undefined or implementation defined behaviour. An all-bits-zero representation of an integer type is guaranteed not to be a trap representation and guaranteed to have the value 0.
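A small sketch of that all-zeros case, assuming calloc is used to obtain the zero-filled storage; the function name zero_from_bytes and the -1 sentinel for allocation failure are hypothetical choices for this illustration.

```c
#include <stdlib.h>

/* Hypothetical variant of bar(): the allocated bytes are all zero, so
   reading them as a long long is expected to yield the value 0. */
long long zero_from_bytes(void)
{
    long long result = -1;                    /* returned only if calloc fails */
    char *foo = calloc(1, sizeof(long long)); /* storage with all bits zero */
    if (foo != NULL) {
        long long *p = (long long *) foo;     /* malloc-family storage is suitably aligned */
        result = *p;
        free(foo);
    }
    return result;
}
```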

Now for a strictly conforming example that uses char values to create a guaranteed valid (possibly non-zero) representation of a long long value:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv) {
    int i;
    long long l;
    char *buf;

    if (argc < 2) {
        return 1;
    }
    buf = malloc(sizeof l);
    if (buf == NULL) {
        return 1;
    }
    l = strtoll(argv[1], NULL, 10);
    for (i = 0; i < sizeof l; i++) {
        buf[i] = ((char *) &l)[i];
    }
    printf("%lld\n", *(long long *)buf);
    return 0;
}

This example has no undefined behaviour and is not dependent on the alignment or representation of long long. This is the sort of code that the character type exception on accessing objects was created for. In particular this means that Standard C lets you implement your own memcpy function in portable C code.
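To make the last point concrete, here is a minimal sketch of such a byte-wise copy. The name my_memcpy is hypothetical, and like memcpy it assumes the two regions do not overlap.

```c
#include <stddef.h>

/* Hypothetical byte-wise copy in the spirit of memcpy. Every access goes
   through an unsigned char lvalue, which the character-type exception
   permits for objects of any type. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}
```

For instance, my_memcpy(&out, &in, sizeof in) with two long long variables leaves out holding the same value as in.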

Ornithopter answered 12/8, 2014 at 15:57 Comment(4)
"Deduplicator is correct." - sorry, no, he wasn't -- in the original (now apparently deleted) comments, he asserted that an array like char c[sizeof(long)]; could be reinterpreted and aliased through an lvalue of type long. Since there is no object of type long there, the behavior is undefined. The quote about trap representations does not say anything about this issue; it's about a slightly different issue describing another family of undefined behaviors.Downthrow
I deleted my previous comment as it was based on an incorrect reading of the standard. You're right that a compiler can assume a dereference of a long pointer can't alias array c because it knows the effective type of c. But it can't generally assume a pointer to long and a pointer to char can't point to the same object, because it generally can't know and can't assume the effective type of the object the char pointer points to. So there is a two-way street when the char object doesn't have a declared type, and effectively one when using pointers the compiler can't trace back to a definition.Ornithopter
Your first paragraph sounds as if you're saying OP's second snippet would not be UB if foo were initialized (when in fact it would be UB); could you please clarify that.Sechrist
Related to your code example, there can still be trap representations without padding bits (specific example - negative-zero on a 1's complement or sign-magnitude implementation that doesn't support negative zeroes)Sechrist
