Is strict aliasing one-way?
Asked Answered
W

3

4

I believe 6.5p7 in the C standard defines the so-called strict aliasing rule as follows.

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  1. a type compatible with the effective type of the object,
  2. a qualified version of a type compatible with the effective type of the object,
  3. a type that is the signed or unsigned type corresponding to the effective type of the object,
  4. a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  5. an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  6. a character type.

Here's a simple example that shows GCC's optimization based on its assumption to the rule.

int IF(int *i, float *f) {
    *i = -1;
    *f = 0;
    return *i;
}

IF:
        mov     DWORD PTR [rdi], -1
        mov     eax, -1
        mov     DWORD PTR [rsi], 0x00000000
        ret

The load for return *i is omitted assuming that int and float cannot alias.

Then let's consider case 6, where it says an object could be accessed by a character type lvalue expression (char *).

int IC(int *i, char *c) {
    *i = -1;
    *c = 0;
    return *i;
}

IC:
        mov     DWORD PTR [rdi], -1
        mov     BYTE PTR [rsi], 0
        mov     eax, DWORD PTR [rdi]
        ret

Now there is a load for return *i because i and c could overlap according to the rules, and *c = 0 could change what's in *i.

Then can we also modify a char through an int *? Should the compiler care that such thing might happen?

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

CI: #GCC
        mov     BYTE PTR [rdi], -1
        mov     DWORD PTR [rsi], 0
        movzx   eax, BYTE PTR [rdi]
        ret

CI: #Clang
        mov     byte ptr [rdi], -1
        mov     dword ptr [rsi], 0
        mov     al, byte ptr [rdi]
        ret

Looking at the assembly output, both GCC and Clang seem to think a char can be modified by access through int *.

Maybe it's obvious that A and B overlapping means A overlaps B and B overlaps A. However, I found this detailed answer which emphasizes in boldface that,

Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.

Now I got really confused. The answer is also about GCC vector types, which has an may_alias attribute so it can alias similarly as a char.

At least, in the following example, GCC seems to think overlapping access can happen in both ways.

int IV(int *i, __m128i *v) {
    *i = -1;
    *v = _mm_setzero_si128();
    return *i;
}

__m128i VI(int *i, __m128i *v) {
    *v = _mm_set1_epi32(-1);
    *i = 0;
    return *v;
}

IV:
        pxor    xmm0, xmm0
        mov     DWORD PTR [rdi], -1
        movaps  XMMWORD PTR [rsi], xmm0
        mov     eax, DWORD PTR [rdi]
        ret
VI:
        pcmpeqd xmm0, xmm0
        movaps  XMMWORD PTR [rsi], xmm0
        mov     DWORD PTR [rdi], 0
        movdqa  xmm0, XMMWORD PTR [rsi]
        ret

https://godbolt.org/z/ab5EMx3bb

But am I missing something? Is strict aliasing one-way?


Additionally, after reading the current answers and comments, I thought maybe this code is not allowed by the standard.

typedef struct {int i;} S;
S s;
int *p = (int *)&s;
*p = 1;

Note that (int *)&s is different from &s.i. My current interpretation is that an object of type S is being accessed by an lvalue expression of type int, and this case is not listed in 6.5p7.

Wizardly answered 25/5, 2022 at 17:35 Comment(20)
The rule is definitely one-way, there are real-life examples of compilers breaking code that points an int * into an actual __m256i object, like GCC AVX _m256i cast to int array leads to wrong values. But you're using a __m128i * pointer to point at memory that's allowed to be a different underlying type. Note that what you quoted from my answer gave a char buf[1024] example, a char-array object, no char* involved. (Accessing it may involve char* due to how buff[i] works as *(buff+i), so that may be safer, unlike __m128i)Absorbefacient
I'll update my linked answers to include that real-world breakage example.Absorbefacient
@PeterCordes Similarly to the example in your link that breaks, would struct {int i;} s = {1}; *(int *)&s = 0; also possibly break? I know s and s.i must be in the same memory location if it is in memory, but the rules say it's only possible to access an int through struct {int i;} *, and not the other way.Wizardly
Real compilers definitely allow that, and I think even ISO C allows you to derive a pointer to a member from a pointer to the whole aggregate, as long as you get the math right (the correct offset). So *(int *)&s = 0; is definitely fine. The int member of a struct is an int object, so that's allowed by rule (1) in your quote.Absorbefacient
The other way is more interesting, int i = 1; / *(struct s*)&i = {0}; I think that's what (5) is allowing, so unless there's a separate problem in the pointer casting, that may work. It may also work to point a struct{int i[2];}; at something declared as int arr[2], but that feels even weirder.Absorbefacient
@PeterCordes The int member is an int object, but that's different from a struct {int i;} object. By *(int *)&s = 0;, you are accessing a struct {int i;} through int *. Not sure if that's okay.Wizardly
The int i struct member is an int object. That's what makes it safe to pass &s.i to things that want an int*, and why that doesn't need any casting. The struct object and int member fully overlap in this case.Absorbefacient
@PeterCordes *(int *)&s = 0; is the same as typedef struct {int i;} S; S s; int *p = (int *)&s; *p = 0. See that (int *)&s is different from &s.i. It's kind of an artificial example, but I want a clear understanding of the rules.Wizardly
@PeterCordes My current interpretation is that by int *p = (int *)&s; *p = 0;, an object of type S is being accessed by an lvalue expression of type int.Wizardly
Like I said, if there's any problem in deriving an int* by casting a struct*, it's not strict-aliasing. C defines enough about how addresses and memory works that there definitely is an int object in there somewhere, at some address between &s and ((char*)&s) + sizeof(s) - sizeof(int). If your implementation doesn't put padding before the first int member, then it's correct. (I think padding might be allowed, but on implementations that choose not to do that, I'm pretty sure everything is well-defined behaviour even in pure ISO C.)Absorbefacient
There are rules about deriving pointers to sub-objects (e.g. to struct or array member), as opposed to subtracting the address of two different objects and then adding that to one of them. Deriving a pointer to a sub-object is allowed, that's why C offsetof is a thing and is implementable and usable.Absorbefacient
@PeterCordes The address being the same is not the only issue because if so, __m256i v = ...; int *p = (int *)&v; wouldn't be any problem because &v is clearly assigned to p. But as you know the compiler lets garbage to be loaded.Wizardly
__m256i v doesn't have an int member sub-object at that address, so point (1) of the strict-aliasing rule doesn't apply. Of course you have to respect strict-aliasing as well as other rules for deriving pointers.Absorbefacient
@PeterCordes I see. Although I still cannot make some clear logical sense by the sentences of the standard, I think I get what the standard committee's intention is, and how GCC has implemented.Wizardly
@PeterCordes Just to clarify, does the may_alias attribute for vector types imply that a vector object can exist in any contiguous array of any type? In the way that there are 4 char's existing in a 32-bit int but an int doesn't exist in an array of 4 chars? I think this is the point you were trying to explain?Wizardly
I don't think it's useful to think of an int arr[256] as also being composed of char or __m256i objects. Just that you can use pointers of that type to access the bytes of other objects. (The "object-representation"). Including a struct { char c; short s[7];} including padding. The mental model you suggest could I guess work. It gets hairy when you consider a __attribute__((aligned(1),may_alias)) type (like you might use as an alternative to memcpy to do an unaligned aliasing-safe load or store of uint32_t to any offset of a char array). So there are overlapping objects...Absorbefacient
@PeterCordes In case of __attribute__((aligned(1), may_alias)) uint32_t (u32), there'd be a u32 in 0:3, 1:4, 2:5, and so on. Anyway in my personal opinion, strict aliasing doesn't help much unless someone decides to write Java in C (GTK?). But such codebases won't be performance critical, and performance critical cases like OS or SIMD optimized computation often half-ignore aliasing.Wizardly
@PeterCordes Could you have a look at these examples? I made some short examples where __m128i v; (int *)&v and (long *)&v breaks while (long long *)&v works. So, if you're trying to access a vector of long long objects by int *, that's clearly not allowed. However, it seems GCC treats char very specially that char ca[16]; (int *)&ca; doesn't break in a similar use case, even with unaligned access. See the last two functions.Wizardly
Undefined behaviour doesn't mean "guaranteed to break". It can easily happen to work; that doesn't prove anything. (Although the long long* case is interesting, and maybe isn't a coincidence that a type matching the vector works as expected, including doing all loads first before either call. It might be interesting to test that with typedef short v8si __attribute__((vector_size(16))); and see if short* is the only type that works as expected with it.Absorbefacient
@PeterCordes: Under the abstraction model used in Dennis Ritchie's language, every region of addressable storage simultaneously contains, throughout its lifetime, objects of every type that could fit therein, given their size and alignment constraints, but when N1570 6.5p6 and 6.5p7 use the term "object" they must mean something else, but it's not clear what.Cancer
C
3

Yes it's only one way, but from the context of the function it can't tell from which side.

Given this:

char CI(char *c, int *i) {
    *c = -1;
    *i = 0;
    return *c;
}

It could have been called like this:

int a;
char *p = ((char *)&a) + 1;
char b = CI(p,&a);

Which is a valid use of aliasing. So from inside of the function, *i = 0 is correctly setting a in the calling function, and *c = -1 is correctly setting one byte inside of a.

Collation answered 25/5, 2022 at 17:41 Comment(2)
If you don't mind, please have a look at the added part of my question.Wizardly
The Standard was written at a time when it would have been essentially impossible for compilers not to offer such semantics, and thus there was no need to mandate them. Today, however, compilers go out of their way to use whole-program optimization to avoid performing any reloads not mandated by the Standard.Cancer
F
3

You can take a pointer to any object, cast it to a char* and use that to access the bit patterns underlying said object. You can also cast char* gotten this way back to it's original type.

So when the compiler sees int *i and char *p it can not exclude the possibility that p was created by casting from i. So they may point to the same raw memory. Changing one may change the other. There it goes both ways. But that is not what the text is about.

What this is about is casting from A* to char* and then to B*. The object pointed to doesn't magically become a B and accessing it through a B* is undefined behavior. Maybe one-way is the wrong word. I don't know what to name this better. But for every object there is a train with only 2 stops: A* and char* (unsigned char*, signed char*, const char*, ... and all it's variants). You can go back and forth as many times as you like but you can never change tracks and go to B*.

Does that help?

The may_alias attribute sets up another such rail system. Allowing the alias between int[4] and __m128i* because that is exactly the overlapping the compiler needs for the vectorization. But that's something you have to look up in the compilers specs.

Fibrilla answered 25/5, 2022 at 17:56 Comment(35)
This question is also asking about __m128i which is defined as typedef long long __m128i __attribute__((may_alias,vector_size(16))). So that's another type in addition to char* you can cast through. And IMO the key point it's missing (which I was making in the parts of my answers it quoted) as about having an actual declared __m128i vec object, and pointing other pointer types into it. Not just dereferencing pointers. If the object is anonymous and only existed via pointer derefs, then pointing an int* at it is safe if that's the only type other than char* and __m128i*.Absorbefacient
You can point an int* at it but you can't do anything with it. There can't be any aliasing because you can't access what the int* points to.Fibrilla
I meant create and dereference a pointer, in limited comment space. I should have been more specific since your answer is focusing on casting to one pointer type and then back. I was mostly looking at the part of the question that misinterpreted a quote from one of my answers.Absorbefacient
And that casting back is the crucial point. An int* and a __m128i* can never alias since you can't legally get from one to the other. Both can go to char* but no further. Each can only go back to it's original type. So if you ever get int* and __m128i* to point at the same address that you already violated the standard and have UB and we don't care what happens then.Fibrilla
Your answer definitely helped me understanding what's going on, but I'm still not clear about cases such as struct {int i;} s = {1}; *(int *)&s = 0;. I know s and s.i must be in the same memory location if it is in memory, but the rules say it's only possible to access an int through struct {int i;} *, and not the other way, so is the example possibly broken?Wizardly
@GoswinvonBrederlow: As I said in my first comment, __m128i is a "may_alias" type; it's explicitly allowed to point a __m128i* at an int arr[4] (or a short arr[8], or a struct or anything else). This required behaviour is part of Intel's intrinsics API; any compiler providing __m128i* must support that somehow. The may_alias attribute is how GNU C compatible compilers do it; MSVC allows all aliasing. See Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?Absorbefacient
@Wizardly I would always write &s.i instead of (int *)&s to get the int because that makes it clear what I'm pointing at. It's easier to read and if I ever re-arange the members of the struct the &s.i will still work. But (int *)&s seems to be perfectly legal and the compiler knows that &s and &s.i are the same address and allows this. (float*)&s on the other hand says: Warning: dereferencing type-punned pointer will break strict-aliasing rules. But no idea where that is codyfied in the standard.Fibrilla
@GoswinvonBrederlow: The same thing goes for all __m... vector types, including pointing float vectors at integer data.Absorbefacient
@PeterCordes sorry, yes you are right. Bad choice of type on my part. I've extended my answer.Fibrilla
@GoswinvonBrederlow "casting from A* to char*" --> to unsigned char * has the advantage that there is never any trap, nor padding and implementation defined behavior is out of range assignments.Chess
@chux-ReinstateMonica All the char pointer variations work, std::byte too and is the recommended way I believe.Fibrilla
@GoswinvonBrederlow Note post is tagged [c] so std::byte doe snot well apply here. All the char pointer variations do not work, less common ones have trouble, hence the suggestion.Chess
@chux-ReinstateMonica: Have there ever been any non-two's-complement systems where loading a signed character would be faster than loading an unsigned one? If any such systems existed, people familiar with the architecture would be better qualified than the Committee to judge the pros and cons of having a char type with fewer distinct values than unsigned char, and if none ever existed any time spent by the Committee trying to decide how one should behave would be wasted. The question shouldn't matter for anyone whose code would never run on such a system, however.Cancer
@Cancer The problem with char is that it can be signed or unsigned. Nothing to do with two's-complement.Fibrilla
@GoswinvonBrederlow: The Standard would not forbid a ones'-complement implementation from having char default to signed, but unless any implementations actually did so, or there was some plausible reason to expect that a future implementation might do so, any effort spent dealing with the possibility of such implementations would be wasted. Note that on ones'-complement systems, conversion of a short signed type to unsigned and back would yield "surprising" behaviors no matter how an implementation chose to process it.Cancer
@Cancer We already have system where char is signed and systems where is is unsigned. All with two's-complement. And any arithmetic with char will first get promoted to int and then you get values -128 to 127 or 0 to 255 depending on the architecture. That is the problem. That a one's-complement system with signed char would have -127 to -0 to 0 to 127 doesn't add anything.Fibrilla
@GoswinvonBrederlow: On a two's-complement system, whether char is signed or unsigned, converting a char value to either int or unsigned int, performing a bitwise operation with a signed or unsigned value in the range 0..CHAR_MAX, and then converting it back to char will yield the same value whether whether some or all of the values involved were unsigned. On a one's-complement system, that would hold if char is unsigned, but not if it's signed. If char is signed, bitwise operators will behave weirdly.Cancer
@Cancer Except it will not be in that range because char can be signed or unsigned.Fibrilla
@GoswinvonBrederlow: The char type can only be signed on a particular platform if somebody has written a compiler for that platform where char is signed. Have any such compilers ever existed for ones'-complement systems, or is anyone ever going to write one? If not, then char can't possibly be signed on such systems.Cancer
gcc is such a compiler and clang. They even have command line arguments to override the platforms native signedness of char to something else if you don't like the native one. As said depending on the platform char is signed or unsigned. It's a real things that happens with two's-complement while one's-complement is so dead c++ is dropping it and stdint.h doesn't support it on C either.Fibrilla
@GoswinvonBrederlow: Have gcc and clang ever supported ones'-complement platforms?Cancer
@supercat: GCC documents that signed integer types use 2's complement, as its choice for that C implementation-defined behaviour. (gcc.gnu.org/onlinedocs/gcc/…). I assume this goes back pretty far; the git/svn history for that file goes back to 2004 (gcc.gnu.org/git/…), when the text was moved from extend.texi. Following that back, it said the same thing in 2002, but GCC was new in the 1980s. Still, I wouldn't be surprised if GCC has only ever had 2's complement.Absorbefacient
@PeterCordes: That returns then to my original question: has anyone ever written a compiler for any ones'-complement system where char was a signed type? Not all combinations of things that would be allowable under the Standard make sense, and the Standard generally doesn't bother to forbid nonsensical combinations of things that could make sense individually. The fact that C17 and before allow char to be signed, and allow ones'-complement, does not imply that they should be read as demanding that portable code accommodate a ones'-complement dialects where char is signed.Cancer
@PeterCordes: I'm somewhat curious about what advantage there is to having C recognize specific numeric representations, versus specifying that implementations must specify whether or not their numeric representations have certain traits, such as all-bits-zero representing zero, zero always being represented by all-bits zero, positive and negative ranges being symmetric or off-by-one symmetric, having power-of-two mod wrapping, etc. I can imagine designs where it would be advantageous to specify the range of long long as -0x7FFFFFFEFFFFFFF to 0x7FFFFFFFFFFFFFFF...Cancer
...so as to allow all values whose upper word is 0x80000000 to be treated as a NaN (signed overflows would yield such values, and operations using such values would yield such values as results). If a 32-bit processor had arithmetic instructions with such semantics, that would make it very easy for such an implementation to specify overflow behavior in a way that would make it easy to ensure that no undetected overflows could produce seemingly-valid results. Requiring that long long be symmetric, however, rather than merely requiring a warning macro if it isn't, would preclude such a design.Cancer
@Cancer Why you could build a processor following that logic it would be far from optimal. There would be no benefit for it to offset the extra cost involved. You are reserving a huge chunk of the range of long long with only one meaning: invalid. You can do that with far fewer reserved bit patterns.Fibrilla
@GoswinvonBrederlow: For a 32-bit processor, there would be one reserved bit pattern: 0x80000000, and having separate instructions for normal and sticky-NaN arithmetic would essentially as useful as having overflow-trapping and non-trapping instructions, but much cheaper. Having the behavior of the lower word of addition, subtraction, or multiplication depend upon the value of the upper bit would make all such operations much slower than having only the upper word processed in "special" fashion.Cancer
@Cancer -0x7FFFFFFEFFFFFFF leaves a much larger reserved space. Or is that a typo? It sounds like you want one's-complement or sign+magnitude with the exception that -0 stands for NaN. It's probably irrelevant in modern CPU monsters but it does add a lot of gates to adders. A lot more than connecting the overflow bit coming from the adder to the trap bit with an AND gate. It's just in comparison for multiplication or, even more so, division units that the extra cost becomes relatively small. It's on par with saturating add/sub and some CPUs do have that.Fibrilla
@GoswinvonBrederlow: For applications that would require that integer overflow not occur silently, the logic required to have an integer overflow, or any computation with an operand of 0x80000000, force a result of 0x80000000 would be comparable to the cost of saturating addition, and less than the cost required to perform an in-order trap while offering more useful semantics (e.g. a control system which is supposed to compute a value several ways using different inputs could treat computations from inputs that overflow as invalid, using other computations, more easily than if overflows trap.Cancer
@Cancer An overflow trap is really the simplest thing to generate in hardware. It's horrible when you want to catch it in software and for example turn it info the next carry for a long addition. But generating it is way way cheaper (in gate count) than saturated addition or the sticky 0x80000000.Fibrilla
@GoswinvonBrederlow: Overflow traps are simple in a sequential-execution machine. They get much more complicated if one adds in out-of-order execution, and more complicated still if one adds in speculative execution. Forcing the output of an operation to 0x80000000 if an overflow occurs or either input is 0x80000000 could be done in a manner completely agnostic to execution order.Cancer
@GoswinvonBrederlow: I see no reason that having overflow force the bit pattern to 0x80000000 would be any harder than saturating arithmetic--something that some platforms implementations already support, since there's only one bit pattern to force rather than two, and having all operations involving an 0x80000000 operand treated as overflowing would simply require detecting that pattern in parallel with performing the rest of the arithmetic. There's a chicken-and-egg problem between language and hardware semantics, but having an integer NaN concept analogous to floating-point NaN would...Cancer
...offer essentially the same advantages. Requiring that every integer NaN have the same bit pattern would mean that computations involving two-word types would need to be performed in full before writing either word of the result, rather than allowing e..g someLongLong++ to be performed by incrementing the bottom word, testing for zero, and writing it, and then only bothering to even load the top word if a carry was generated from the bottom word.Cancer
@Cancer The difference between saturating math and NaN is that saturating math isn't sticky. 0x40000000 + 0x40000000 - 0x40000000 gives 0x40000000 with saturating math but should give NaN. You have to check both inputs for NaN and hen force the output. In gate counts that's roughly 3 times the cost. You see the same problem with your someLongLong example. Suddenly you have to check all words of the someLongLong before doing ++. For big integer or arbitrary precision math you probably want to use a control header containing the sign bit, a NaN bit, size and whatever and a magnitude.Fibrilla
@GoswinvonBrederlow: If any value whose upper word is 0x80000000 is a NaN, then there is no need to check all the words before doing a ++, since performing an increment on values 0x80000000:00000000 through 0x80000000:FFFFFFFE without looking at the upper word would naturally yield results where the high word was 0x80000000. Incrementing 0x80000000:FFFFFFFF would cause the bottom half to be reset to zero while the upper half was "sticky-NaN" incremented yielding 0x80000000.Cancer
C
2

To understand how the "Strict Aliasing Rule" applies in any particular situation, one must define two concepts which are referenced in N1570 6.5p7 but not actually defined within the Standard:

  1. For purposes of N1570 6.5p7, under what circumstances is a region of storage considered to contain an object of any particular type? In particular for your use case, what does it mean for something to be 'copied as an array of character type'?

  2. What does it mean for an object to be accessed "by" an lvalue of a particular type?

There has never been a consensus as to how those concepts should be specified, thus making it impossible to for anyone to know the rules "mean"(*). The Standard seems to be intended to unambiguously support scenarios where a region of storage is created via malloc() or other such means, then written exclusively using character types, and then accessed via one other type, or those in which storage is written exclusively using one non-character type and then read exclusively via character types, but other scenarios are a bit murkier.

More significantly, while clang and gcc support those scenarios using character types, the sets of scenarios accommodated by clang and gcc omit some corner cases where the Standard is unambiguous, but which don't fit the abstraction model used by clang and gcc. Regardless of what the rules say, programmers should expect that the -fstrict-aliasing dialects of clang and gcc do not accommodate the possibility that storage which has ever been accessed via any non-character type might be accessed by any other within its lifetime, even if storage is always read using the last type with which it was written.

(*) In fairness to the authors of the Standard, a construct like:

unsigned test(float *fp) { return *(unsigned*)fp; }

would be equally usable on an implementation that ignores the possibility that the access via the pointer might affect something of type float but is agnostic as to how the pointer's target storage might be used outside the function, or on an implementation that does more detailed flow analysis but notices that the pointer value being dereferenced is derived from a float*. Unfortunately, if the Standard were to recognize that quality implementations should answer the second question at least as broadly as the first, that might be seen as implying that the authors of clang and gcc have been demanding the right to produce poor quality implementations.

Cancer answered 25/5, 2022 at 22:57 Comment(2)
"Under what circumstances is a region of storage considered to contain an object of any particular type?" After looking at GCC's assembly output for several cases of aliasing, I think GCC considers that an object of a particular type exists at a certain location accessible through a valid pointer, after an assignment by an lvalue expression of that type (A), and this can be overwritten by another assignment by an lvalue expression of a different type (B), after which one cannot safely assume that an object of type A still exist in the same area.Wizardly
@xiver77: So far as I can tell, both clang and gcc operate under a model where an action that can be shown to cause a region of storage to hold a bit pattern it has held at some earlier moment in time may cause the Effective Type of the storage to revert to the earlier type, even if the bit pattern was written by a different type.Cancer

© 2022 - 2024 — McMap. All rights reserved.