Does this code violate the strict aliasing rule?

Questions:

  1. Does the code below violate strict aliasing rules? That is, would a smart compiler be allowed to print 00000 (or cause some other nasty effect), because a buffer first accessed as another type is then accessed via int*?

  2. If not, would moving just the definition and initialization of ptr2 before the braces (so ptr2 would already be defined when ptr1 comes into scope) break it?

  3. If not, would removing the braces (so ptr1 and ptr2 were in the same scope) break it?

  4. If yes, how could the code be fixed?

Bonus question: If the code is OK, and neither 2. nor 3. breaks it, how could it be changed so that it does break strict aliasing rules (for example, by converting the braced loop to use int16_t)?


int i;
void *buf = calloc(5, sizeof(int)); // buf initialized to 0

{
    char *ptr1 = buf;    
    for(i = 0; i < 5*sizeof(int); ++i)
        ptr1[i] = i;
}

int *ptr2 = buf;
for(i = 0; i < 5; ++i)
    printf("%d", ptr2[i]);

I am looking for confirmation, so a short(ish), expert answer about this particular code, ideally with minimal standard quotes, is what I am after. I am not after long explanations of strict aliasing rules, only the parts that pertain to this code. It would also be great if an answer explicitly addressed each of the numbered questions above.

Also assume a general-purpose CPU with no integer trap values, and let's also say int is 32 bits and two's complement.

Joyance answered 13/7, 2016 at 9:41 Comment(0)

No it doesn't, but this is only because the memory was allocated, and written into using a character type.

The memory is allocated using calloc. That object doesn't have a declared type [1] because it was allocated dynamically. Thus the object doesn't have any effective type yet.

Then the code accesses and modifies the object through an lvalue of type char. Because the lvalue has a character type [2], and no value is being copied from an object that has an effective type [5], the stores do not set the effective type to char for subsequent accesses. The effective type is char only for the duration of each access [3]. After the access, the object doesn't have an effective type anymore.

Then the type int is used to access that object, and only to read it. As the object doesn't have an effective type, its effective type becomes int [3] for the duration of the read. After the access the object doesn't have an effective type anymore. As int is obviously compatible with the effective type int, the behavior is defined.

(Assuming the values read are not trap representations for int.)
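
As a rough sketch of the pattern just described (not code from the question), the helper below fills allocated storage byte by byte and then reads one element back as an int. The function name read_after_byte_fill and its parameters are made up for illustration, and unsigned char is used instead of plain char only to keep the byte values unambiguous. For comparison it also fetches the same element with memcpy, which copies bytes and therefore never depends on the effective type at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fills storage for n ints byte by byte, then reads
   element idx back twice: once through an int lvalue, once via memcpy.
   Returns 0 on success, -1 if the allocation fails. */
int read_after_byte_fill(size_t n, size_t idx, int *out_direct, int *out_copied)
{
    assert(idx < n);

    void *buf = calloc(n, sizeof(int));
    if (buf == NULL)
        return -1;

    /* Character-type stores: per 6.5p6 they leave no lasting effective type. */
    unsigned char *bytes = buf;
    for (size_t i = 0; i < n * sizeof(int); ++i)
        bytes[i] = (unsigned char)i;

    /* Read through an int lvalue: the effective type is int for this access. */
    const int *ints = buf;
    *out_direct = ints[idx];

    /* Byte-wise copy of the same element into an int object. */
    memcpy(out_copied, bytes + idx * sizeof(int), sizeof(int));

    free(buf);
    return 0;
}
```

Calling read_after_byte_fill(5, 2, &a, &b) should leave the same value in a and b, since both reads see the same bytes. The actual number depends on the platform's int size and byte order.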


Had you accessed and modified the object using a non-character type that is also not compatible with int, the behavior would be undefined.

Let's say your example was (assuming sizeof(float)==sizeof(int)):

int i;
void *buf = calloc(5, sizeof(float)); // buf initialized to 0

{
    float *ptr1 = buf;    
    for(i = 0; i < 5; ++i) // 5 floats, not 5*sizeof(float): that would write past the end of buf
        ptr1[i] = (float)i;
}

int *ptr2 = buf;
for(i = 0; i < 5; ++i)
    printf("%d", ptr2[i]);

When the floats are written, the effective type of each object becomes float, for the duration of the write and for all subsequent accesses to the object that don't modify it [2]. When those objects are then accessed through int, the effective type remains float, as the values are only being read, not modified. The previous write using float set the effective type to float until the next write into the object (which doesn't happen in this case). The types int and float are not compatible [4], thus the behavior is undefined.
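
If the goal in a case like this is to look at the bits of a stored float, one commonly used way to stay within the rules is to copy the bytes out with memcpy instead of reading through an int pointer. The following is only a sketch under assumptions: the helper name float_bits is hypothetical, and it assumes a 32-bit float (checked at compile time). The values in the usage note additionally assume the IEEE 754 binary32 format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores value as a float in allocated memory and
   returns its object representation as a 32-bit unsigned integer. */
uint32_t float_bits(float value)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "this sketch assumes a 32-bit float");

    float *fp = malloc(sizeof *fp);
    assert(fp != NULL);   /* sketch only: real code should handle failure */

    *fp = value;          /* the storage now has effective type float */

    /* memcpy copies the bytes; the float is never read through an
       integer lvalue, so 6.5p7 is not violated. */
    uint32_t bits;
    memcpy(&bits, fp, sizeof bits);

    free(fp);
    return bits;
}
```

On an IEEE 754 platform, float_bits(1.0f) yields 0x3f800000 and float_bits(0.0f) yields 0.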


(All text below is quoted from: ISO/IEC 9899:201x)

1 (6.5 Expressions 6)
The effective type of an object for an access to its stored value is the declared type of the object, if any. (Footnote 87: Allocated objects have no declared type.)

2 (6.5 Expressions 6)
If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

3 (6.5 Expressions 6)
For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

4 (6.5 Expressions 7)
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.

5 (6.5 Expressions 6)
If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.
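
To illustrate quote 5 with a small sketch (the helper name sum_via_memcpy and its parameters are invented for this example): when values are copied with memcpy from objects whose effective type is int, the allocated destination takes on the effective type int, so reading it back through an int pointer is fine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies count ints into freshly allocated storage
   with memcpy, then sums them through an int pointer.
   Returns 0 on success, -1 if the allocation fails. */
int sum_via_memcpy(const int *src, size_t count, long *out_sum)
{
    assert(src != NULL && out_sum != NULL && count > 0);

    int *dst = malloc(count * sizeof *dst);
    if (dst == NULL)
        return -1;

    /* The source objects have effective type int, so after the copy the
       destination objects do too (6.5p6). */
    memcpy(dst, src, count * sizeof *dst);

    long sum = 0;
    for (size_t i = 0; i < count; ++i)
        sum += dst[i];    /* int lvalue reading an object of effective type int */

    free(dst);
    *out_sum = sum;
    return 0;
}
```

For an array int v[] = {1, 2, 3, 4}, the call sum_via_memcpy(v, 4, &s) stores 10 in s.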

Lacerate answered 13/7, 2016 at 9:51 Comment(10)
You implicitly raise an interesting point that I hadn't thought of: since the allocated memory isn't assigned to via an int lvalue, has it not yet "become" an int per 6.5 paragraph 6 when it's dereferenced via an int *, thus violating strict aliasing after all? Parsing that paragraph is painful. - Benil
At that point only characters had been written into the object, and that didn't set the effective type of the object. It becomes int when read, because of: "For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access." - Lacerate
If at that same point an int was being written into the object, instead of read, the type would also become int because: "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value." - Lacerate
So, if the code was changed to char charbuf[5*sizeof(int)] = ""; void *buf = charbuf;, and calloc removed, would that also break it? - Joyance
@Lacerate I think the "copied as an array of character type" portion of paragraph 6 applies, though, because of the for loop in the question. The "a character type" clause of paragraph 7 is necessary to avoid a violation of strict aliasing. - Benil
@Joyance Yes, definitely: automatic objects have a declared type, so their effective type cannot be changed. And by citation 4, int is not compatible with char. - Lacerate
@AndrewHenle But we didn't copy from any object: "then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one." - Lacerate
@AndrewHenle I think that "copied as an array of character type" is meant to imply that a value is copied in whole from another object, which isn't the case here (the int value is stored via the character pointer, but the representation is not copied). Unfortunately it is not clearly defined any further in the standard (one of many failings). - Albright
@Lacerate But would the for loop be a copy of a series of individual char objects? Thus leading to paragraph 7's "a character type"? - Benil
The write of chars sets the object to effective type char only for the duration of the write, not permanently, unlike the float example. - Lacerate

No. This does not violate strict aliasing.

From the C Standard, 6.2.5 Types, paragraph 28:

A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. 48

Note the 48. That refers to footnote 48:

48) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

So you can access the calloc()'d memory via a char * pointer (assuming your ptr is meant to be ptr1) with no problems.

Although that's really extra, since 7.22.3 Memory management functions, paragraph 1 states:

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated

So you can safely access the calloc()'d memory via an int pointer too, as well as a char pointer. And a double pointer to boot (assuming you stay within the bounds of the allocated memory).
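
As a quick sanity check of the alignment guarantee quoted above, something like the following could be used. This is a sketch only: the helper names is_aligned and calloc_block_is_aligned are made up, and treating the pointer converted to uintptr_t as a plain address is an assumption that holds on mainstream platforms but is not promised by the standard. Note that it checks alignment only and says nothing about which lvalue types may access the memory.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if p is a multiple of alignment.
   Assumes the uintptr_t conversion yields a flat address. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}

/* Returns 1 if a calloc'd block is suitably aligned for both int and
   double, 0 otherwise (including when the allocation fails). */
int calloc_block_is_aligned(void)
{
    void *buf = calloc(5, sizeof(double));
    if (buf == NULL)
        return 0;

    int ok = is_aligned(buf, alignof(int)) && is_aligned(buf, alignof(double));

    free(buf);
    return ok;
}
```

On a conforming implementation calloc_block_is_aligned() is expected to return 1 whenever the allocation succeeds.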

Benil answered 13/7, 2016 at 9:56 Comment(11)
Strict aliasing is something other than alignment requirements and representation. - Lottie
@JohannesSchaub-litb What the question is referring to is the same thing memset() does ("The memset function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s."). If you think it's wrong, post your own answer that also accounts for how memset() can set any type of memory to a repeated char value. Are you saying memset() violates strict aliasing? - Benil
@AndrewHenle Nobody's saying that the code in the question violates strict aliasing rules. However, the rationale given in your answer for why this is the case is not correct. Having the same alignment and representation does not imply that two types may alias. - Albright
@Albright But the fundamental reason for strict aliasing - to meet the requirement for "a type compatible with the effective type of the object" - is because of such alignment and representation. How are objects "compatible"? - Benil
@AndrewHenle The fundamental reason for strict aliasing is to limit the lvalue types allowed to access an object (mainly to allow certain compiler optimisations, though the standard doesn't state this of course). "Compatible types" are defined in 6.2.7, which doesn't claim that any two types with the same alignment and representation are compatible. It's not about whether the pointer is able to point to the object or not; strict aliasing is about whether such a pointer, assuming that it can be constructed, is allowed to be dereferenced. - Albright
@AndrewHenle If not the fundamental reason, then at least an important reason for the strict aliasing rule is to allow optimizations. Without the strict aliasing rule, the compiler would have to assume that almost any assignment to any variable could alter even a non-volatile value a pointer points to (and would have to re-fetch it from memory), and vice versa, that almost any assignment through a pointer could alter the value of any variable. The strict aliasing rule allows the compiler to assume that values don't change like this, in the cases where it applies. - Joyance
(i.e. the "strict aliasing rules" are embodied mostly by 6.5 paragraph 7). - Albright
...ever intended that the Standard be used as an excuse by compiler writers to go out of their way not to support useful features and guarantees that had historically been provided on platforms similar to the target, but the authors of gcc seem to believe that any code using behaviors not mandated by the Standard is "broken" even if the behaviors in question had been treated consistently for decades on all platforms even remotely similar to the target. - Atrocious
@hyde: That is indeed the purpose, though unless the authors of the standard were being disingenuous with the example given in the rationale, I see no reason to believe they intended programmers to jump through hoops to make use of type punning where it was useful and a non-obtuse compiler would recognize that it was likely. Given foo(int *p, int *r, int *s) { float *fp; *r=1; fp = (float*)p; *fp= 0.0f; *s=2; *fp=0.0f; return *r+*s; } I think they would have expected that a compiler might miss that the second write to *fp might hit an int, but regard the int* to float* cast... - Atrocious
...as indicating that any int values whose address might be exposed should be flushed from registers when the typecast occurs. A compiler might not be sophisticated enough to recognize the fact that a cast pointer might continue to exist after subsequent uses of int values, but since the primary use of such typecasts would be to perform "immediate" type punning, a presumption that a cast indicates that immediate aliasing is likely should not be considered excessively pessimistic. Unfortunately, the Standard presumes that compiler writers have enough common sense that... - Atrocious
...they won't need the authors of the Standard to state the obvious. - Atrocious
