Does giving data an effective type count as a side-effect?

Suppose I have a chunk of dynamically allocated data:

void* allocate (size_t n)
{
  void* foo = malloc(n);
  ...
  return foo;
}

I wish to use the data pointed at by foo as a special type, type_t. But I want to do this later, and not during allocation. In order to give the allocated data an effective type, I can therefore do something like:

void* allocate (size_t n)
{
  void* foo = malloc(n);
  (void) *(type_t*)foo;
  ...
  return foo;
}

As per C11 6.5/6, this lvalue access should make the effective type type_t:

For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

However, the line (void) *(type_t*)foo; contains no side effects, so the compiler should be free to optimize it away, and I wouldn't expect it to generate any actual machine code.

My question is: are tricks like the above safe? Does giving the data an effective type count as a side-effect? Or by optimizing away the code, will the compiler also optimize away the choice of effective type?

That is, with the above lvalue access trick, if I now call the above function like this:

int* i = allocate(sizeof(int));
*i = something;

Does this cause strict aliasing violation UB as expected, or is the effective type now int?

Hollo answered 25/6, 2018 at 8:37
Is it an lvalue? It is an expression and seems to me to be an rvalue. That means you are not typing the variable itself but the result of the expression.Debouch
That sentence seems to be speaking about that specific access only. Nothing about subsequent ones.Ultrafilter
The allocate function you posted has undefined behavior unless type_t is a character type. Specifically, *(type_t*)foo may be a trap representation of type_t if type_t is not a character type, and reading it is undefined behavior as per 6.2.6.1 paragraph 5.Celina
@IanAbbott 6.2.6.1/6 "The value of a structure or union object is never a trap representation". Still, most systems don't have trap representations for integers etc (they use 2's complement) and then reading an uninitialized variable is merely unspecified behavior, with some exceptions. See this.Hollo
@Hollo True, in which case there is no undefined behavior if type_t is a struct or union type, or a character type. OP hasn't specified type_t so it may or may not be undefined behavior.Celina
I only just realized who posted the question. :) Also, even if reading the indeterminate value is not a case of undefined behavior, allocate ought to either malloc the maximum of n and sizeof(type_t) or return NULL if n < sizeof(type_t) (or raise a signal, or exit, or whatever). Your choice.Celina
@IanAbbott: The Effective Type rule has nothing to do with trap representations. It is instead intended to identify situations where a compiler must allow for the possibility that pointers might alias even when there is no reason to expect them to do so, but has been interpreted as inviting compilers to break code which uses derived pointers in non-overlapping ways that would be easily identifiable by any compiler that made any bona fide effort whatsoever to handle them.Hoon
@supercat: Thanks, although I haven't mentioned the effective type rule. :)Celina
@IanAbbott: I thought you suggest that trap representations are the reason that the code invokes UB, when there's a different rule that has that effect even for types with no trap representations.Hoon
@supercat: I was suggesting that the code may invoke UB due to it possibly reading a trap representation. It was an "aside" comment about the code presented in the question.Celina

The phrase from the standard that you are citing only states something about the access to the object. The only changes to the effective type of an object that the standard describes are in the two sentences before that one, which make clear that you have to store into the object through an lvalue of the type that you want to make effective.

6.5/6

If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

Isoprene answered 25/6, 2018 at 8:51
Ah yeah, I read that section the wrong way - I somehow thought any access was enough to change the effective type. And since writing through an lvalue is a side effect, I guess this whole question doesn't make any sense.Hollo
I'll add the relevant part from the standard to your answer for completeness.Hollo
The rule that gcc, clang, and icc actually seem to use is "if a value is stored into an object having no declared or effective type...". That's not what the Standard says, but so far as I can tell, the Standard as written is simply unworkable. Having the Standard explicitly recognize the handling of operations involving derived pointers/lvalues as a Quality of Implementation issue (as was the case prior to C99) may be a bit hand-wavy, but that's better than pretending that an implementation which supports only the constructs mandated by the Standard can be a quality general-purpose implementation.Hoon
A less hand-wavy approach would be to say that that a pointer or lvalue derived from another may be used to access the storage associated with the former until the next time: (1) the storage is accessed in conflicting fashion via other means; (2) a pointer is produced via other means which will, sometime in future, be used to access the storage in conflicting fashion; (3) code enters a function, bona fide loop, or bona fide conditional statement wherein one of the above occurs. Adding that to the language would make it usable for systems programming and eliminate the need for Effective Types.Hoon

Nothing in the Standard would suggest that an operation which writes to an object would only need to be recognized as setting the Effective Type in cases where the operation has other side-effects as well (such as changing the pattern of bits stored in that object). On the other hand, compilers that use aggressive type-based optimization seem unable to recognize a possible change of an object's Effective Type as a side-effect which must be maintained even if the write would have no other observable side-effects.

To understand what the Effective Type rule actually says, I think it's necessary to understand where it came from. So far as I can tell, it appears to be derived from Defect Report #028, more specifically the rationale used to justify the conclusion given therein. The conclusion given is reasonable, but the rationale given is absurd.

Essentially, the basic premise involves the possibility of something like:

void actOnTwoThings(T1 *p1, T2 *p2)
{
  ... code that uses p1 and p2
}
...
/* ...in some other function: */
union { T1 v1; T2 v2; } u;
actOnTwoThings(&u.v1, &u.v2);

Because the act of writing a union as one type and reading it as another yields Implementation-Defined behavior, the behavior of writing one union member via pointer and reading another isn't fully defined by the Standard, and should therefore (by the logic of DR #028) be treated as Undefined Behavior. Although the use of p1 and p2 to access the same storage should in fact be treated as UB in many scenarios like the above, the rationale is totally faulty. Specifying that an action yields Implementation-Defined Behavior is very different from saying that it yields Undefined Behavior, especially in cases where the Standard would impose limits on what the Implementation-Defined behavior could be.

A key result of deriving pointer-type rules from the behavior of unions is that behavior is fully and unambiguously defined, with no Implementation-Defined aspects, if code writes a union any number of times using any members, in any sequence, and then reads the last member written. While requiring that implementations allow for this will block some otherwise-useful optimizations, it's pretty clear that the Effective Type rules are written to require such behavior.

A bigger problem arising from basing the type rules on the behavior of unions is that the action of reading a union using one type and writing it back with another type need not be regarded as having any side-effects if the new bit pattern matches the old. Since an implementation would have to define the new bit pattern as representing the value that was written as the new type, it would also have to define the (identical) old bit pattern as representing that same value. Given the following function (assume 'long' and 'long long' have the same size and representation):

 long test(long *p1, long long *p2, void *p3)
 {
   if (*p1)
   {
     long long temp;
     *p2 = 1;
     temp = *(long long*)p3;
     *(long*)p3 = temp;
   }
   return *p1;
 }

both gcc and clang will decide that the write via *(long*)p3 can't have any effect, since it simply stores back the same bit pattern that had been read via *(long long*)p3. That would be true if the following read of *p1 were going to be processed with Implementation-Defined behavior in the event the storage was written via *p2, but it isn't true if that case is regarded as UB. Unfortunately, since the Standard is inconsistent about whether the behavior is Implementation-Defined or Undefined, it's inconsistent about whether the write needs to be regarded as a side-effect.

From a practical perspective, when not using -fno-strict-aliasing, gcc and clang should be regarded as processing a dialect of C where Effective Types, once set, become permanent. They cannot reliably recognize all cases where Effective Types may be changed, even though the logic necessary to handle that would also easily and efficiently handle many cases which the authors of gcc have long claimed cannot possibly be handled without gutting optimization.

Hoon answered 25/6, 2018 at 17:10
