C/C++ strict aliasing, object lifetime and modern compilers

Asked 6/9, 2013 at 13:48 Answered 21/7, 2015 at 12:52

c++ memory compiler-construction strict-aliasing type-punning

I am facing confusion about the C++ strict-aliasing rule and its possible implications. Consider the following code:

#include <cstdint>

int main() {
  int32_t a = 5;
  float* f = (float*)(&a);
  *f = 1.0f;

  int32_t b = a;   // Probably not well-defined?
  float g = *f;    // What about this?
}

Looking at the C++ standard, section 3.10, paragraph 10, technically none of the given code seems to violate the "aliasing rules" given there:

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
... a list of qualified accessor types ...

*f = 1.0f; doesn't break the rules because there is no access to a stored value, i.e. I am just writing to memory through a pointer. I'm not reading from memory or trying to interpret a value here.
The line int32_t b = a; doesn't violate the rules because I am accessing through its original type.
The line float g = *f; doesn't break the rules for just the same reason.

In another thread, member CortAmmon actually makes the same point in a response, and adds that any possible undefined behavior arising through writes to live objects, as in *f = 1.0f;, would be accounted for by the standard's definition of "object lifetime" (which seems to be trivial for POD types).

HOWEVER: There is plenty of evidence on the internet that the above code will produce UB on modern compilers. See here and here for example.
The argumentation in most cases is that the compiler is free to consider &a and f as not aliasing each other and therefore free to reschedule instructions.
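
To make that argument concrete, here is a minimal sketch of the kind of function people usually have in mind. The names store_both and call_with_distinct_objects are made up for illustration. The call shown is well-defined because the two pointers refer to different objects; passing pointers to the same storage, as in the question, is the case the compiler is allowed to ignore.

```cpp
#include <cstdint>

// Hypothetical illustration: int32_t and float are unrelated types, so the
// compiler may assume that `i` and `f` never point at the same object.
std::int32_t store_both(std::int32_t* i, float* f) {
    *i = 1;
    *f = 0.0f;   // assumed not to modify *i
    return *i;   // may be compiled as "return 1" without reloading *i
}

// Well-defined use: the two pointers refer to distinct objects.
std::int32_t call_with_distinct_objects() {
    std::int32_t a = 5;
    float x = 2.0f;
    return store_both(&a, &x);
}
```

With an optimizing compiler the final load in store_both is commonly folded into a constant, which is why calling it with aliasing pointers can give surprising results.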

The big question now is if such compiler behavior would actually be an "over-interpretation" of the standard.
The only time the standard talks about "aliasing" specifically is in a footnote to 3.10/10, where it makes clear that those are the rules that shall govern aliasing.
As I mentioned earlier, I do not see any of the above code violating the standard, yet it would be believed illegal by a large number of people (and possibly compiler people).

I would really really appreciate some clarification here.

Small Update:
As member BenVoigt pointed out correctly, int32_t may not align with float on some platforms so the given code may be in violation of the "storage of sufficient alignment and size" rule. I would like to state that int32_t was chosen intentionally to align with float on most platforms and that the assumption for this question is that the types do indeed align.

Small Update #2:
As several members have pointed out, the line int32_t b = a; is probably in violation of the standard, although not with absolute certainty. I agree with that standpoint and, not changing any aspect of the question, ask readers to exclude that line from my statement above that none of the code is in violation of the standard.

Brisk answered 6/9, 2013 at 13:48 Comment(7)

Writing is a form of accessing as much as reading is. – Venola 6/9, 2013 at 13:54

If that was true then the standard would rather say "access the memory of an object through...". But what it does say is "access the stored value", which is not what the code does. – Brisk 6/9, 2013 at 14:2

Among other problems with this code, you never had an object of type float, because you never "obtained storage of sufficient size and correct alignment". a is sized and aligned for int, not float, and some platforms will really let you know (to put it kindly). – Pilau 6/9, 2013 at 14:22

@RafaelSpring You can't replace a value unless you access it. – Encincture 6/9, 2013 at 14:24

@BenVoigt: correct, but I chose the type int32_t intentionally to align with float on most platforms. So in most cases I do have "storage of sufficient size and correct alignment". – Brisk 6/9, 2013 at 14:31

@molbdnilo: Where is that stated? – Brisk 6/9, 2013 at 14:31

@molbdnilo: He's accessing the location, not the prior value. There's no lvalue-to-rvalue conversion occurring. According to your interpretation, writing to any variable for the first time is undefined behavior, because you're accessing an uninitialized value. But a write is not an access to the value. – Pilau 6/9, 2013 at 15:53

You're wrong in your third bullet point (and maybe first one too).

You state "The line float g = *f; doesn't break the rules for just the same reason.", where "just the same reason" (a little vague) seems to refer to "accessing through its original type". But that's not what you're doing. You're accessing an int32_t (named a) through an lvalue of type float (obtained from the expression *f). So you're violating the standard.

I also believe (but less sure on this one) that storing a value is an access to (that) stored value, so even *f = 1.0f; violates the rules.
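
For comparison, a commonly recommended way to reinterpret the bits without accessing an int32_t through a float lvalue is to copy the bytes with std::memcpy. This is only a sketch: the helper names bits_to_float and float_to_bits are invented here, and it assumes a 32-bit IEEE 754 float, which the static_asserts check.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "this sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 floats");

// Hypothetical helper: copy the object representation of an int32_t
// into a float. No lvalue of the "wrong" type is ever dereferenced.
float bits_to_float(std::int32_t bits) {
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Hypothetical helper: the reverse direction.
std::int32_t float_to_bits(float value) {
    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Compilers typically turn such a fixed-size memcpy into a plain register move, so this form usually costs nothing compared to the pointer cast.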

Maigre answered 6/9, 2013 at 13:53 Comment(16)

The standard states that an object's lifetime ends when its memory is de-allocated or re-used. That is what I am doing by *f = 1.0f;. Therefore the object under consideration is of type float and the line float g = *f; could be considered legal. – Brisk 6/9, 2013 at 13:59

But the "object" in this case is a, not the 5 that happens to be "in" a. So it's still UB. – Skedaddle 6/9, 2013 at 14:2

But a is dead after *f = 1.0f; isn't it? – Brisk 6/9, 2013 at 14:6

@Rafael: The object is still an int. Assigning a different value to it does not change its type, nor does it constitute "reuse" or end its lifetime. Notice that all the examples of reuse in the standard use placement new. – Encincture 6/9, 2013 at 14:19

@molbdnilo: float is a POD, it doesn't require a constructor call to begin existing... but it does require "storage of sufficient size and proper alignment", which this code doesn't provide in a portable way. – Pilau 6/9, 2013 at 14:23

@BenVoigt Yes, but assignment is not reuse and doesn't affect lifetime. int p = 0; new (&p) int(1); is reuse, int p = 0; p = 1; isn't. – Encincture 6/9, 2013 at 14:32

After reading the quote a few times it seems to me that writing is indeed an access as well as reading. – Marbut 6/9, 2013 at 14:40

@molbdnilo: I was actually looking for proper definitions of "re-use" of memory all day yesterday but couldn't find any. Please provide any references if you have any. – Brisk 6/9, 2013 at 14:41

@molbdnilo: You'd be right for non-POD types... but POD types specifically do not require the constructor to run (aka placement new) before their lifetime begins. – Pilau 6/9, 2013 at 15:54

@BenVoigt But assignment still does not affect their lifetime. C++ assigns values into objects. It does not assign objects. – Maigre 6/9, 2013 at 15:56

@Angew: According to a literal interpretation of the Standard, an object of type float already exists after the line int32_t a = 5;, assuming the size and alignment requirements are met. Like a union, there are multiple overlapping objects, but only one kind of value is actually stored. And assignment most certainly does change the type of value actually stored (analogous to a union's active member) – Pilau 6/9, 2013 at 15:58

@BenVoigt I don't think you're right. As per [intro.object]§6: "Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses." Your reading would require the int object and float object to reside on the same address. – Maigre 6/9, 2013 at 16:4

@Angew: By your interpretation, unions couldn't work either. You instead need to focus on 3.8p1. If you like, you can consider this to be one object with multiple potential types. – Pilau 6/9, 2013 at 16:6

@BenVoigt Do union member subobjects exist all at the same time, or do they reuse each other's storage when the active one is changed? – Maigre 6/9, 2013 at 16:18

@Angew: If the union member has trivial initialization, then assignment can change it to active. So either "the objects all exist at once" or else "assignment reuses the memory". Take your pick, either one proves molbdnilo's original argument wrong (which you repeated as "But assignment still does not affect their lifetime"). – Pilau 6/9, 2013 at 16:19

@BenVoigt I'm not comfortable choosing one, but I can't find support for my position now. I concede. – Maigre 6/9, 2013 at 16:44

I think this statement is incorrect:

The line int32_t b = a; doesn't violate the rules because I am accessing through its original type.

The object that is stored at location &a is now a float, so you are attempting to access the stored value of a float through an lvalue of the wrong type.

Meldameldoh answered 6/9, 2013 at 14:17 Comment(0)

There are some significant ambiguities in the specification of object lifetime and access, but here are some problems with the code according to my reading of the spec.

float* f = (float*)(&a);

This performs a reinterpret_cast. As long as float does not require stricter alignment than int32_t, you can cast the resulting value back to an int32_t* and you will get the original pointer. The standard does not otherwise define what using the result does.

*f = 1.0f;

Assuming *f aliases with a (and that the storage for an int32_t has the appropriate alignment and size for a float) then the above line ends the lifetime of the int32_t object and places a float object in its place:

The lifetime of an object of type T begins when: storage with the proper alignment and size for type T is obtained, and if the object has non-trivial initialization, its initialization is complete.

The lifetime of an object of type T ends when: [...] the storage which the object occupies is reused or released.

—3.8 Object lifetime [basic.life]/1

We're reusing the storage, but if int32_t has the same size and alignment requirements then it seems like a float always existed in the same place (since the storage was 'obtained'). Perhaps we can avoid this ambiguity by changing this line to new (f) float {1.0f};, so we know that the float object has a lifetime that began at or before the completion of the initialization.
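
A minimal sketch of the placement-new variant follows. To sidestep the separate question of whether the storage of a declared int32_t variable may be reused, this version uses a raw unsigned char buffer, which is my own assumption and not part of the question's code. The function name reuse_storage_as_float is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical illustration of explicit storage reuse via placement new.
float reuse_storage_as_float() {
    // Raw storage, sized and aligned for either type.
    alignas(std::int32_t) alignas(float) unsigned char
        storage[sizeof(std::int32_t) > sizeof(float) ? sizeof(std::int32_t)
                                                     : sizeof(float)];

    std::int32_t* ip = new (storage) std::int32_t(5);  // an int32_t lives here
    assert(*ip == 5);                                   // read while it is alive

    float* fp = new (storage) float{1.0f};  // storage reused: the int32_t's
                                            // lifetime ends, a float's begins
    return *fp;  // read the float through the pointer returned by new
}
```

After the second placement new, reading through ip would be the same kind of violation as int32_t b = a; in the question.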

Additionally, 'access' does not necessarily just mean 'read'. It can mean both reads and writes. So the write performed by *f = 1.0f; could be considered 'accessing the stored value' by writing over it, in which case this is also an aliasing violation.

So now assuming that a float object exists and the int32_t object's lifetime has ended:

int32_t b = a;

This code accesses the stored value of a float object through a glvalue with type int32_t, and is clearly an aliasing violation. The program has undefined behavior under 3.10/10.

float g = *f;

Assuming that int32_t has the right alignment and size requirements, and that the pointer f has been obtained in a way that allows its use to be well defined, then this should legally access the float object that was initialized with 1.0f.

Rumba answered 6/9, 2013 at 19:12 Comment(1)

Thank you, getting opinions on this matter is really helpful. I agree that int32_t b = a; is probably an aliasing violation, although I may have stated otherwise in the question. As for the remaining ambiguities, is there any way to contact the people who wrote the standard and ask for clarification? – Brisk 6/9, 2013 at 20:8

I've learned the hard way that quoting 6.5p7 from the C99 standard is unhelpful without also looking at 6.5p6. See this answer for the relevant quotes.

6.5p6 makes it clear that the type of an object can, under certain circumstances, change many times during its lifetime. It can take on the type of the value that was most recently written to it. This is really useful.

We need to draw a distinction between "declared type" and "effective type". A local variable, or static global, has a declared type. You are stuck with that type, I think, for the lifetime of that object. You may read from the object using a char *, but the "effective type" doesn't change, unfortunately.

But the memory returned by malloc has "no declared type". This will remain true until it is freed. It will never have a declared type, but its effective type can change according to 6.5p6, always taking on the type of the most recent write.

So, this is legal:

#include <stdint.h>
#include <stdlib.h>

int main() {
    void * vp = malloc(sizeof(int)+sizeof(float)); // it's big enough,
                    //  and malloc will look after alignment for us.
    if (vp == NULL) return 1;   // allocation failed
    int32_t *ap = vp;
    *ap = 5;      // make int32_t the 'effective type'
    float* f = vp;
    *f = 1.0f;    // this (legally) changes the effective type.

    // int32_t b = *ap;   // Not defined, because the
                          // effective type is wrong
    float g = *f;    // OK, because the effective type is (currently) correct.
    (void)g;         // silence the unused-variable warning
    free(vp);
}

So, basically, writing to a malloc-ed space is a valid way to change its type. But I guess that doesn't give us a way to look at the pre-existing data through the "lens" of a new type, which might be interesting; it's impossible unless, I think, we use the various char* exceptions to peek at data of the "wrong" type.
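
As a small sketch of that character-type exception (written in C++ here rather than C), the bytes of a float can be inspected through an unsigned char pointer without breaking the aliasing rule. The function name count_nonzero_bytes is made up for this example.

```cpp
#include <cstddef>

// Hypothetical illustration: examining an object's representation through
// unsigned char is explicitly permitted by the aliasing rules.
std::size_t count_nonzero_bytes(const float& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof(float); ++i) {
        if (p[i] != 0) {
            ++count;  // this byte of the representation has at least one bit set
        }
    }
    return count;
}
```

This only lets you look at raw bytes. Reassembling them into a value of another type still needs something like a byte-wise copy into an object of that type.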

Intercollegiate answered 21/7, 2015 at 12:52 Comment(0)
