What is the "right" way to avoid Aliasing (e.g. when adding an element of a container to itself) in C++?
Asked Answered
H

5

11
std::vector<int> a;
a.push_back(1);
a.push_back(a[0]);

I just learned that the code above can be very dangerous.

(If it's not obvious why, you're not alone... it wasn't obvious to me either.)

My questions:

  1. What is the "standard" way of dealing with it? Making a new variable and then assigning it immediately to something afterward seems a bit weird to me. Is there a better way of dealing with it?

  2. How do you train yourself to watch out for aliasing issues like this? What pattern(s) do you look for? I have no idea to recognize this situation; I only learned about aliasing when I learned about the restrict keyword in C, and only now do I understand what the issue really is.


Edit:

I'd love to accept an answer, but it doesn't seem like part (2) of the question has been answered. I'm wondering what strategies people use to locate aliasing mistakes in code they have written.

One strategy I've come up with so far is to avoid passing in the same value for in two parameters. (In this case, one parameter is implicit and one explicit.)

Are there any other easy things to notice and watch out for?

Haileyhailfellowwellmet answered 2/6, 2011 at 5:13 Comment(8)
This is guaranteed to work in C++03, in g++ when the vector grows, the first element that is copied is the new one, so that if that throws an exception the original container is left untouched (12.1/10).Tractarianism
@DavidRodríguez-dribeas How does that prevent invalidation of the reference to a[0]?Homespun
@sehe: On the order of the implementation: any reference to a[0] will be invalidated during the operation, but it will be invalidated after the element is copied, which means that at the time of access to a[0] (as parameter to push_back) that object is still alive. Additionally the exception guarantees require that all elements are copied to the new location (including the newly inserted) before freeing the old buffer, as the container cannot guarantee that copy construction will not throw, and if it throws the vector must be left in the original state.Tractarianism
@DavidRodríguez-dribeas: Doesn't the mere existence of an invalid reference/pointer imply UB in C++, regardless of if it's dereferencedd?Haileyhailfellowwellmet
@Mehrdad: Then every program would be invalid, since after delete the pointer is invalid but it still exists. The UB is not that on the existence of the invalid reference/pointer, but rather in accessing it.Tractarianism
@DavidRodríguez-dribeas: That's a great point, but what about questions like this one? (Perhaps after you use delete, you may not access the pointer anyway, so theoretically it could be zero'd by delete and hence made safe?) I've seen this question pop up more than once... and when I expressed my surprise someone told me I'm indeed wrong, and that the only valid pointers are those that point to an array or 1 element past.Haileyhailfellowwellmet
@DavidRodríguez-dribeas: Another example... ah, here is the one I was referring to.Haileyhailfellowwellmet
@Mehrdad: I'll try to read those when I get more time, but in the mean time read 3.8/6 of the standard.Tractarianism
S
5

EDIT: Technically, the standard does not mandate that this is correct if the contained type has a no-throw copy constructor. I don't know any implementation where this does not hold anyway, as it would require producing two implementations of push_back when the generic one is just as efficient in all cases.

Aliasing is a problem in general, but not in this particular case. The code:

assert( v.size() > 0 );
v.push_back( v[0] );

Is guaranteed to be correct by the standard (C++03) through the exception guarantees (which are a really good reason not to implement your own containers, you will probably not get them right). In particular §23.1 [lib.container.requirements] / 10 dictattes:

Unless otherwise specified (see 23.2.1.3 and 23.2.4.3) [NOTE: both those references refer to insert on deque and vector respectively] all container types defined in this clause meet the following additional requirements:

— if an exception is thrown by a push_back() or push_front() function, that function has no effects.

Where the important bit is that if any exception is thrown in the operation, the container is left untouched, and that means that no iterator gets invalidated, which in turns means that the original region of memory is left untouched until it is guaranteed that no exceptions will be thrown (with the exception pun intended, of destructors). Because in general copy constructors can throw, the implementation must ensure that all copies are performed before destroying any object.

This becomes more evident in C++0x, when objects are not copied from one location to another, but rather moved. Because the copy of the new element might throw, it has to be performed before any of the moves are executed, or else you would be left in a situation where some of the objects in the original container have been invalidated.

Spore answered 2/6, 2011 at 8:29 Comment(8)
@Johannes raises a good point in his comments, that the library is allowed to determine (via type traits), that int has a non-throwing constructor, so the exception guarantee is trivially met and places no restrictions on the implementation of push_back.Hubble
@Ben Voigt: That is a good and valid point, but it would mean that the implementation would have to provide two different implementations just to ensure that it can mess up your code if the type does not throw... so from a practical standpoint, I don't see any implementor doing it :) A similar slightly worse situation occurs in C++0x with types that implement move semantics, as moving the elements is a destructive operation: if a move constructor throws, you are basically screwed with a container where some of the elements have been moved and others haven't, kind of like with destructors.Tractarianism
@DavidRodríguez-dribeas: Either you look at what the standard says or you look at what you think is reasonable. The standard requires that in case of an exception nothing happens, this however doesn't imply that destruction must be done after copy-construction of the passed object. Once it's known that copy constructing the element won't throw (e.g. because is an int) old storage destruction can be anticipated. Either you think Johannes is wrong or if he's right then a compliant implementation may have undefined behavior while doing v.push_back(v[0]); and that's not safe C++ code.Erichericha
@AlfP.Steinbach: Do you care to explain what is incorrect? There is a corner case I did not considered (pointed out by 6502) by which the implementer could potentially reorder the code after verifying that copy construction will not throw, and I have to agree that technically that is the case --whether the implementer would want to invest in providing a different solution for non-throw constructors, adding code unnecessarily when the general solution works is a different point. But your comment all incorrect seems to indicate that you have greater concerns than that, do you?Tractarianism
@DavidRodríguez-dribeas: Also after discussing this in the chat Alf P. Steinbach brought another important point (chat.stackoverflow.com/transcript/message/1594676#1594676). If you pass a function a reference to an object that doesn't live long enough then you're already in UB land and the C++ implementation is not required to behave in any prescribed way. v[0] is not going to live long enough in case of reallocation and therefore the implementation is free to do whatever it wants to do.Erichericha
@6502: You are in UB when using a reference to an object that is no longer alive, but if you only use that reference when the object is alive, then there is no UBTractarianism
@DavidRodríguez-dribeas: Since it's possible that in a compliant implementation of std::vector<int>::push_back deallocation is done before copying then it's possible that in a compliant implementation v.push_back(v[0]) will be UB. Therefore v.push_back(v[0]) is not safe in C++. It may be safe in a specific implementation that takes care of it and where this is documented by the vendor, but it's not safe in general for the C++ language. Code relying on that working is at least non portable.Erichericha
@David: as I mentioned in the chat, I (sorry) did not fully grok your position before reacting. however, my gut feeling is often correct, and in this case it turned out to be also formally correct for c++11. the key for the formal pov is the phrase "Unless otherwise specified", which lists the discussion of std::vector::push_back as one of those otherwise specified areas.Nubilous
H
4

I guess this would be safe:

std::vector<int> a(1);
a.push_back(1);
a.push_back(int(a[0]));
Hubble answered 2/6, 2011 at 5:18 Comment(5)
@James: Of course, it runs afoul of a VC++ bug where an identity cast doesn't create a prvalue temporary.Hubble
That's actually something I'd thought of, but I wasn't sure if it created a temporary variable or not... it seems like a pretty "obvious" optimization for the compiler to make (mistakenly).Haileyhailfellowwellmet
The standard guarantees through the exception guarantees in the container that a.push_back( a[0] ); is safe, and it is more straightforward than this approach. While it is dangerous in a hand crafted container, it is safe in a standards conforming implementation.Tractarianism
This is an ideal way for employing the unary-+ operator. a.push_back(+a[0])Outstand
@Johannes No, that’s just being clever and obscuring semantics without any benefit.Elfriedeelfstan
F
0

In push_back(const T& el); implementation to check if el is inside array or other internal storage. That is the only politically correct way of dealing with such problems.

Container should handle this as different containers - different safety rules.

Flocky answered 2/6, 2011 at 5:20 Comment(5)
Most straightforward solution would be for push_back to do the following: allocate new storage, <strike>copy</strike> move existing elements, append new element, destroy old storage.Hubble
@Ben But in this case, the new element will then refer to a moved from object.Verboten
@Luc Danton: The correct sequence (and how it is implemented in g++) is: allocate, copy construct new element, copy/move all other elements. The reason is that the container must be left untouched as per 12.1/10 if an exception is thrown, and copy construction can throw.Tractarianism
@Ben Voigt: That is almost the correct approach, but a conforming compiler must first copy construct the new element (if it is to use move semantics), as it must guarantee that if that throws, the container is left untouched. §12.1/10 second point. In a C++03 compiler (no move semantics) your description is correct. Some compilers implement under the hood move semantics when growing (Dinkumware will swap the contained vectors in a vector<vector<T> > to avoid the cost of copying actually implementing move in C++03 through SFINAE, for example. Those compilers must still be safe.Tractarianism
@Luc: You're right, the correct sequence is as described by @David.Hubble
N
0

This probably isn't a useful answer for you, but IMHO the "right" way is that the container class should handle aliasing correctly, so that the caller doesn't have to worry about it. In particular, push_back() (or equivalent) should do the following:

// C++-ish pseudo-code, exception-safety left as an exercise for the reader
void push_back(const T & t)
{
   if (current_size == alloced_size)
   {
      // Oops, our data array is full.  Time to trade it in for a bigger one
      T * newArray = new T[alloced_size*2];
      copy_items(newArray, current_array, current_size);
      newArray[current_size++] = t;
      delete [] current_array;    // delete old array only AFTER all references to t
      current_array = new_array;
      alloced_size *= 2;
   }
   else current_array[current_size++] = t;
}
Novgorod answered 3/6, 2011 at 1:45 Comment(3)
Actually, it is useful, but just not complete. I totally agree that the callee needs to take care of this, rather than the caller. My question is, how do you notice a bug like this in your own code? What do you look for?Haileyhailfellowwellmet
I think what your question really boils down to is, how does one notice a bug (or at least, an undocumented gotcha) in library code? The answer is, unless you are willing to manually read through the STL codebase until you understand it all, there simply isn't a practical way to know -- you mostly just have to trust the libraries to be bug-free. What you can do is run your program under valgrind (or Purify, or similar) and see if they report any memory errors at runtime, though. Tools like this will often point out bugs whose symptoms would otherwise be quite subtle and mysterious.Novgorod
I'm not referring to library code in the second question -- I'm asking how I would notice such a bug in my own code. Interesting point about Valgrind, thanks.Haileyhailfellowwellmet
P
-2

I'm just winging this, so please don't consider it gospel, but would this work?

a.push_back(1);
a.push_back(&(new int(a[0])));
Pronator answered 2/6, 2011 at 5:18 Comment(2)
If you want a memory leak, sure! (Edit: Actually, no, it doesn't work anyway. You need to use * instead of &.)Haileyhailfellowwellmet
Which is why I specifically stated, "I'm just winging this, so please don't consider it gospel..." It was merely intended as an idea to try and test.Pronator

© 2022 - 2024 — McMap. All rights reserved.