Are non-dereferenced iterators past the "one past-the-end" iterator of an array undefined behavior?

Given int foo[] = {0, 1, 2, 3}; I want to know if iterators that point past the "one past-the-end" are invalid. For example: auto bar = cend(foo) + 1;

There are a ton of complaints and warnings that this is "undefined behavior" in Stack Overflow questions like this: c++ what's the result of iterator + integer when past-end-iterator? Unfortunately, the only source is hand-waving.

I'm having more and more trouble buying that. For example:

int* bar;

is uninitialized, but certainly does not invoke undefined behavior, and given enough tries I'm sure I could find an instance where the value in this uninitialized bar had the same value as cend(foo) + 1.

One of the big confusions here is that I am not asking about dereferencing cend(foo) + 1. I know that would be undefined behavior and the standard forbids it. But answers like this: https://mcmap.net/q/41431/-are-end-1-iterators-for-std-string-allowed which cite only that dereferencing such an iterator is illegal do not answer the question.

I also know that C++ only guarantees that cend(foo) will be valid, but it could be numeric_limits<int*>::max(), in which case cend(foo) + 1 would overflow. I'm not interested in that case unless it is called out in the standard as the reason we can't have an iterator past the "one past-the-end". I know that int* really just holds an integer value, and as such is subject to overflow.

I would like a citation from a credible source that moving an iterator beyond the "one past-the-end" element is undefined behavior.

Bernardinabernardine answered 13/5, 2016 at 12:9 Comment(11)
@juanchopanza I've updated the question to say "one past-the-end" everywhere for clarity.Bernardinabernardine
@JonathanMee That's better, thanks!Cavy
"I know that int* really just holds an integer value" I worked on a proprietary system once where that wasn't the case. Pointers also had the pointer type embedded in them.Question
@JonathanMee: No, it would be impossible to make a conforming C or C++ compiler for such a system. But there have historically been systems where a pointer is more than just an integer. Even on modern x64 systems pointers don't always act like integers.Question
This question ought to have a language-lawyer tag, but I'll let you decide which to remove to fit it.Trichite
@BenVoigt I agree, this is one of the few cases that I'd say this question could utilize more than 5 tags. I didn't put it there initially cause I wanted the [undefined behavior] tag, but you're right, [language lawyer] is probably more relevant to what I'm asking.Bernardinabernardine
There is a corresponding question about the STL container iterators.Ecto
@Ecto Yeah, a bunch of these should probably be closed as duplicates. Though I'd like to think mine worth saving as the answers here do a great job providing standard citations.Bernardinabernardine
This question is about arrays, not STL containers, and so is not a duplicate of the related question about containers.Ecto
Doesn't this well-established idiom: while(*s++ = *t++) also compute a pointer past the end (although it never dereferences it)?Arran
@Arran You define that as a "well established idiom", though I've never seen it in the wild. I'm assuming that's for copying c-strings or some other null terminated array, in which case the null terminator is an allocated element, and the element both s and t point to is the "one past the end" which is a defined element to point to. This question is about the one past the "one past the end" pointer, or you might say "two past the end".Bernardinabernardine

Yes, your program has undefined behaviour if you form such a pointer.

That's because the only way you can do so is to increment a valid pointer past the bounds of the object it points inside, and that is an undefined operation.

[C++14: 5.7/5]: When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

An uninitialised pointer is not the same thing because you never did anything to "get" that pointer, other than declaring it (which is obviously valid). But you can't even evaluate it (not dereference — evaluate) without imbuing your program with undefined behaviour. Not until you've assigned it a valid value.

As a sidenote, I would not call these "past-the-end" iterators/pointers: in C++ that term specifically means the "one past-the-end" iterator/pointer, which is valid (e.g. cend(foo) itself). You're waaaay past the end. ;)

Bittersweet answered 13/5, 2016 at 12:12 Comment(18)
You could get the same for actual iterators instead of just pointers by looking at their operations. ++it, for some InputIterator it, has a precondition that it is dereferenceable.Disaccharide
@Disaccharide Yep. CBA to be honest (fish & chips just arrived! nommm) went with pointers which is what the OP gave as example, despite terminology to the contrary.Bittersweet
@Disaccharide That's not true, because the "one past-the-end" iterator is not dereferencable, yet we routinely use it == the "one past-the-end" iterator as a termination condition for loops.Bernardinabernardine
What if the last element of a valid array ends at numeric_limits<void*>::max()? The standard says "shall not produce an overflow;" however, "one past the end" would necessarily overflow a pointer.Daph
@Daph in this case the implementation has to make sure that an array can not end at numeric_limits<void*>::max().Offer
@Daph The standard also guarantees that a pointer to "one past-the-end" will always be valid. Therefore an implementation is not allowed to allocate an array such that iteration up to and including the "one past-the-end" element would cause overflow.Bernardinabernardine
@JonathanMee That's not true, because the "one past-the-end" iterator is not dereferencable but the iterator pointing to the last element is, so it holds the precondition and increments to the "one-past-the-end".Offer
@Offer That absolutely is true. It's called out throughout the standard that "one past-the-end" is a valid iterator/pointer value (not to dereference of course, just that the address will be valid.) You can even see it in Lightness Races in Orbit's quote of [5.7.5]: "...or one past the last element of the array object, the evaluation shall not produce an overflow"Bernardinabernardine
Let's say you're writing kernel code, and you have pointers derived from integer constants, representing physical memory locations, guaranteed to exist. Let's say you have an array of four 32-bit values, starting in a 32-bit address space, at 0xfffffff0. Do you waste the last four bytes of physical address space, and simply disallow its use, simply to satisfy the letter of the standard?Daph
@JonathanMee I don't get the point you're making. What chris is talking about is whether incrementing iterators is possible, not whether the one-past-the-end iterator is valid or not. And what he said is that you can't increment an iterator if it is not dereferencable, and the "one-past-the-end"-iterator is not dereferencable, therefore you can get to the "one-past-the-end"-iterator (because the "end"-iterator is dereferencable) but can't increment any further.Offer
@Offer We might be saying the same thing? When I say: "The standard also guarantees that a pointer to "one past-the-end" will always be valid." I'm saying that it is valid for a pointer to have that address, not that it can be dereferenced. You agree with that, right?Bernardinabernardine
@Elkvis: Yes, you do.Bittersweet
I agree with that, yes. But an iterator to "one past-the-end" is not incrementable, and that's what chris said.Offer
@Offer I think we're all in agreement, including chris :J I just can't figure out why you said: "That's not true, because the "one past-the-end" iterator is not dereferencable"Bernardinabernardine
@JonathanMee: Erm, you're the one who said that... (tkausl later quoted you, and pointed out that you were wrong — and this level of confusion is why we shouldn't be using code formatting for quotations, even in comments...).Bittersweet
@Offer Thanks to Lightness Races in Orbit I see that it was my misunderstanding that caused all this. I'm sorry.Bernardinabernardine
@Disaccharide That specification is somewhat defective (forward_list::before_begin() says hi), but meh.Twopiece
@Twopiece Interesting, it doesn't surprise me that you knew of that obscure example :p Nice precondition weakening right there unless I missed something in my quick look through before. Also, I missed a heck of a discussion by going to work...Disaccharide

TL;DR -- It is undefined behavior to compute an iterator past the one-past-the-end iterator because a precondition is violated in the process.


Lightness provided the quote that authoritatively covers pointers.

For iterators, incrementing past the "end" (one-past-the-last-element) is not prohibited generally, but it IS prohibited for most of the various kinds of iterators:

Iterator requirements

InputIterator requirements

The input iterator requirements, and the only incrementable if dereferenceable clause in particular, are incorporated by reference into forward, bidirectional, and random-access iterators.

Output iterators are not so constrained: they are always incrementable. Because there is no end, iterators past-the-one-past-the-end are excluded by definition, so worrying about whether they would be legal to compute is moot.

Then, jumping forward in the sequence is defined in terms of individual incrementation, so we conclude that computation of a past-the-one-past-the-end iterator is either meaningless or illegal for all iterator types.

RandomAccessIterator requirements

Trichite answered 13/5, 2016 at 21:57 Comment(3)
Excellent observation. I have looked at those iterator tables many times, and somehow always missed the pre and post conditions.Bernardinabernardine
Hmmm... I feel this post deserves more upvotes than just mine, as it is the other half of the answer that @LightnessRacesInOrbit did not complete. I've linked another question in. If you think it would be helpful I can move the question back to unaccepted for a bit?Bernardinabernardine
I've dug myself into a most huge argument with @NathanOliver over this post. Since I know you're always up for a good argument I thought I'd link it: https://mcmap.net/q/41434/-std-next-with-n-gt-std-distance-it-c-endBernardinabernardine

I'm not interested in that case unless it is called out in the standard as the reason we can't have an iterator past the "one past-the-end". I know that int* really just holds an integer value, and as such is subject to overflow.

The standard doesn't discuss reasons for making things undefined. You've got the logic backwards: The fact that it's undefined is the reason that an implementation may put an object in a location where doing such a thing would otherwise cause an overflow. If a "two-past-the-end" iterator were required to be valid, then implementations would be required not to put an object somewhere that might cause such an operation to overflow.

Outcrop answered 13/5, 2016 at 17:17 Comment(12)
I think you've misunderstood my question. Just because the standard does not explicitly call out int x = 13 + 42 as defined doesn't mean it's undefined. The standard does establish the numeric_limits<int> class so that I can determine the range over which an int is defined. I'm looking for something similar here. From my example, I'd like to know what the allowable range for auto bar = cend(foo) is. Obviously there are implementation limits, as bar cannot contain an address outside its integral range. I don't care about that implementation detail.Bernardinabernardine
The standard does not actually describe pointer types as being a flat linear space with a minimum and a maximum and everything between being valid, as you seem to assume they are.Outcrop
I mean I could go with you if you'd said iterator types, for example regex_token_iterator or ostream_iterator are definitely not laid out linearly. But you said pointer. I'm guessing that you're trying to call out that memory is allocated in pages, which doesn't make any difference if I don't dereference the pointer. The "one past-the-end" pointer established by C++ is allowed to be an address to an unallocated page, a protected page, or whatever it feels like, cause it's never dereferenced. I was asking if the next pointer behaved similarly.Bernardinabernardine
"which doesn't make any difference if I don't dereference the pointer. " who says? An architecture certainly could have exceptions, signals, or whatever if a pointer variable is set to an unallocated address (such an architecture couldn't have one-past-the-end be one of these obviously). Or it could have a segment-offset format, where overflowing the offset causes a crash rather than having it go to the next segment.Outcrop
Again you're thinking of dereferencing which is actually concerned with the layout of memory. A pointer is just an integral value, I can compare it to another value in the same array object and that will always be defined over the range of the array and the "one past-the-end" pointer. Even if that "one past-the-end" pointer wouldn't be valid because of a segment offset or anything else, because I do not check with the memory layout to make a comparison. I'm just comparing integral values.Bernardinabernardine
@jona no, pointers are not integral values. That is one implementation of pointers (and by far the most common), but the standard makes no such guarantee.Gocart
@Yakk I think that we agree that pointers are implemented as integers, and I think that we agree that over the range of an array and its "one past-the-end" pointer the pointers are contiguous integers. My intention with this question was to ask if past that range the integers were still contiguous (they are not guaranteed to be.) So I think that we're in agreement on everything?Bernardinabernardine
@JonathanMee For one hypothetical example, a pointer could be implemented as two integers, and overflowing the less-significant one (i.e. the "offset" in a segment-offset system) could cause an immediate overflow_error instead of (as you assume) carrying to the next value of the more-significant one. An implementation is free to make these choices.Outcrop
@jon We do not agree that pointers must be implemented as integers.Gocart
@Outcrop I don't believe it. ptrdiff_t "is the signed integer type of the result of subtracting two pointers." Under the system you describe it would not be possible to guarantee ptrdiff_t for all sizes and locations of array. Or we could say that an array would never be allocated across that divide, in which case it would be no different from the segments we've talked about, which an array must exist within one of, in its entirety.Bernardinabernardine
"Or we could say that an array would never be allocated across that divide in which case it would be no different from the segments we've talked about, which an array must exist within one of in it's entirety. " - sure. now what?Outcrop
@Yakk Well guys, as long as a pointer acts as a contiguous integer over the array for its entire range and the "one past-the-end" pointer, I think you're right, it could be implemented other ways. I expect the debate over whether this is allowable is so important here, because for an iterator past the "one past-the-end" pointer, it may have stepped over that segmentation boundary. It's an interesting concept, but it's only made possible by the limitations the standard has put in place on the ranges for which an iterator is valid.Bernardinabernardine

As @Random842 said so well:

The standard does not describe pointer types as being in a flat linear space with a minimum and a maximum and everything between being valid, as you seem to assume they are

Pointers are not presumed to exist in a flat linear space. Instead, there are valid pointers, and invalid pointers. Some operations on pointers are defined, others are undefined behavior.

On many modern systems, pointers are implemented in a flat linear space. Even on these systems, the undefinedness of forming some pointers can open your C++ compiler to some optimizations; for example, given int foo[5];, the function bool test(const int* it1) { const int* it2 = std::cend(foo); return it1 <= it2; } can be optimized to return true, as there are no pointers that can be validly compared to it2 that are not less than or equal to it.

In less contrived situations (like some loops) this could save cycles on every loop.

It is unlikely that the pointer model was developed with this in mind. There are pointer implementations that are not in a flat linear space.

Segmented memory is the most well known. On old x86 systems, each pointer is a pair of 16-bit values. The location they refer to in the linear 20-bit address space is (high << 4) + low, or (segment << 4) + offset.

Objects live within a segment, and have a constant segment value. This means all defined pointer < comparisons can simply compare offset, the low 16 bits. They don't have to do that math (which, at the time, was expensive); they can discard the high 16 bits and compare the offset values when ordering.

There are other architectures, where code exists on a parallel address space to data (so comparing code pointers to data pointers can return spurious equality).

The rules are pretty simple. You can create pointers to elements of arrays, and to the one-past-the-end element (this means that a segmented memory system cannot build arrays that reach the very end of a segment).

Now, your memory isn't segmented, so this isn't your problem, right? Wrong: the compiler is still free to interpret your forming of ptr+2 along a certain code branch to mean that ptr is not a pointer to the last element of an array, and optimize accordingly. If that assumption isn't true, your code could behave in unexpected ways.

And there are instances of real compilers in the wild using techniques like that (assuming code does not use undefined behavior, proving invariants from it, using conclusions to change behavior before the undefined behavior occurs), if not that particular case. Undefined behavior can time travel, even if the underlying hardware implementation "would have no problem with it" without any optimizations.

Gocart answered 13/5, 2016 at 18:26 Comment(5)
We know that an array must be allocated linearly to allow pointer arithmetic even with segmented memory. And we know that even if the the "one past-the-end" pointer isn't in the same segment or isn't even a valid address at all, that pointer arithmetic is still defined as far as that endpoint, just dereferencing isn't. I wanted to know if the address past the "one past-the-end" pointer was still valid for comparison against other array addresses. Thankfully @Lightness Races In Orbit has put forward a standard citation that answers the question.Bernardinabernardine
@JonathanMee What if the overflow of the 16 bit component caused a trap? The only portion that must be linear is within the array and one past the end -- there is no guarantee of linearity over all pointer values, which your original post seems to imply you believe in. As do your comments. There is no guarantee of overall linear memory in C++*. There is only that guarantee in certain situations.Gocart
Your overflow statement is fair. But that is also why I said in my question that I wasn't interested in the numerical overflow case. I'm only interested in non-dereferenced values that are compared as integers internally to the program. I do like the example function in your question, because it sheds light onto what kind of optimizations are made available to the compiler by the standard expressing that values past "one past-the-end" exhibit undefined behavior. But at the end of the day without the standards requirement a compiler that optimized in that way wouldn't be compliant.Bernardinabernardine
@JonathanMee The point is that there's absolutely nothing guaranteeing that invalid pointers are safe to use so long as they're "non-dereferenced", which is an assumption you are making here. The fact that the standard does not make this guarantee is what allows it to be undefined. You are essentially asking people to prove a negative, that the standard does not have anything that does make it defined, based on a fundamental misunderstanding of what undefined behavior is.Outcrop
@Outcrop I understand your point. I didn't understand it when I asked the question, but I can see how since pointers are only valid across their allocated ranges and the "one past-the-end" pointer that an implementation could be designed this way.Bernardinabernardine

© 2022 - 2024 — McMap. All rights reserved.