Why are Standard iterator ranges [begin, end) instead of [begin, end]?
Asked Answered
E

7

218

Why does the Standard define end() as one past the end, instead of at the actual end?

Ellaelladine answered 1/4, 2012 at 9:40 Comment(9)
I'm guessing "because that's what the standard says" won't cut it, right? :)Sisyphus
@LuchianGrigore: Of course not. That would erode our respect for the (people behind) the standard. We should expect that there's a reason for the choices made by the standard.Wordbook
I guess, this explanation also deserves your attention: One Past the EndKrenek
In short, computers don't count like people. But if you're curious as to why people don't count like computers, I recommend The Nothing that Is: A Natural History of Zero for an in-depth look at the trouble humans had discovering that there is a number which is one less than one.Resilience
Because there's only one way to generate "last one", it is often not cheap because it has to be real. Generating "you fell off the end of the cliff" is always cheap, many possible representations will do. (void*)"ahhhhhhh" will do fine.Klan
This is one of those rare questions where it is appropriate to talk about "the STL". The [begin, end) convention comes from the original HP/SGI version of the STL.Encephalograph
@MSalters: I believe that it's present in the C Standard, too, where it is legal to take the address of one-past-the-end of an array, which is AFAIK older than the STL and may have origins even earlier.Ellaelladine
You know, I asked this question in a different form and got a very poor showing of results. It's all in how you ask I guess. #8442249Brockman
Because you want to be able to specify the empty sequence like [a,a) and not [a,a-1]. The later is just plain ugly.Petrie
W
300

The best argument easily is the one made by Dijkstra himself:

  • You want the size of the range to be a simple difference end − begin;

  • including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

Wordbook answered 1/4, 2012 at 9:45 Comment(5)
The typical C for loop iterating over an array of size N is "for(i=0;i<N;i++) a[i]=0;". Now, you can't express that directly with iterators - many folks wasted time trying to make < meaningful. But it is almost equally obvious to say "for(i=0;i!=N;i++)..." Mapping 0 to begin and N to end is therefore convenient.Chlorpromazine
@KrazyGlew: I didn't put types in my loop example deliberately. If you think of begin and end as ints with values 0 and N, respectively, it fits perfectly. Arguably, it's the != condition that's more natural than the traditional <, but we never discovered that until we started thinking about more general collections.Wordbook
@KerrekSB: I agree that "we never dscovered that [!= is better] until we started thinking about more general collections." IMHO that is one of the things Stepanov deserves credit for - speaking as someone who tried to write such template libraries before the STL. However, I'll argue about "!=" being more natural - or, rather, I'll argue that != has probably introduced bugs, that < would catch. Think for(i=0;i!=100;i+=3)...Chlorpromazine
@KrazyGlew: Your last point is somewhat off-topic, since the sequence {0, 3, 6, ..., 99} is not of the form that the OP asked about. If you wanted it to be thus, you should write a ++-incrementable iterator template step_by<3>, which would then have the originally advertised semantics.Wordbook
@KrazyGlew Even if < would sometime hide a bug, it is a bug anyway. If someone use != when he should use <, then it is a bug. By the way, that king of error is easy to find with unit testing or assertions.Cayuga
C
89

Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^               ^
   |               |
 begin            end

Obviously begin points to the beginning of the sequence, and end points to the end of the same sequence. Dereferencing begin accesses the element A, and dereferencing end makes no sense because there's no element right to it. Also, adding an iterator i in the middle gives

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
 begin     i      end

and you immediately see that the range of elements from begin to i contains the elements A and B while the range of elements from i to end contains the elements C and D. Dereferencing i gives the element right of it, that is the first element of the second sequence.

Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:

   +---+---+---+---+
   | D | C | B | A |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
rbegin     ri     rend
 (end)    (i)   (begin)

I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i (which I've named ri) still points in between elements B and C. However due to reversing the sequence, now element B is on the right to it.

Cyanic answered 2/4, 2012 at 9:18 Comment(4)
This is IMHO the best answer, though I think it might be better illustrated if the iterators pointed at numbers, and the elements were between the numbers (the syntax foo[i]) is a shorthand for the item immediately after position i). Thinking about it, I wonder if it might be useful for a language to have separate operators for "item immediately after position i" and "item immediately before position i", since a lot of algorithms work with pairs of adjacent items, and saying "The items on either side of position i" may be cleaner than "The items at positions i and i+1".Easterner
@supercat: The numbers were not supposed to indicate iterator positions/indices, but to indicate the elements themselves. I'll replace the numbers with letters to make that clearer. Indeed, with the numbers as given, begin[0] (assuming a random access iterator) would access element 1, since there's no element 0 in my example sequence.Cyanic
Why is the word "begin" used rather than "start"? After all, "begin" is a verb.Horrified
@Horrified I think "begin" is meant to be the abbreviation of "beginning" (which now makes sense). "beginning" being too long, "begin" sounds like a nice fit. "start" would be conflicting with the verb "start" (for example when you have to define a function start() in your class for starting a specific process or whatever, it would be annoying if it conflicts with an already existing one).Superheterodyne
A
75

Why does the Standard define end() as one past the end, instead of at the actual end?

Because:

  1. It avoids special handling for empty ranges. For empty ranges, begin() is equal to end() &
  2. It makes the end criterion simple for loops that iterate over the elements: The loops simply continue as long as end() is not reached.
Afire answered 1/4, 2012 at 9:42 Comment(0)
A
65

Because then

size() == end() - begin()   // For iterators for whom subtraction is valid

and you won't have to do awkward things like

// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }

and you won't accidentally write erroneous code like

bool empty() { return begin() == end() - 1; }    // a typo from the first version
                                                 // of this post
                                                 // (see, it really is confusing)

bool empty() { return end() - begin() == -1; }   // Signed/unsigned mismatch
// Plus the fact that subtracting is also invalid for many iterators

Also: What would find() return if end() pointed to a valid element?
Do you really want another member called invalid() which returns an invalid iterator?!
Two iterators is already painful enough...

Oh, and see this related post.


Also:

If the end was before the last element, how would you insert() at the true end?!

Absolution answered 1/4, 2012 at 9:44 Comment(0)
C
24

The iterator idiom of half-closed ranges [begin(), end()) is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size.

void func(int* array, size_t size)

Converting to half-closed ranges [begin, end) is very simple when you have that information:

int* begin;
int* end = array + size;

for (int* it = begin; it < end; ++it) { ... }

To work with fully-closed ranges, it's harder:

int* begin;
int* end = array + size - 1;

for (int* it = begin; it <= end; ++it) { ... }

Since pointers to arrays are iterators in C++ (and the syntax was designed to allow this), it's much easier to call std::find(array, array + size, some_value) than it is to call std::find(array, array + size - 1, some_value).


Plus, if you work with half-closed ranges, you can use the != operator to check for the end condition, becuase (if your operators are defined correctly) < implies !=.

for (int* it = begin; it != end; ++ it) { ... }

However there's no easy way to do this with fully-closed ranges. You're stuck with <=.

The only kind of iterator that supports < and > operations in C++ are random-access iterators. If you had to write a <= operator for every iterator class in C++, you'd have to make all of your iterators fully comparable, and you'd fewer choices for creating less capable iterators (such as the bidirectional iterators on std::list, or the input iterators that operate on iostreams) if C++ used fully-closed ranges.

Centri answered 1/4, 2012 at 11:37 Comment(0)
P
9

With the end() pointing one past the end, it is easy to iterate a collection with a for loop:

for (iterator it = collection.begin(); it != collection.end(); it++)
{
    DoStuff(*it);
}

With end() pointing to the last element, a loop would be more complex:

iterator it = collection.begin();
while (!collection.empty())
{
    DoStuff(*it);

    if (it == collection.end())
        break;

    it++;
}
Parmesan answered 1/4, 2012 at 9:47 Comment(0)
R
0
  1. If a container is empty, begin() == end().
  2. C++ Programmers tend to use != instead of < (less than) in loop conditions, therefore having end() pointing to a position one off-the-end is convenient.
Riproaring answered 27/11, 2014 at 15:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.