Prefer Iterators Over Pointers?

Asked 17/3, 2015 at 12:30 Answered 17/3, 2015 at 14:6

Solved c++string iterator char-pointer const-pointer

This question is a bump of a question that had a comment here but was deleted as part of the bump.

For those of you who can't see deleted posts, the comment was on my use of const char*s instead of string::const_iterators in this answer: "Iterators may have been a better path from the get go, since it appears that is exactly how your pointers seems be treated."

So my question is this, do iterators hold string::const_iterators hold any intrinsic value over a const char*s such that switching my answer over to string::const_iterators makes sense?

Psychopathist answered 17/3, 2015 at 12:30 Comment(1)

Easier debugging (on supported compilers) is #1 reason for me. – Hillaryhillbilly 17/3, 2015 at 13:55

Introduction

There are many perks of using iterators instead of pointers, among them are:

different code-path in release vs debug, and;
better type-safety, and;
making it possible to write generic code (iterators can be made to work with any data-structure, such as a linked-list, whereas intrinsic pointers are very limited in this regard).

Debugging

Since, among other things, dereferencing an iterator that is passed the end of a range is undefined-behavior, an implementation is free to do whatever it feels necessary in such case - including raising diagnostics saying that you are doing something wrong.

The standard library implementation, libstdc++, provided by gcc will issues diagnostics when it detects something fault (if Debug Mode is enabled).

_Example

#define _GLIBCXX_DEBUG 1 /* enable debug mode */

#include <vector>
#include <iostream>

int
main (int argc, char *argv[])
{
  std::vector<int> v1 {1,2,3};

  for (auto it = v1.begin (); ; ++it)
    std::cout << *it;
}

/usr/include/c++/4.9.2/debug/safe_iterator.h:261:error: attempt to 
    dereference a past-the-end iterator.

Objects involved in the operation:
iterator "this" @ 0x0x7fff828696e0 {
type = N11__gnu_debug14_Safe_iteratorIN9__gnu_cxx17__normal_iteratorIPiNSt9__cxx19986vectorIiSaIiEEEEENSt7__debug6vectorIiS6_EEEE (mutable iterator);
  state = past-the-end;
  references sequence with type `NSt7__debug6vectorIiSaIiEEE' @ 0x0x7fff82869710
}
123

The above would not happen if we were working with pointers, no matter if we are in debug-mode or not.

If we don't enable debug mode for libstdc++, a more performance friendly version (without the added bookkeeping) implementation will be used - and no diagnostics will be issued.

(Potentially) better Type Safety

Since the actual type of iterators are implementation-defined, this could be used to increase type-safety - but you will have to check the documentation of your implementation to see whether this is the case.

Consider the below example:

#include <vector>

struct A     { };
struct B : A { };

                                                      // .-- oops
                                                      // v
void  it_func (std::vector<B>::iterator beg, std::vector<A>::iterator end);

void ptr_func (B * beg, A * end);
                     // ^-- oops

int
main (int argc, char *argv[])
{
  std::vector<B> v1;

   it_func (v1.begin (), v1.end  ());               // (A)
  ptr_func (v1.data  (), v1.data () + v1.size ());  // (B)
}

_Elaboration

(A) could, depending on the implementation, be a compile-time error since std::vector<A>::iterator and std::vector<B>::iterator potentially isn't of the same type.
(B) would, however, always compile since there's an implicit conversion from B* to A*.

Membership answered 17/3, 2015 at 12:57 Comment(2)

In your Type Safety example, you said that in case (B) there's an implicit conversion from B* to A*, and therefore no compilation error. Is this something that could lead to run-time errors? Because otherwise I'd say there are no drawbacks, and it is actually better. Could you please expand on that a little? – Campground 17/3, 2015 at 13:40

@FabioTurati It is a very forced example, I didn't spend much time making it into a real world scenario; but let's say you are working with a non-virtual member-function void print() and you expect it_func to call B::print, but it instead ends up calling A::print because of the implicit conversion.. you meant to write std::vector<B>::iterator, but you didn't. I'll update the snippet with a slightly more realistic scenario. – Alongside 17/3, 2015 at 13:49

Iterators are intended to provide an abstraction over pointers.

For example, incrementing an iterator always manipulates the iterator so that if there's a next item in the collection, it refers to that next item. If it already referred to the last item in the collection, after the increment it'll be a unique value that can't be dereferenced, but will compare equal to another iterator pointing one past the end of the same collection (usually obtained with collection.end()).

In the specific case of an iterator into a string (or a vector), a pointer provides all the capabilities required of an iterator, so a pointer can be used as an iterator with no loss of required functionality.

For example, you could use std::sort to sort the items in a string or a vector. Since pointers provide the required capabilities, you can also use it to sort items in a native (C-style) array.

At the same time, yes, defining (or using) an iterator that's separate from a pointer can provide extra capabilities that aren't strictly required. Just for example, some iterators provide at least some degree of checking, to assure that (for example) when you compare two iterators, they're both iterators into the same collection, and that you aren't attempting an out of bounds access. A raw pointer can't (or at least normally won't) provide this kind of capability.

Much of this comes back to the "don't pay for what you don't use" mentality. If you really only need and want the capabilities of native pointers, they can be used as iterators, and you'll normally get code that's essentially identical to what you'd get by directly manipulating pointers. At the same time, for cases where you do want extra capabilities, such as traversing a threaded RB-tree or a B+ tree instead of a simple array, iterators allow you to do that while maintaining a single, simple interface. Likewise, for cases where you don't mind paying extra (in terms of storage and/or run-time) for extra safety, you can get that too (and it's decoupled from things like the individual algorithm, so you can get it where you want it without being forced to use it in other places that may, for example, have too critical of timing requirements to support it.

In my opinion, many people kind of miss the point when it comes to iterators. Many people happily rewrite something like:

for (size_t i=0; i<s.size(); i++)

...into something like:

for (std::string::iterator i = s.begin; i != s.end(); i++)

...and act as if it's a major accomplishment. I don't think it is. For a case like this, there's probably little (if any) gain from replacing an integer type with an iterator. Likewise, taking the code you posted and changing char const * to std::string::iterator seems unlikely to accomplish much (if anything). In fact, such conversions often make the code more verbose and less understandable, while gaining nothing in return.

If you were going to change the code, you should (in my opinion) do so in an attempt at making it more versatile by making it truly generic (which std::string::iterator really isn't going to do).

For example, consider your split (copied from the post you linked):

vector<string> split(const char* start, const char* finish){
    const char delimiters[] = ",(";
    const char* it;
    vector<string> result;

    do{
        for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
            it != finish && *it == '(';
            it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
        auto&& temp = interpolate(start, it);
        result.insert(result.end(), temp.begin(), temp.end());
        start = ++it;
    } while (it <= finish);
    return result;
}

As it stands, this is restricted to being used on narrow strings. If somebody wants to work with wide strings, UTF-32 strings, etc., it's relatively difficult to get it to do that. Likewise, if somebody wanted to match [ or '{' instead of (, the code would need to be rewritten for that as well.

If there were a chance of wanting to support various string types, we might want to make the code more generic, something like this:

template <class InIt, class OutIt, class charT>
void split(InIt start, InIt finish, charT paren, charT comma, OutIt result) {
    typedef std::iterator_traits<OutIt>::value_type o_t;
    charT delimiters[] = { comma, paren };
    InIt it;

    do{
        for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
            it != finish && *it == paren;
            it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
        auto&& temp = interpolate(start, it);
        *result++ = o_t{temp.begin(), temp.end()};
        start = ++it;
    } while (it != finish);
}

This hasn't been tested (or even compiled) so it's really just a sketch of a general direction you could take the code, not actual, finished code. Nonetheless, I think the general idea should at least be apparent--we don't just change it to "use iterators". We change it to be generic, and iterators (passed as template parameters, with types not directly specified here) are only a part of that. To get very far, we also eliminated hard-coding the paren and comma characters. Although not strictly necessary, I also change the parameters to fit more closely with the convention used by standard algorithms, so (for example) output is also written via an iterator rather than being returned as a collection.

Although it may not be immediately apparent, the latter does add quite a bit of flexibility. Just for example, if somebody just wanted to print out the strings after splitting them, he could pass an std::ostream_iterator, to have each result written directly to std::cout as it's produced, rather than getting a vector of strings, and then having to separately print them out.

Heartstrings answered 17/3, 2015 at 14:6 Comment(1)

Very interesting answer. I have to admit I haven't fully understood your example at the end, but I did get your general point that iterators are not necessarily much better than regular pointers, at least not always, and they also have an extra cost. And it's true that code gets more verbose and less readable. You've given me a new point of view about this. +1, fully deserved! – Campground 18/3, 2015 at 1:9

Introduction

Debugging

(Potentially) better Type Safety

Recommended topics

Hot tags