TL;DR: s.end() + 1 is undefined behavior.
std::string is a strange beast, mainly for historical reasons:
- It attempts to bring C compatibility, where it is known that an additional \0 character exists beyond the length reported by strlen.
- It was designed with an index-based interface.
- As an afterthought, when it was merged into the Standard library with the rest of the STL code, an iterator-based interface was added.
This led std::string, in C++03, to number 103 member functions, and a few more have been added since.
Therefore, discrepancies between the different methods should be expected.
Already in the index-based interface discrepancies appear:
§21.4.5 [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1/ Requires: pos <= size()
const_reference at(size_type pos) const;
reference at(size_type pos);
5/ Throws: out_of_range if pos >= size()
Yes, you read this right, s[s.size()] returns a reference to a NUL character while s.at(s.size()) throws an out_of_range exception. If anyone tells you to replace all uses of operator[] by at because it is safer, beware the string trap...
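The discrepancy quoted above can be observed directly. The following is a minimal sketch, assuming a C++11 (or later) compiler; the helper names char_at_size and at_throws_at_size are hypothetical and exist only for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the element at index size() through operator[].
// [string.access] requires pos <= size(), so this read is allowed and
// yields the NUL character.
char char_at_size(const std::string& s) {
    return s[s.size()];
}

// Hypothetical helper: reports whether at(size()) throws std::out_of_range,
// which [string.access] says it does for pos >= size().
bool at_throws_at_size(const std::string& s) {
    try {
        (void)s.at(s.size());
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For a string such as "abc", char_at_size should yield '\0' while at_throws_at_size should report true, so mechanically swapping operator[] for at can change the behavior of code that reads the terminator.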
So, what about iterators?
§21.4.3 [string.iterators]
iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cend() const noexcept;
2/ Returns: An iterator which is the past-the-end value.
Wonderfully bland.
So we have to refer to other paragraphs. A pointer is offered by
§21.4 [basic.string]
3/ The iterators supported by basic_string are random access iterators (24.2.7).
while §17.6 [requirements] seems devoid of anything related. Thus, string iterators are just plain old iterators (you can probably sense where this is going... but since we came this far let's go all the way).
This leads us to:
24.2.1 [iterator.requirements.general]
5/ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. [...]
So, going by the iterator requirements alone, *s.end() is undefined behavior.
24.2.3 [input.iterators]
2/ Table 107 -- Input iterator requirements (in addition to Iterator)
Lists as a pre-condition to ++r and r++ that r be dereferenceable.
Neither the Forward iterators, the Bidirectional iterators nor the Random access iterators lift this restriction (and all indicate they inherit the restrictions of their predecessor).
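One way to respect that pre-condition in code is to check for the past-the-end value before incrementing. This is only a sketch of the idea; try_increment is a hypothetical helper, not something the Standard library provides.

```cpp
#include <string>

// Hypothetical helper: increments `it` only when it is not the past-the-end
// iterator of `s`, i.e. only when the pre-condition of ++it is satisfied.
// Returns false and leaves `it` untouched otherwise.
// Assumes `it` is a valid iterator into `s`.
bool try_increment(const std::string& s, std::string::const_iterator& it) {
    if (it == s.end()) {
        return false;  // incrementing here would be undefined behavior
    }
    ++it;
    return true;
}
```

Starting from s.cbegin() on a one-character string, the first call should succeed and leave the iterator equal to s.end(); the second call should refuse to move it.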
Also, for completeness, in 24.2.7 [random.access.iterators], Table 111 -- Random access iterator requirements (in addition to bidirectional iterator) lists the following operational semantics:
- r += n is equivalent to [inc|dec]rementing r n times
- a + n and n + a are equivalent to copying a and then applying += n to the copy

and similarly for -= n and - n.
Thus s.end() + 1 is undefined behavior.
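If the goal is to visit every character including the terminating \0, this can be done without ever forming s.end() + 1. The sketch below shows two possibilities under stated assumptions: the index-based version relies only on the pos <= size() allowance of operator[] quoted earlier, while the pointer-based version assumes that c_str() points to an array of size() + 1 characters, so that a one-past-the-end pointer of that array may be formed. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: counts occurrences of c among the characters of s,
// terminator included, using the index-based interface only.
std::size_t count_with_terminator(const std::string& s, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i <= s.size(); ++i) {  // note: <=, not <
        if (s[i] == c) {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: same count using raw pointers from c_str().
// Assumption: [p, p + size()] is a valid array of size() + 1 chars, so
// p + size() + 1 is its one-past-the-end pointer and is never dereferenced.
std::size_t count_with_terminator_ptr(const std::string& s, char c) {
    const char* p = s.c_str();
    return static_cast<std::size_t>(std::count(p, p + s.size() + 1, c));
}
```

Neither version touches a string iterator beyond end(), which is the operation the iterator requirements leave undefined.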
[...] end does point to the \0 character. end + 1 would be an attempt to go beyond the terminating \0, wouldn't it? – Methodology
[...] end iterator may point to not existing memory, std::string::end point to existing, so as I understand the question is always std::string::end + 1 return valid iterator that may be use for range from begin to zero byte – Sheldon
[...] c_str(). – Snip
[...] std::string effectively allows *end(str). In practice, assert(*end(str) == '\0') should never fail, but is it correct C++11? – Methodology
[...] end + 1 point to '\0'? – Methodology
[...] operator++ of std::basic_string::iterator to assert on the current position being less than s.end(). But I'm pretty sure it is well defined to say s.c_str() + s.length() + 1 (or &(*s.begin())) because those are guaranteed to be raw pointers. – Erigeron
[...] end + 1 not point to zero, end point to zero, so to iterate through all chars you can use begin and end and to iterate for all chars + zero byte you can or may be can not use begin end+1 – Sheldon
[...] string_ref isn't a string. And for good reasons it doesn't have the additional guarantees the latter provides. – Boomkin
[...] end points to zero, then end + 1 must be undefined behaviour, because what else do expect in the dark hole beyond \0? – Methodology
[...] std::vector or std::list, end does not point to an element but to "one past the last element". It just happens to be that std::string leaks a bit of its abstraction such that you can peek at "one past the last element" and conclude that it is a \0 character. – Methodology
[...] end point to the dark hole as end+1 of std::string that is the point of question. yes end+1 dereference is bad thing, but what about usage it as normal end in algorithms – Sheldon
[...] \0 part of the elements or not? c_str() (or data()) seem to suggest it, but all member functions that deal with the abstraction that std::string represents, like size() or end(), clearly don't. – Methodology
[...] c_str() is declared as const, calling it doesn't make a difference. – Adnate
[...] \0 and which don't. – Trapezius
[...] length and size are the same: en.cppreference.com/w/cpp/string/basic_string/size – Wenzel
[...] mutable keyword just to confuse people who think const means const. – Snip
[...] std::string isn't just a container, it's a string, with string-specific additions (members, and use of traits for customization). Yes, if there's no advantage to writing code customized to use a string, it's a good idea to stay more generic. Doesn't mean if there is some reason it must be a string, taking advantage of that is stupid. – Boomkin
[...] end() the code will be super surprised when it gets 'e' instead of '\0'. – Snip
[...] std::initializer_list, going over std::vector and not ending with std::string_view. As I said, one should always code to the most common interface one can, but if one knows the interface is bigger, and using more has any advantage, there's nothing wrong with that. Sometimes one even specializes templates to take advantage of such opportunities, and that's as it should be. – Boomkin
[...] '\0'. It could even be on a different memory guarded page that traps access. – Buckthorn
[...] .data() or .c_str(), the std::basic_string is free to leave the trailing null uninitialized or trap and cause your program to format your hard drive if it is written (undefined behavior). It cannot reallocate to make room for it, as .data() and .c_str() are constant-time, but setting the value of the null terminator and/or changing the trap information on a page is a constant-time operation. In theory, the trap could extend to even forming the pointer to the null terminator (on some architecture with pointer trap values) AFAICT. – Buckthorn