How do I iterate over the words of a string?

Asked 25/10, 2008 at 8:58 Answered 25/10, 2008 at 8:58

3357

How do I iterate over the words of a string composed of words separated by whitespace?

Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

Apocopate answered 25/10, 2008 at 8:58 Comment(8)

Dude... Elegance is just a fancy way to say "efficiency-that-looks-pretty" in my book. Don't shy away from using C functions and quick methods to accomplish anything just because it is not contained within a template ;) – Creamy 25/10, 2008 at 9:4

while (iss) { string subs; iss >> subs; cout << "Substring: " << subs << endl; } – Eddi 29/9, 2009 at 15:47

@nlaq, Except that you'd have to convert your string object using c_str(), and back to a string again if you still needed it to be a string, no? – Extortionary 15/2, 2011 at 0:0

@Eduardo: that's wrong too... you need to test iss between trying to stream another value and using that value, i.e. string sub; while (iss >> sub) cout << "Substring: " << sub << '\n'; – Stockwell 11/4, 2012 at 2:24
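To spell out the point of the comment above: the extraction itself is the loop condition, so a failed read is never used. A minimal sketch, where the helper name words_of is made up for illustration and is not part of the question:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the whitespace-separated words of s.
std::vector<std::string> words_of(const std::string& s) {
    std::istringstream iss(s);
    std::vector<std::string> out;
    std::string sub;
    // operator>> returns the stream, which converts to false once an
    // extraction fails, so the body only runs for words that were read.
    while (iss >> sub) {
        out.push_back(sub);
    }
    return out;
}
```

With the do/while from the question, the last iteration runs after the final extraction has already failed, which is why it prints one extra empty "Substring: " line.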

Various options in C++ to do this by default: cplusplus.com/faq/sequences/strings/split – Apiculture 31/10, 2013 at 0:23

There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO Elegance is not a proxy for efficiency but maintainability. – Singleminded 31/3, 2017 at 13:22

Most of the answers here are notably latin-centric. Many of the answers assume a single character can be used as 'whitespace' even though the question defines the delimiter to be whitespace. Unicode has at least 25 whitespace characters. But word-delimiting is not merely a whitespace issue. For instance, in syllabic writing, such as Tibetan, word delimitation is a semantic, rather than syntactic, problem. Therefore, using whitespace to extract words is not a suitable approach for many languages. – Danaedanaher 29/10, 2018 at 12:8

Small addition to the above. You can add a locale facet that treats punctuation as space so you don't need to handle that separately. codereview.stackexchange.com/a/57467/507 – Harmonic 20/2, 2019 at 21:26

1510

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
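For readers who want the pieces spelled out, here is a commented sketch of the same idea wrapped in a function. The name tokenize is hypothetical; everything else is the standard library facilities already used above.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper around the iterator-pair construction shown above.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream iss(sentence);
    // The first iterator reads successive strings with operator>>, which
    // skips leading whitespace and stops at the next whitespace character.
    // The default-constructed iterator is the end-of-stream sentinel: the
    // first one compares equal to it once an extraction fails.
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Calling tokenize("And I feel fine...") should give the four tokens "And", "I", "feel" and "fine...".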

Siphonostele answered 25/10, 2008 at 8:58 Comment(28)

Is it possible to specify a delimiter for this? Like for instance splitting on commas? – Interviewee 6/8, 2009 at 11:49

@l3dx: it seems that the parameter "\n" is the delimiter. This code is very nice, but I would like to know better about it. Maybe somebody could explain each line of that snippet? – Very 11/12, 2009 at 17:30

@Jonathan: \n is not the input delimiter in this case; it's the separator written after each token when outputting to cout. – Pals 3/2, 2010 at 12:37

based on this: cplusplus.com/reference/algorithm/copy no. The whitespace behavior is a function of the istream_iterator. It would be more elegant to roll your own. – Curler 4/8, 2010 at 17:59

@graham.reeds, @l3dx: Please don't write another CSV parser which can't handle quoted fields: en.wikipedia.org/wiki/Comma-separated_values – Outplay 1/9, 2010 at 9:30

This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable. – Baresark 10/1, 2011 at 3:57

To people asking how this works: equivalent code using less of the STL would look like string token; istringstream iss(sentence); while (iss >> token) { cout << token; } or { tokens.push_back(token); } – Licence 7/2, 2011 at 5:11

Why do I get "error C2664: 'std::back_inserter' : cannot convert parameter 1 from 'std::vector<_Ty> (__cdecl *)(void)' to 'std::vector<_Ty> &'" in VS2008? – Pustulate 17/4, 2011 at 10:22

The template argument to back_inserter should be string, not vector<string>. That is, it should be back_inserter<string>(tokens), not back_inserter<vector<string>>(tokens). – Bailiff 27/5, 2012 at 14:56

Take a look at ranges if you care about elegance in practical terms (i.e. do more with less code): slideshare.net/rawwell/iteratorsmustgo – Sweatbox 17/10, 2012 at 18:27

Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings. – Self 19/12, 2012 at 20:30
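A rough sketch of the facet approach described in the comment above, assuming commas are the only delimiter wanted. The names comma_is_space and split_on_commas are invented for this example.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies ',' (and nothing else) as whitespace.
struct comma_is_space : std::ctype<char> {
    comma_is_space() : std::ctype<char>(make_table()) {}

    static const mask* make_table() {
        // Start from an all-zero table so that ' ' is NOT whitespace here.
        static const std::vector<mask> table = [] {
            std::vector<mask> t(table_size, mask());
            t[','] = space;
            return t;
        }();
        return table.data();
    }
};

std::vector<std::string> split_on_commas(const std::string& s) {
    std::istringstream iss(s);
    // The locale takes ownership of the facet and deletes it when done.
    iss.imbue(std::locale(iss.getloc(), new comma_is_space));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Note that operator>> still collapses runs of "whitespace", so adjacent commas do not produce empty fields, and quoted fields are not handled. This is a tokenizer sketch, not a CSV parser.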

The main purpose of istream_iterator is it can parse int, float, double, etc from an istream: istream_iterator<double> does a decent job reading doubles separated by space. With a front or especially back inserter it's a great combo! :) – Koffler 11/1, 2013 at 2:48

vector has a ctor that takes a begin and end iterator, so no need for the copy call to insert them into a container. – Miner 13/1, 2013 at 18:41

@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one. – Crossness 7/2, 2013 at 15:8

@Nawaz Why should it? You're inserting into a std::vector<std::string> and not into a std::string. But then again, there shouldn't be an explicit template argument, anyway (well, there shouldn't even be a back_inserter or copy, but ok). – Crossness 7/2, 2013 at 15:12

@ChristianRau: Oh you're right; the first code-snippet probably confused me. Actually I should have said you don't need to mention the template argument in std::back_inserter; in fact, mentioning template argument defies the very purpose of back_inserter. – Bailiff 7/2, 2013 at 16:30

why do you need to use curly brackets in vector<string> tokens{istream_iterator<string>{iss}, istream_iterator<string>{}}; is it because otherwise it looks like function call? – Denary 7/1, 2014 at 5:6

Questions: 1. why would istream_iterator stop at white spaces? For me spaces are also part of the string; 2. why is it very inefficient? – Stallion 22/4, 2015 at 12:23

The elegance in needing 5 includes, 3 lines (not counting using <namespace>) and quite cryptic code to... split a string? dear god. – Hhd 22/4, 2015 at 15:42

We could also have used STL to split a string. – Hernandes 30/8, 2015 at 11:31

This is much faster than Evan Teran's answer if you only need to split on whitespace. – Auden 7/7, 2016 at 15:23

While the missing delimiter concern is correct one should take into account that the OPs solution couldn't handle that either. So this seems to be not a requirement. – Erubescent 21/7, 2016 at 20:40

@doorfly The only place where curly brackets are needed is istream_iterator<string>{}, because that would otherwise be regarded as a function. – Negrito 28/2, 2017 at 20:31

If using wstring and your code breaks, check this answer for fixing the istream_iterator usage with wchar_t: https://mcmap.net/q/40727/-c-using-istream_iterator-with-wstringstream – Nereus 3/7, 2018 at 20:44

@Interviewee Yes. You can add a specialized locale to the stream that makes a , a space (and all other characters not a space). Then the code will work just the same. codereview.stackexchange.com/a/57467/507 – Harmonic 20/2, 2019 at 21:30

This code could really use some comments to explain what the purpose of every item is. A typical person asking this question is only going to end up with more questions after reading this, e.g. what the purpose of the empty istream_iterator is, or why the "create the vector directly" solution has so many brackets. – Idona 14/10, 2019 at 21:17

I don't think there is any power or elegance in this, compared to just std::string::split(). Of course there is no such split in the STL – Smog 14/5, 2020 at 12:0

You can set the delimiter of istringstream https://mcmap.net/q/40728/-how-to-extract-mixed-format-using-istringstream – Hoax 2/7, 2020 at 17:44

2584

I use this to split a string by a delimiter. The first overload writes the results to an output iterator (for example, a back_inserter over a pre-constructed vector); the second returns a new vector.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
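To make the empty-token behaviour concrete, here is the same getline loop as a single self-contained function. It is renamed split_keep_empty (a hypothetical name) so it can be read on its own.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same getline-based loop as above, under a hypothetical name.
std::vector<std::string> split_keep_empty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    // getline succeeds for every field that is followed by a delimiter,
    // including empty ones, so "::" yields an empty string in between.
    while (std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
```

For "one:two::three" this gives "one", "two", "" and "three". A leading delimiter produces an empty first item in the same way, but a single trailing delimiter does not produce an empty last item, because the final getline extracts nothing and fails.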

Smolensk answered 25/10, 2008 at 8:58 Comment(27)

elegant solution, I always forget about this particular "getline", though I do not believe it is aware of quotes and escape sequences. – Ascarid 27/5, 2010 at 13:32

@stijn: are you saying that split("one two three", ' '); returns a vector with 4 elements? I'm not sure that is the case, but I'll test it. – Smolensk 9/11, 2010 at 15:45

wait, it seems the formatting removed some spaces (or I forgot them): I'm talking about the string "one two three" with 2 spaces between "two" and "three" – Hulky 9/11, 2010 at 18:54

I liked this solution, however, I wrapped the function in a template, changing the vectors std::string template parameter into a parameter. For me, I also used boost::lexical_cast on said template parameter in the push_back. – Renita 9/8, 2012 at 19:30

How can I modify it to work with std::wstring, std::getline won't work right? – Methadone 19/11, 2012 at 9:9

std::getline is templated, so it may "just work", if not see en.cppreference.com/w/cpp/string/basic_string/getline to figure out how to tweak it. Passing a wchar_t character as the delim may be enough to trigger the right template. – Smolensk 19/11, 2012 at 16:29

if you are enabling return value optimization, can't you make the function to return void? – Flores 10/7, 2013 at 14:52

To make it skip empty tokens, do an empty() check: if (!item.empty()) elems.push_back(item) – Valedictory 9/11, 2013 at 22:33

How about the delim contains two chars as ->? – Moller 26/12, 2013 at 8:15

@herohuyongtao, this solution only works for single char delimiters. – Smolensk 27/12, 2013 at 6:11

@Copperpot How did you do it in a template? – Bolme 12/1, 2014 at 23:2

@EvanTeran This may be not regarding splitting the string but general doubt in your code, The elems you are passing as an reference argument and returning the reference again. I just wanted to know is there any reason for that? – Alemannic 25/1, 2014 at 17:27

@JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like. – Smolensk 25/1, 2014 at 17:50

Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value. – Montherlant 9/9, 2015 at 19:4

almost perfect: split(":abc:def:", ':'); returns only 3 instead of 4 elements! – Molliemollify 9/9, 2015 at 20:31

Being able to set max number of returned elements is crucial to me. – Bulgar 29/10, 2015 at 1:25

@Jonny, should be trivial, just add an extra condition to the while loop comparing the vector's size to the max. Something like this: while (elems.size() < max_count && std::getline(ss, item, delim)) { – Smolensk 29/10, 2015 at 5:57

@Jonny, I see. Your answer looks a bit more complex than necessary. If you make the max default to something like size_t(-1), that will effectively be "infinity" (it's the biggest size your system can represent, so you'll run out of RAM before you hit this). Then you can make the condition as simple as my comment above. No more need to double check the stream state and do a second read and such. Just a suggestion :-). – Smolensk 29/10, 2015 at 6:2
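The suggestion in the two comments above might look roughly like this. It is only a sketch: split_max and max_count are illustrative names, and the default of size_t(-1) stands in for "no limit" as described.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant: stop after max_count items; the rest of the
// input is dropped, not appended to the last item.
std::vector<std::string> split_max(
        const std::string& s, char delim,
        std::size_t max_count = static_cast<std::size_t>(-1)) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    // The size check comes first, so no extra token is read from the
    // stream once the limit has been reached.
    while (elems.size() < max_count && std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
```

So split_max("a:b:c:d", ':', 2) would return just "a" and "b". This is different from PHP's explode, where the last element holds the remainder of the string.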

Might be wrong but you might lose the end of the string with that. Well basically I mimic the explode function of php, or so I believe. – Bulgar 29/10, 2015 at 6:8

Gotcha. My solution will stop at max_count, skipping the rest of the string (since it found the amount it wanted). I guess you are looking for something that will always make the last one the rest of the string. I have some functions like that too here: github.com/eteran/cpp-utilities/blob/master/string.h Some are specifically designed to match php's string manipulation functions as closely as possible :-) – Smolensk 29/10, 2015 at 6:21

Why not return split(s, delim, std::vector<std::string>()); ? – Bethanie 29/10, 2015 at 19:53

@Gabriel, you could. But I think when it was written (a few years ago), having a named variable encouraged NRVO more reliably. With C++11 move semantics, it may be a lot less of a difference. – Smolensk 30/10, 2015 at 3:16

be aware that if you are using OpenCV, split can be confused with split from OpenCV that splits images. – Tyrannous 20/6, 2017 at 16:7

I really wish they'd add a standard method with this signature: vector<string> std::string::split(char delimiter = ' '); – Hanshaw 2/2, 2018 at 22:26

@Bolme See gitlab.com/tbeu/wcx_setfolderdate/blob/master/src/splitstring.h for a templated implementation. – Hyperthermia 7/7, 2019 at 20:56

@Hyperthermia fixing your link: gitlab.com/tbeu/wcx_setfolderdate/-/blob/master/src/… – Jeaninejeanlouis 20/3, 2020 at 4:17

As others noted, this does not correctly handle empty strings at the end. (This is not a matter of definition, since "a,b," and "a,b" both give the same result.) This can be fixed by initializing iss with s + delim and explicitly handling the special case that an empty string should return an empty list. – Decidua 11/11, 2021 at 23:58
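One way to read the fix described in the comment above, as a sketch. The function name split_with_trailing is hypothetical, and this is one interpretation of the comment, not code from the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant that keeps a trailing empty field.
std::vector<std::string> split_with_trailing(const std::string& s, char delim) {
    std::vector<std::string> elems;
    if (s.empty()) {
        return elems;  // special case: empty input gives an empty list
    }
    // Appending one delimiter means every field, including the last one,
    // is terminated by a delimiter, so getline reports it even when empty.
    std::istringstream iss(s + delim);
    std::string item;
    while (std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
```

With this version "a,b" gives two items and "a,b," gives three, the last one empty, so the two inputs are no longer indistinguishable.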
