Right way to split an std::string into a vector<string>

Asked 9/4, 2011 at 20:13 Answered 14/7, 2022 at 23:15

107

What is the right way to split a string into a vector of strings? Delimiter is space or comma.

Extensor answered 9/4, 2011 at 20:13 Comment(2)

A split in which commas and spaces are both delimiters, or a function that splits either on space or on comma, according to a parameter? – Cocytus 9/4, 2011 at 20:18

Some of the answers to https://mcmap.net/q/40394/-how-do-i-iterate-over-the-words-of-a-string can readily be adapted to work with multiple delimiters. – Bugbane 9/4, 2011 at 20:23

113

For space-separated strings, you can do this:

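// Requires <algorithm>, <iostream>, <iterator>, <sstream>, <string>, <vector>
std::string s = "What is the right way to split a string into a vector of strings";
std::stringstream ss(s);
std::istream_iterator<std::string> begin(ss); // reads one whitespace-delimited word at a time
std::istream_iterator<std::string> end;       // default-constructed iterator marks end of stream
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));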

Output:

What
is
the
right
way
to
split
a
string
into
a
vector
of
strings

For strings that contain both commas and spaces, imbue the stream with a ctype facet that treats the comma as whitespace:

// Requires <algorithm>, <cstring>, <iostream>, <iterator>, <locale>, <sstream>, <string>, <vector>
// A ctype facet that classifies ',' (as well as ' ') as whitespace.
struct tokens: std::ctype<char>
{
    tokens(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        typedef std::ctype<char> cctype;
        static const cctype::mask *const_rc= cctype::classic_table();

        // Start from a copy of the classic "C" locale classification table.
        static cctype::mask rc[cctype::table_size];
        std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));

        rc[','] = std::ctype_base::space;
        rc[' '] = std::ctype_base::space;
        return &rc[0];
    }
};

std::string s = "right way, wrong way, correct way";
std::stringstream ss(s);
ss.imbue(std::locale(std::locale(), new tokens())); // the locale takes ownership of the facet
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

Output:

right
way
wrong
way
correct
way
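
If a custom facet feels heavy for the job, roughly the same tokens can be produced with std::string::find_first_of and find_first_not_of. The following is only a sketch under assumptions: the helper name split_any and its default delimiter set ", " are made up for illustration, and empty tokens between adjacent delimiters are dropped, which mirrors what the stream-based version does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): split `s` on any character
// in `delims`, dropping empty tokens.
std::vector<std::string> split_any(const std::string& s, const std::string& delims = ", ")
{
    std::vector<std::string> out;
    // Skip any leading delimiters.
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the current token (npos if the token runs to the end of s).
        std::string::size_type stop = s.find_first_of(delims, start);
        out.push_back(s.substr(start, stop - start));
        // Skip the run of delimiters that follows the token.
        start = s.find_first_not_of(delims, stop);
    }
    return out;
}
```

Calling split_any("right way, wrong way, correct way") should give the same six words as the facet version, without touching the stream's locale.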

Changeless answered 9/4, 2011 at 20:22 Comment(12)

std::vector<std::string> vstrings(begin, end); would be nicer IMO, but I suppose we don't know whether the questioner is constructing the vector, or hoping to populate a pre-existing vector. – Cocytus 9/4, 2011 at 20:28

Nice, but wrong. The OP was specific in that both space and comma are delimiters. And you can't do the same trick in this case, can you? – Mccain 9/4, 2011 at 20:32

@Steve: Nice suggestion. @Armen: OP didn't mention anything when I gave the solution. The question doesn't seem to be clear enough. Otherwise there are some elegant ways to deal with both space and comma simultaneously: #4889379 – Changeless 9/4, 2011 at 20:34

I like the use of istream_iterator but why not finish strong using ostream_iterator as well? – Injunction 9/4, 2011 at 20:37

@Oli, @Steve and @Armen: Please see my second solution. And let me know if it can still be improved. :-) – Changeless 9/4, 2011 at 20:47

@Nawaz: the obvious possible improvement would be to replace the final for loop with a call to std::copy. – Cystocele 9/4, 2011 at 20:58

@Jerry: haha.. nice one. forgot that though other times I make use of it as well. thanks for reminding it. :-) – Changeless 9/4, 2011 at 21:0

This is an amazing answer and needs to be highlighted somehow. – Slipstream 1/4, 2013 at 11:7

@Samveen: Thanks for the appreciation. :-) – Changeless 1/4, 2013 at 12:42

+1 Very nice. But don't you have to delete the tokens struct created in ss.imbue(std::locale(std::locale(), new tokens())) somewhere? – Dancer 28/12, 2015 at 14:24

@Yes, I thought the read will do that. Use auto loc = std::make_shared<tokens>(), and then pass ss.imbue(..., loc.get()));. That should work. – Changeless 28/12, 2015 at 15:20

@StringerBell The token-facet should be cleaned up by locale's destructor. – Hilda 30/8, 2016 at 23:1
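
The ownership rule in the comment above can be checked with a small experiment. This is a sketch, not part of the answer: counting_facet and facets_destroyed_after_scope are hypothetical names, and the facet relies on the default refs == 0 argument of the std::ctype<char> constructor, which asks the locale machinery to delete the facet once the last locale holding it is gone.

```cpp
#include <locale>
#include <sstream>

// Hypothetical facet that counts how many times it has been destroyed.
struct counting_facet : std::ctype<char> {
    static int destroyed;
    counting_facet() : std::ctype<char>() {} // refs defaults to 0: the locale owns the facet
    ~counting_facet() { ++destroyed; }
};
int counting_facet::destroyed = 0;

// Imbue a stream with a locale holding the facet, let everything go out of
// scope, and report how many facets were destroyed.
int facets_destroyed_after_scope()
{
    counting_facet::destroyed = 0;
    {
        std::stringstream ss("a b");
        ss.imbue(std::locale(std::locale(), new counting_facet()));
    } // the stream and every locale copy that held the facet are destroyed here
    return counting_facet::destroyed;
}
```

If the function returns 1, the facet was released without any manual delete, so handing the same pointer to a smart pointer as well would risk a double delete.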

144

A convenient way would be boost's string algorithms library.

#include <boost/algorithm/string/classification.hpp> // Include for boost::is_any_of
#include <boost/algorithm/string/split.hpp> // Include for boost::split
// ...

std::vector<std::string> words;
std::string s;
boost::split(words, s, boost::is_any_of(", "), boost::token_compress_on);

Koval answered 9/4, 2011 at 20:24 Comment(2)

What is token_compress_on for? – Panay 31/3, 2021 at 22:42

@Panay From the documentation: If eCompress (the fourth argument) is set to token_compress_on, adjacent separators are merged together. Otherwise, every two separators delimit a token. boost.org/doc/libs/1_49_0/doc/html/boost/algorithm/… – Countersign 3/4, 2021 at 19:36
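
To make the difference between the two modes concrete, here is a standard-library sketch that imitates the behaviour described in the quoted documentation. It is not Boost's implementation: split_tokens and its bool compress parameter are invented for illustration, with compress == true standing in for token_compress_on.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two split modes; not Boost's code.
std::vector<std::string> split_tokens(const std::string& s,
                                      const std::string& delims,
                                      bool compress)
{
    std::vector<std::string> out;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type stop = s.find_first_of(delims, start);
        out.push_back(s.substr(start, stop - start)); // may be an empty token
        if (stop == std::string::npos)
            break;
        start = stop + 1;
        if (compress) {
            // Treat a run of adjacent separators as a single separator.
            start = s.find_first_not_of(delims, start);
            if (start == std::string::npos) {
                out.push_back(std::string()); // input ended on a separator
                break;
            }
        }
    }
    return out;
}
```

With the input "a, b" and delimiters ", ", the compressing mode yields two tokens ("a", "b"), while the non-compressing mode yields three, because the empty string between the comma and the space counts as a token.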
