How do I iterate over the words of a string?
Asked Answered
A

83

3357

How do I iterate over the words of a string composed of words separated by whitespace?

Note that I'm not interested in C string functions or that kind of character manipulation/access. I prefer elegance over efficiency. My current solution:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string s = "Somewhere down the road";
    istringstream iss(s);

    do {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
Apocopate answered 25/10, 2008 at 8:58 Comment(8)
Dude... Elegance is just a fancy way to say "efficiency-that-looks-pretty" in my book. Don't shy away from using C functions and quick methods to accomplish anything just because it is not contained within a template ;)Creamy
while (iss) { string subs; iss >> subs; cout << "Substring: " << sub << endl; }Eddi
@nlaq, Except that you'd have to convert your string object using c_str(), and back to a string again if you still needed it to be a string, no?Extortionary
@Eduardo: that's wrong too... you need to test iss between trying to stream another value and using that value, i.e. string sub; while (iss >> sub) cout << "Substring: " << sub << '\n';Stockwell
Various options in C++ to do this by default: cplusplus.com/faq/sequences/strings/splitApiculture
There's more to elegance than just pretty efficiency. Elegant attributes include low line count and high legibility. IMHO Elegance is not a proxy for efficiency but maintainability.Singleminded
Most of the answers here are notably latin-centric. Many of the answers assume a single character can be used as 'whitespace' even though the question defines the delimiter to be whitespace. Unicode has at least 25 whitespace characters. But word-delimiting is not merely a whitespace issue. For instance, in syllabic writing, such as Tibetan, word delimitation is a semantic, rather than syntactic, problem. Therefore, using whitespace to extract words is not a suitable approach for many languages.Danaedanaher
Small addition to the above. You can add a locale facet that treats punctuation as space so you don't need to handle that separately. codereview.stackexchange.com/a/57467/507Harmonic
S
1510

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
Siphonostele answered 25/10, 2008 at 8:58 Comment(28)
Is it possible to specify a delimiter for this? Like for instance splitting on commas?Interviewee
@l3dx: it seems that the parameter "\n" is the delimiter. This code is very nice, but I would like to know better about it. Maybe somebody could explain each line of that snippet?Very
@Jonathan: \n is not the delimiter in this case, it's the deliminer for outputting to cout.Pals
based on this: cplusplus.com/reference/algorithm/copy no. The whitespace behavior is a function of the istream_iterator. It would be more elegant to roll your own.Curler
@graham.reeds, @l3dx: Please don't write another CSV parser which can't handle quoted fields: en.wikipedia.org/wiki/Comma-separated_valuesOutplay
This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.Baresark
To people asking how this works: equivalent code using less of the STL would look like string token; istringstream iss(sentence); while (iss >> token) { cout << token; } or { tokens.push_back(token); }Licence
Why do I get "error C2664: 'std::back_inserter' : cannot convert parameter 1 from 'std::vector<_Ty> (__cdecl *)(void)' to 'std::vector<_Ty> &'" in VS2008?Pustulate
The template argument to back_inserter should be string, not vector<string>. That is, it should be back_inserter<string>(tokens), not back_inserter<vector<string>>(tokens).Bailiff
Take a look at ranges if you care about elegance in practical terms (i.e. do more with less code): slideshare.net/rawwell/iteratorsmustgoSweatbox
Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.Self
The main purpose of istream_iterator is it can parse int, float, double, etc from an istream: istream_iterator<double> does a decent job reading doubles separated by space. With a front or especially back inserter it's a great combo! :)Koffler
vector has a ctor that takes a begin and end iterator, so no need for the copy call to insert them into a container.Miner
@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.Crossness
@Nawaz Why should it? You're inserting into a std::vector<std::string> and not into a std::string. But then again, there shouldn't be an explicit template argument, anyway (well, there shouldn't even be a back_inserter or copy, but ok).Crossness
@ChristianRau: Oh you're right; the first code-snippet probably confused me. Actually I should have said you don't need to mention the template argument in std::back_inserter; in fact, mentioning template argument defies the very purpose of back_inserter.Bailiff
why do you need to use curly brackets in vector<string> tokens{istream_iterator<string>{iss}, istream_iterator<string>{}}; is it because otherwise it looks like function call?Denary
Questions: 1. why would istream_iterator stop at white spaces? For me spaces are also part of the string; 2. why is it very inefficient?Stallion
The elegance in needing 5 includes, 3 lines (not counting using <namespace> and quite cryptic code to... split a string? dear god.Hhd
We could also have used STL to split a string.Hernandes
This is much faster than Evan Teran's answer if you only need to split on whitespace.Auden
While the missing delimiter concern is correct one should take into account that the OPs solution couldn't handle that either. So this seems to be not a requirement.Erubescent
@doorfly The only place where curly brackets are needed is istream_iterator<string>{}, because that would otherwise be regarded as a function.Negrito
If using wstring and your code breaks, check this answer for fixing the istream_iterator usage with wchar_t: https://mcmap.net/q/40727/-c-using-istream_iterator-with-wstringstreamNereus
@Interviewee Yes. You can add a specialized local to the stream that makes a , a space (and all other characters not a space). Then the code will work just the same. codereview.stackexchange.com/a/57467/507Harmonic
This code could really use some comments to explain what the purpose of every item is. A typical person asking this question is only going to end up with more questions after reading this, e.g. what the purpose of the empty istream_iterator is, or why the "create the vector directly" solution has so many brackets.Idona
I don't think there is any power or elegance in this, compared to just std::string::split(). Of course there is not such split in STLSmog
You can set the delimiter of istringstream https://mcmap.net/q/40728/-how-to-extract-mixed-format-using-istringstreamHoax
S
2584

I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second returns a new vector.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');
Smolensk answered 25/10, 2008 at 8:58 Comment(27)
elegant solution, I always forget about this particular "getline", thou I do not believe it is aware of quotes and escape sequences.Ascarid
@stijn: are you saying that split("one two three", ' '); returns a vector with 4 elements? I'm not sure that is the case, but I'll test it.Smolensk
wait, it seems the formatting removed some spaces (or I forgot them): I'm talking about the string "one two three" with 2 spaces between "two" and "three"Hulky
I liked this solution, however, I wrapped the function in a template, changing the vectors std::string template parameter into a parameter. For me, I also used boost::lexical_cast on said template parameter in the push_back.Renita
How can I modify it to work with std::wstring, std::getline won't work right?Methadone
std::getline is templated, so it may "just work", if not see en.cppreference.com/w/cpp/string/basic_string/getline to figure out how to tweak it. Passing a wchar_t character as the delim may be enough to trigger the right template.Smolensk
if you are enabling return value optimization, can't you make the function to return void?Flores
In order to avoid it skipping empty tokens, do an empty() check: if (!item.empty()) elems.push_back(item)Valedictory
How about the delim contains two chars as ->?Moller
@herohuyongtao, this solution only works for single char delimiters.Smolensk
@Copperpot How did you do it in a template?Bolme
@EvanTeran This may be not regarding splitting the string but general doubt in your code, The elems you are passing as an reference argument and returning the reference again. I just wanted to know is there any reason for that?Alemannic
@JeshwanthKumarNK, it's not necessary, but it lets you do things like pass the result directly to a function like this: f(split(s, d, v)) while still having the benefit of a pre-allocated vector if you like.Smolensk
Caveat: split("one:two::three", ':') and split("one:two::three:", ':') return the same value.Montherlant
almost perfect: split(":abc:def:", ':'); returns only 3 instead of 4 elements!Molliemollify
Being able to set max number of returned elements is crucial to me.Bulgar
@Jonny, should be trivial, just add an extra condition to the while loop comparing the vector's size to the max. Something like this: while (elems.size() < max_count && std::getline(ss, item, delim)) {Smolensk
@Jonny, I see. Your answer looks a bit more complex than necessary. If you make the max default to something like size_t(-1), that will effectively be "infinity" (it's the biggest size your system can represent, so you'll run out of RAM before you hit this). Then you can make the condition as simple as my comment above. No more need to double check the stream state and do a second read and such. Just a suggestion :-).Smolensk
Might be wrong but you might lose the end of the string with that. Well basically I mimic the explode function of php, or so I believe.Bulgar
Gotcha. My solution will stop at max_count, skipping the rest of the string (since it found the amount it wanted). I guess you are looking for something that will always make the last one the rest of the string. I have some functions like that too here: github.com/eteran/cpp-utilities/blob/master/string.h Some are specifically designed to match php's string manipulation functions as closely as possible :-)Smolensk
Why not return split(s, delim, std::vector<std::string>()); ?Bethanie
@Gabriel, you could. But I think when it was written (a few years ago), having a named variable encouraged NVRO more reliably. With C++11 move semantics, it may be a lot less of a difference.Smolensk
be aware that if you are using OpenCV, split can be confused with split from OpenCV that splits images.Tyrannous
I really wish they'd add a standard method with this signature: vector<string> std::string::split(char delimiter = ' ');Hanshaw
@Bolme See gitlab.com/tbeu/wcx_setfolderdate/blob/master/src/splitstring.h for a templated implementation.Hyperthermia
@Hyperthermia fixing your link: gitlab.com/tbeu/wcx_setfolderdate/-/blob/master/src/…Jeaninejeanlouis
As others noted this does not correctly handle emtpy strings at the end. (This is not a matter of definition since "a,b," and "a,b" both give the same result.) This can be fixed by initializing iss with s + delim and handling the special case that an empty strig should return an empty list explicitly.Decidua
S
1510

For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

... or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
Siphonostele answered 25/10, 2008 at 8:58 Comment(28)
Is it possible to specify a delimiter for this? Like for instance splitting on commas?Interviewee
@l3dx: it seems that the parameter "\n" is the delimiter. This code is very nice, but I would like to know better about it. Maybe somebody could explain each line of that snippet?Very
@Jonathan: \n is not the delimiter in this case, it's the deliminer for outputting to cout.Pals
based on this: cplusplus.com/reference/algorithm/copy no. The whitespace behavior is a function of the istream_iterator. It would be more elegant to roll your own.Curler
@graham.reeds, @l3dx: Please don't write another CSV parser which can't handle quoted fields: en.wikipedia.org/wiki/Comma-separated_valuesOutplay
This is a poor solution as it doesn't take any other delimiter, therefore not scalable and not maintable.Baresark
To people asking how this works: equivalent code using less of the STL would look like string token; istringstream iss(sentence); while (iss >> token) { cout << token; } or { tokens.push_back(token); }Licence
Why do I get "error C2664: 'std::back_inserter' : cannot convert parameter 1 from 'std::vector<_Ty> (__cdecl *)(void)' to 'std::vector<_Ty> &'" in VS2008?Pustulate
The template argument to back_inserter should be string, not vector<string>. That is, it should be back_inserter<string>(tokens), not back_inserter<vector<string>>(tokens).Bailiff
Take a look at ranges if you care about elegance in practical terms (i.e. do more with less code): slideshare.net/rawwell/iteratorsmustgoSweatbox
Actually, this can work just fine with other delimiters (though doing some is somewhat ugly). You create a ctype facet that classifies the desired delimiters as whitespace, create a locale containing that facet, then imbue the stringstream with that locale before extracting strings.Self
The main purpose of istream_iterator is it can parse int, float, double, etc from an istream: istream_iterator<double> does a decent job reading doubles separated by space. With a front or especially back inserter it's a great combo! :)Koffler
vector has a ctor that takes a begin and end iterator, so no need for the copy call to insert them into a container.Miner
@Kinderchocolate "The string can be assumed to be composed of words separated by whitespace" - Hmm, doesn't sound like a poor solution to the question's problem. "not scalable and not maintable" - Hah, nice one.Crossness
@Nawaz Why should it? You're inserting into a std::vector<std::string> and not into a std::string. But then again, there shouldn't be an explicit template argument, anyway (well, there shouldn't even be a back_inserter or copy, but ok).Crossness
@ChristianRau: Oh you're right; the first code-snippet probably confused me. Actually I should have said you don't need to mention the template argument in std::back_inserter; in fact, mentioning template argument defies the very purpose of back_inserter.Bailiff
why do you need to use curly brackets in vector<string> tokens{istream_iterator<string>{iss}, istream_iterator<string>{}}; is it because otherwise it looks like function call?Denary
Questions: 1. why would istream_iterator stop at white spaces? For me spaces are also part of the string; 2. why is it very inefficient?Stallion
The elegance in needing 5 includes, 3 lines (not counting using <namespace> and quite cryptic code to... split a string? dear god.Hhd
We could also have used STL to split a string.Hernandes
This is much faster than Evan Teran's answer if you only need to split on whitespace.Auden
While the missing delimiter concern is correct one should take into account that the OPs solution couldn't handle that either. So this seems to be not a requirement.Erubescent
@doorfly The only place where curly brackets are needed is istream_iterator<string>{}, because that would otherwise be regarded as a function.Negrito
If using wstring and your code breaks, check this answer for fixing the istream_iterator usage with wchar_t: https://mcmap.net/q/40727/-c-using-istream_iterator-with-wstringstreamNereus
@Interviewee Yes. You can add a specialized local to the stream that makes a , a space (and all other characters not a space). Then the code will work just the same. codereview.stackexchange.com/a/57467/507Harmonic
This code could really use some comments to explain what the purpose of every item is. A typical person asking this question is only going to end up with more questions after reading this, e.g. what the purpose of the empty istream_iterator is, or why the "create the vector directly" solution has so many brackets.Idona
I don't think there is any power or elegance in this, compared to just std::string::split(). Of course there is not such split in STLSmog
You can set the delimiter of istringstream https://mcmap.net/q/40728/-how-to-extract-mixed-format-using-istringstreamHoax
L
874

A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

See the documentation for details.

Landa answered 25/10, 2008 at 8:58 Comment(18)
Speed is irrelevant here, as both of these cases are much slower than a strtok-like function.Stain
This is practical and quick enough if you know the line will contain just a few tokens, but if it contains many then you will burn a ton of memory (and time) growing the vector. So no, it's not faster than the stringstream solution -- at least not for large n, which is the only case where speed matters.Gusgusba
And for those who don't already have boost... bcp copies over 1,000 files for this :)Hampden
Warning, when given an empty string (""), this method return a vector containing the "" string. So add an "if (!string_to_split.empty())" before the split.Philanthropy
@Ian Embedded developers aren't all using boost.As
@As are embedded developers using C++ anyway?Chaucerian
bcp'ing this brings forth libraries such as the MPL, which I think is really hardly needed to split text. Man it is a PITA...Liverish
@j_random_hacker: "at least not for large n, which is the only case where speed matters" - also for smallish n in a large-n loop...Stockwell
@tuxSlayer: various POSIX/XOPEN/UNIX standards also specify strtok_rStockwell
@TonyDelroy: Yeah, and it looks like in msvc it is called strtok_s (meaning safe?:)). Not too portable...Sportswear
@tuxSlayer: if you'd prefer to write your own implementation instead of have a five line #if/#else/#endif then knock yourself out....Stockwell
Use std::string::find(..) and std::string::substr(..) no need to use boost.Mcqueen
actually in our company we are not allowed to use boost due to security, yeah i know but suits have decided.Badalona
as an addendum: I use boost only when I must, normally I prefer to add to my own library of code which is standalone and portable so that I can achieve small precise specific code, which accomplishes a given aim. That way the code is non-public, performant, trivial and portable. Boost has its place but I would suggest that its a bit of overkill for tokenising strings: you wouldnt have your whole house transported to an engineering firm to get a new nail hammered into the wall to hang a picture.... they may do it extremely well, but the prosare by far outweighed by the cons.Griceldagrid
nice it even works for calling of boost framework in xcode (iOS project) in cpp classMulholland
My personal opinion is that C and C++ are languages not meant to be agile or to provide fast to market solutions, using Boost is almost the same as choosing an higher level language that offer more abstraction, for those we choose Java, C#, etc... Because for those we don't care for exactly what it's doing beneath the hood. Using Boost would also mean that I would have to tell my client that I'm including a third party library. Thanks anyway. :)Maccabees
Can the boost::split really work on the utf-8 string? Can you share any documentation for that? I am trying to split a utf-8 string at newlines. Will the boost::split work correctly if the string that I pass is using utf-8 encoding?Alcohol
@Andrew: any_of has been part of the standard library since 2011: en.cppreference.com/w/cpp/algorithm/all_any_none_ofCardcarrying
C
413
#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                 // Have a buffer string
    std::stringstream ss(str);       // Insert the string into a stream

    std::vector<std::string> tokens; // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}
Colicroot answered 25/10, 2008 at 8:58 Comment(2)
You can also split on other delimiters if you use getline in the while condition e.g. to split by commas, use while(getline(ss, buff, ',')).Staub
I don't understand how this got 400 upvotes. This is basically the same as in OQ: use a stringstream and >> from it. Exactly what OP did even in revision 1 of the question history.Stafford
G
201

An efficient, small, and elegant solution using a template function:

template <class ContainerT>
void split(const std::string& str, ContainerT& tokens,
           const std::string& delimiters = " ", bool trimEmpty = false)
{
   std::string::size_type pos, lastPos = 0, length = str.length();
   
   using value_type = typename ContainerT::value_type;
   using size_type = typename ContainerT::size_type;
   
   while (lastPos < length + 1)
   {
      pos = str.find_first_of(delimiters, lastPos);
      if (pos == std::string::npos)
         pos = length;

      if (pos != lastPos || !trimEmpty)
         tokens.emplace_back(value_type(str.data() + lastPos,
               (size_type)pos - lastPos));

      lastPos = pos + 1;
   }
}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)... but list<...> may sometimes be preferred over vector<...>.

It also allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it requires is std::string included via <string>. It does not use streams or the boost library explicitly but will be able to accept some of these types.

Also since C++-17 you can use std::vector<std::string_view> which is much faster and more memory-efficient than using std::string. Here is a revised version which also supports the container as a return type:

#include <vector>
#include <string_view>
#include <utility>
    
template < typename StringT,
           typename DelimiterT = char,
           typename ContainerT = std::vector<std::string_view> >
ContainerT split(StringT const& str, DelimiterT const& delimiters = ' ', bool trimEmpty = true, ContainerT&& tokens = {})
{
    typename StringT::size_type pos, lastPos = 0, length = str.length();

    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == StringT::npos)
            pos = length;

      if (pos != lastPos || !trimEmpty)
            tokens.emplace_back(str.data() + lastPos, pos - lastPos);

        lastPos = pos + 1;
    }

    return std::forward<ContainerT>(tokens);
}

Care has been taken not to make any unneeded copies.

This will allow for either:

for (auto const& line : split(str, '\n'))

Or:

auto& lines = split(str, '\n');

Both returning the default template container type of std::vector<std::string_view>.

To get a specific container type back, or to pass an existing container, use the tokens input parameter with either a typed initial container or an existing container variable:

auto& lines = split(str, '\n', false, std::vector<std::string>());

Or:

std::vector<std::string> lines;
split(str, '\n', false, lines);
Grayson answered 25/10, 2008 at 8:58 Comment(8)
I'm quite a fan of this, but for g++ (and probably good practice) anyone using this will want typedefs and typenames: typedef ContainerT Base; typedef typename Base::value_type ValueType; typedef typename ValueType::size_type SizeType; Then to substitute out the value_type and size_types accordingly.Moitoso
For those of us for whom the template stuff and the first comment are completely foreign, a usage example cmplete with required includes would be lovely.Bedside
Ahh well, I figured it out. I put the C++ lines from aws' comment inside the function body of tokenize(), then edited the tokens.push_back() lines to change the ContainerT::value_type to just ValueType and changed (ContainerT::value_type::size_type) to (SizeType). Fixed the bits g++ had been whining about. Just invoke it as tokenize( some_string, some_vector );Bedside
Apart from running a few performance tests on sample data, primarily I've reduced it to as few as possible instructions and also as little as possible memory copies enabled by the use of a substring class that only references offsets/lengths in other strings. (I rolled my own, but there are some other implementations). Unfortunately there is not too much else one can do to improve on this, but incremental increases were possible.Grayson
That's the correct output for when trimEmpty = true. Keep in mind that "abo" is not a delimiter in this answer, but the list of delimiter characters. It would be simple to modify it to take a single delimiter string of characters (I think str.find_first_of should change to str.find_first, but I could be wrong... can't test)Grayson
I had some issues initially, but this does in fact work with wstring / unicode if you update the template accordingly. Be careful though; i ran into some easy to cause runtime errors that the compiler didn't catch in a couple different places.Nereus
Your code is not working! Try string = "hih1ihi", substring = "hi". Your code is not giving the correct result. minus.Sedation
@Sedation I think you assumed the delimiters parameter is not a character list of delimiters but rather a substring. Therein lies the rub.Grayson
F
178

Here's another solution. It's compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {
  std::vector<std::string> tokens;
  std::size_t start = 0, end = 0;
  while ((end = text.find(sep, start)) != std::string::npos) {
    tokens.push_back(text.substr(start, end - start));
    start = end + 1;
  }
  tokens.push_back(text.substr(start));
  return tokens;
}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (ie. sep) results in two empty strings.

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
          tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    if (end != start) {
       tokens.push_back(text.substr(start));
    }
    return tokens;
}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}
Fiann answered 25/10, 2008 at 8:58 Comment(11)
The first version is simple and gets the job done perfectly. The only change I would made would be to return the result directly, instead of passing it as a parameter.Dismantle
The output is passed as a parameter for efficiency. If the result were returned it would require either a copy of the vector, or a heap allocation which would then have to be freed.Fiann
A slight addendum to my comment above: this function could return the vector without penalty if using C++11 move semantics.Fiann
@AlecThomas: Even before C++11, wouldn't most compilers optimise away the return copy via NRVO? (+1 anyway; very succinct)Calcic
@Peter M I would rather have it be passed in by reference, just in case the vector<string> got large.Amaral
@Veritas In what way does it not work if the delimiter is the last character? Also, outputting empty tokens is intentional, though it could obviously be easily modified to not do that if required.Fiann
Out of all the answers this appears to be one of the most appealing and flexible. Together with the getline with a delimiter, although its a less obvious solution. Does the c++11 standard not have anything for this? Does c++11 support punch cards these days?Dashiell
If you pass in an empty string, it returns a vector with 1 element (empty string). If you pass in a string that's the same as sep, then it returns a vector with 2 elements (both empty strings). Should have "if (end > 0) {" before the push_back in while loop and "if (start > 0) {" before push_back below while loop to fix this.Spirituality
@LearnCocos2D Please don't alter the meaning of a post with an edit. This behaviour is by design. It is identical behaviour to Python's split operator. I'll add a note to make this clear.Fiann
Suggest using std::string::size_type instead of int, as some compilers might spit out signed/unsigned warnings otherwise.Desexualize
the first function in this answer is the best solution - works perfectly with a reverse join function - std::string strJoin(const std::vector<std::string> v, const char& delimiter) { if(!v.empty()) { std::stringstream ss; std::string str(1, delimiter); auto it = v.cbegin(); while(true) { ss << *it++; if(it != v.cend()) ss << delimiter; else return ss.str(); } } return ""; }Abatis
W
143

This is my favorite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )     
{
    // Do something on `word` here...
}
With answered 25/10, 2008 at 8:58 Comment(4)
Is it possible to declare word as a char?Follmer
Sorry abatishchev, C++ is not my strong point. But I imagine it would not be difficult to add an inner loop to loop through every character in each word. But right now I believe the current loop depends on spaces for word separation. Unless you know that there is only a single character between every space, in which case you can just cast "word" to a char... sorry I cant be of more help, ive been meaning to brush up on my C++With
if you declare word as a char it will iterate over every non-whitespace character. It's simple enough to try: stringstream ss("Hello World, this is*@#&$(@ a string"); char c; while(ss >> c) cout << c;Curler
I don't understand how this got 140 upvotes. This is basically the same as in OQ: use a stringstream and >> from it. Exactly what OP did even in revision 1 of the question history.Stafford
W
89

This is similar to Stack Overflow question How do I tokenize a string in C++?. Requires Boost external library

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

int main(int argc, char** argv)
{
    string text = "token  test\tstring";

    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}
Wie answered 25/10, 2008 at 8:58 Comment(1)
Does this materialize a copy of all of the tokens, or does it only keep the start and end position of the current token?Aerology
I
71

I like the following because it puts the results into a vector, supports a string as a delim and gives control over keeping empty values. But, it doesn't look as good then.

#include <ostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;

vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}

int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}

Of course, Boost has a split() that works partially like that. And, if by 'white-space', you really do mean any type of white-space, using Boost's split with is_any_of() works great.

Incest answered 25/10, 2008 at 8:58 Comment(1)
Finally a solution that is handling empty tokens correctly at both sides of the stringMolliemollify
F
59

The STL does not have such a method available already.

However, you can either use C's strtok() function by using the std::string::c_str() member, or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):

void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters.  Note the "not_of"
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find next "non-delimiter"
        pos = str.find_first_of(delimiters, lastPos);
    }
}

Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html

If you have questions about the code sample, leave a comment and I will explain.

And just because it does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf both are faster than std::cin and std::cout (significantly), the fopen syntax is a lot more friendly for binary types, and they also tend to produce smaller EXEs.

Don't get sold on this "Elegance over performance" deal.

Frodine answered 25/10, 2008 at 8:58 Comment(11)
I'm aware of the C string functions and I'm aware of the performance issues too (both of which I've noted in my question). However, for this specific question, I'm looking for an elegant C++ solution.Apocopate
... and you dont want to just build a OO wrapper over the C functions why?Creamy
@Nelson LaQuet: Let me guess: Because strtok is not reentrant?Uprise
Why not use the C++ features that are meant for this job?Cardcarrying
@Nelson don't ever pass string.c_str() to strtok! strtok trashes the input string (inserts '\0' chars to replace each foudn delimiter) and c_str() returns a non-modifiable string.Smolensk
char* ch = new char[str.size()]; strcpy(ch, str.c_str()); ... delete[] ch; // problem solved.Creamy
@Nelson: That array needs to be of size str.size() + 1 in your last comment. But I agree with your thesis that it's silly to avoid C functions for "aesthetic" reasons.Gusgusba
"For example, printf and scanf both are faster then cin and cout" only because synchronization is enabled by defaultEmmenagogue
"The STL does not have such a method available already" - what's wrong with string's find_first_of and using iterators to remember positions? Then, use substr to extract.Gagman
@paulm: No, the slowness of C++ streams is caused by facets. They're still slower than stdio.h functions even when synchronization is disabled (and on stringstreams, which can't synchronize).Fructose
Or you could use strsep() (though not as portable). If you don't care about more than one char as the delimiter another answer gives an idea (getdelim()) but you could also iterate over the string with strchr(). Or...there are many ways depending on what you are after and need.Millner
W
45

Here is a split function that:

  • is generic
  • uses standard C++ (no boost)
  • accepts multiple delimiters
  • ignores empty tokens (can easily be changed)

    template<typename T>
    vector<T> 
    split(const T & str, const T & delimiters) {
        vector<T> v;
        typename T::size_type start = 0;
        auto pos = str.find_first_of(delimiters, start);
        while(pos != T::npos) {
            if(pos != start) // ignore empty tokens
                v.emplace_back(str, start, pos - start);
            start = pos + 1;
            pos = str.find_first_of(delimiters, start);
        }
        if(start < str.length()) // ignore trailing delimiter
            v.emplace_back(str, start, str.length() - start); // add what's left of the string
        return v;
    }
    

Example usage:

    vector<string> v = split<string>("Hello, there; World", ";,");
    vector<wstring> v = split<wstring>(L"Hello, there; World", L";,");
Windup answered 25/10, 2008 at 8:58 Comment(7)
You forgot to add to use list: "extremely inefficient"Voguish
@XanderTulip, can you be more constructive and explain how or why?Windup
@XanderTulip: I assume you are referring to it returning the vector by value. The Return-Value-Optimization (RVO, google it) should take care of this. Also in C++11 you could return by move reference.Sniffy
This can actually be optimized further: instead of .push_back(str.substr(...)) one can use .emplace_back(str, start, pos - start). This way the string object is constructed in the container and thus we avoid a move operation + other shenanigans done by the .substr function.Skeens
@zoopp yes. Good idea. VS10 didn't have emplace_back support when I wrote this. I will update my answer. ThanksWindup
Can someone please make it return up to a max N elements? Any remaining characters should end up in the last element.Bulgar
Anyone else getting the error "missing 'typename' prior to dependent type name 'T::size_type'"?Pacer
D
39

I have a 2 lines solution to this problem:

char sep = ' ';
std::string s="1 This is an example";

for(size_t p=0, q=0; p!=s.npos; p=q)
  std::cout << s.substr(p+(p!=0), (q=s.find(sep, p+1))-p-(p!=0)) << std::endl;

Then instead of printing you can put it in a vector.

Danella answered 25/10, 2008 at 8:58 Comment(2)
it's only a two-liner because one of those two lines is huge and cryptic... no one who actually has to read code ever, wants to read something like this, or would write it. contrived brevity is worse than tasteful verbosity.Pareira
You can even make it a one liner by putting everything on a single line! Isn't that wonderful?Danella
S
37

Here's a simple solution that uses only the standard regex library

#include <regex>
#include <string>
#include <vector>

std::vector<string> Tokenize( const string str, const std::regex regex )
{
    using namespace std;

    std::vector<string> result;

    sregex_token_iterator it( str.begin(), str.end(), regex, -1 );
    sregex_token_iterator reg_end;

    for ( ; it != reg_end; ++it ) {
        if ( !it->str().empty() ) //token could be empty:check
            result.emplace_back( it->str() );
    }

    return result;
}

The regex argument allows checking for multiple arguments (spaces, commas, etc.)

I usually only check to split on spaces and commas, so I also have this default function:

std::vector<string> TokenizeDefault( const string str )
{
    using namespace std;

    regex re( "[\\s,]+" );

    return Tokenize( str, re );
}

The "[\\s,]+" checks for spaces (\\s) and commas (,).

Note, if you want to split wstring instead of string,

  • change all std::regex to std::wregex
  • change all sregex_token_iterator to wsregex_token_iterator

Note, you might also want to take the string argument by reference, depending on your compiler.

Sheetfed answered 25/10, 2008 at 8:58 Comment(3)
This would have been my favourite answer, but std::regex is broken in GCC 4.8. They said that they implemented it correctly in GCC 4.9. I am still giving you my +1Zecchino
This is my favorite with minor changes: vector returned as reference as you said, and the arguments "str" and "regex" passed by references also. thx.Sexagenarian
Raw strings are pretty useful while dealing with regex patterns. That way, you don't have to use the escape sequences... You can just use R"([\s,]+)".Manteltree
W
37

Yet another flexible and fast way

template<typename Operator>
void tokenize(Operator& op, const char* input, const char* delimiters) {
  const char* s = input;
  const char* e = s;
  while (*e != 0) {
    e = s;
    while (*e != 0 && strchr(delimiters, *e) == 0) ++e;
    if (e - s > 0) {
      op(s, e - s);
    }
    s = e + 1;
  }
}

To use it with a vector of strings (Edit: Since someone pointed out not to inherit STL classes... hrmf ;) ) :

template<class ContainerType>
class Appender {
public:
  Appender(ContainerType& container) : container_(container) {;}
  void operator() (const char* s, unsigned length) { 
    container_.push_back(std::string(s,length));
  }
private:
  ContainerType& container_;
};

std::vector<std::string> strVector;
Appender v(strVector);
tokenize(v, "A number of words to be tokenized", " \t");

That's it! And that's just one way to use the tokenizer, like how to just count words:

class WordCounter {
public:
  WordCounter() : noOfWords(0) {}
  void operator() (const char*, unsigned) {
    ++noOfWords;
  }
  unsigned noOfWords;
};

WordCounter wc;
tokenize(wc, "A number of words to be counted", " \t"); 
ASSERT( wc.noOfWords == 7 );

Limited by imagination ;)

Weatherboarding answered 25/10, 2008 at 8:58 Comment(1)
Nice. Regarding Appender note "Why shouldn't we inherit a class from STL classes?"Amaryl
S
33

Using std::stringstream as you have works perfectly fine, and do exactly what you wanted. If you're just looking for different way of doing things though, you can use std::find()/std::find_first_of() and std::string::substr().

Here's an example:

#include <iostream>
#include <string>

int main()
{
    std::string s("Somewhere down the road");
    std::string::size_type prev_pos = 0, pos = 0;

    while( (pos = s.find(' ', pos)) != std::string::npos )
    {
        std::string substring( s.substr(prev_pos, pos-prev_pos) );

        std::cout << substring << '\n';

        prev_pos = ++pos;
    }

    std::string substring( s.substr(prev_pos, pos-prev_pos) ); // Last word
    std::cout << substring << '\n';

    return 0;
}
Sarson answered 25/10, 2008 at 8:58 Comment(1)
This only works for single character delimiters. A simple change lets it work with multicharacter: prev_pos = pos += delimiter.length();Macnamara
J
26

If you like to use boost, but want to use a whole string as delimiter (instead of single characters as in most of the previously proposed solutions), you can use the boost_split_iterator.

Example code including convenient template:

#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>

template<typename _OutputIterator>
inline void split(
    const std::string& str, 
    const std::string& delim, 
    _OutputIterator result)
{
    using namespace boost::algorithm;
    typedef split_iterator<std::string::const_iterator> It;

    for(It iter=make_split_iterator(str, first_finder(delim, is_equal()));
            iter!=It();
            ++iter)
    {
        *(result++) = boost::copy_range<std::string>(*iter);
    }
}

int main(int argc, char* argv[])
{
    using namespace std;

    vector<string> splitted;
    split("HelloFOOworldFOO!", "FOO", back_inserter(splitted));

    // or directly to console, for example
    split("HelloFOOworldFOO!", "FOO", ostream_iterator<string>(cout, "\n"));
    return 0;
}
Justness answered 25/10, 2008 at 8:58 Comment(0)
P
23

C++20 finally blesses us with a split function. Or rather, a range adapter. Godbolt link.

#include <iostream>
#include <ranges>
#include <string_view>

namespace ranges = std::ranges;
namespace views = std::views;

using str = std::string_view;

auto view =
    "Multiple words"
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return str(r.begin(), r.end());
    });

auto main() -> int {
    for (str &&sv : view) {
        std::cout << sv << '\n';
    }
}
Pignus answered 25/10, 2008 at 8:58 Comment(1)
This is mostly the same as https://mcmap.net/q/40394/-how-do-i-iterate-over-the-words-of-a-string.Inefficacious
S
23

Heres a regex solution that only uses the standard regex library. (I'm a little rusty, so there may be a few syntax errors, but this is at least the general idea)

#include <regex.h>
#include <string.h>
#include <vector.h>

using namespace std;

vector<string> split(string s){
    regex r ("\\w+"); //regex matches whole words, (greedy, so no fragment words)
    regex_iterator<string::iterator> rit ( s.begin(), s.end(), r );
    regex_iterator<string::iterator> rend; //iterators to iterate thru words
    vector<string> result<regex_iterator>(rit, rend);
    return result;  //iterates through the matches to fill the vector
}
Stinger answered 25/10, 2008 at 8:58 Comment(1)
Similar responses with maybe better regex approach: here, and here.Reptile
O
22

There is a function named strtok.

#include<string>
using namespace std;

vector<string> split(char* str,const char* delim)
{
    char* saveptr;
    char* token = strtok_r(str,delim,&saveptr);

    vector<string> result;

    while(token != NULL)
    {
        result.push_back(token);
        token = strtok_r(NULL,delim,&saveptr);
    }
    return result;
}
Ought answered 25/10, 2008 at 8:58 Comment(7)
strtok is from the C standard library, not C++. It is not safe to use in multithreaded programs. It modifies the input string.Baccalaureate
Because it stores the char pointer from the first call in a static variable, so that on the subsequent calls when NULL is passed, it remembers what pointer should be used. If a second thread calls strtok when another thread is still processing, this char pointer will be overwritten, and both threads will then have incorrect results. mkssoftware.com/docs/man3/strtok.3.aspBaccalaureate
as mentioned before strtok is unsafe and even in C strtok_r is recommended for useMarinate
strtok_r can be used if you are in a section of code that may be accessed. this is the only solution of all of the above that isn't "line noise", and is a testament to what, exactly, is wrong with c++Holm
Updated so there can be no objections on the grounds of thread safety from C++ wonks.Holm
strtok is evil. It treats two delimiters as a single delimiter if there is nothing between them.Erythro
A for() loop looks better. Like this davekb.com/browse_programming_tips:strtok_r_example:txtShermanshermie
S
20

Using std::string_view and Eric Niebler's range-v3 library:

https://wandbox.org/permlink/kW5lwRCL1pxjp2pW

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"
#include "range/v3/algorithm.hpp"

int main() {
    std::string s = "Somewhere down the range v3 library";
    ranges::for_each(s  
        |   ranges::view::split(' ')
        |   ranges::view::transform([](auto &&sub) {
                return std::string_view(&*sub.begin(), ranges::distance(sub));
            }),
        [](auto s) {std::cout << "Substring: " << s << "\n";}
    );
}

By using a range for loop instead of ranges::for_each algorithm:

#include <iostream>
#include <string>
#include <string_view>
#include "range/v3/view.hpp"

int main()
{
    std::string str = "Somewhere down the range v3 library";
    for (auto s : str | ranges::view::split(' ')
                      | ranges::view::transform([](auto&& sub) { return std::string_view(&*sub.begin(), ranges::distance(sub)); }
                      ))
    {
        std::cout << "Substring: " << s << "\n";
    }
}
Schechinger answered 25/10, 2008 at 8:58 Comment(1)
Yepp, the range for based looks better - I agreeSchechinger
V
19

The stringstream can be convenient if you need to parse the string by non-space symbols:

string s = "Name:JAck; Spouse:Susan; ...";
string dummy, name, spouse;

istringstream iss(s);
getline(iss, dummy, ':');
getline(iss, name, ';');
getline(iss, dummy, ':');
getline(iss, spouse, ';')
Vivavivace answered 25/10, 2008 at 8:58 Comment(0)
A
15

Short and elegant

#include <vector>
#include <string>
using namespace std;

vector<string> split(string data, string token)
{
    vector<string> output;
    size_t pos = string::npos; // size_t to avoid improbable overflow
    do
    {
        pos = data.find(token);
        output.push_back(data.substr(0, pos));
        if (string::npos != pos)
            data = data.substr(pos + token.size());
    } while (string::npos != pos);
    return output;
}

can use any string as delimiter, also can be used with binary data (std::string supports binary data, including nulls)

using:

auto a = split("this!!is!!!example!string", "!!");

output:

this
is
!example!string
Anticathode answered 25/10, 2008 at 8:58 Comment(1)
I like this solution because it allows the separator to be a string and not a char, however, it is modifying in place the string, so it is forcing the creation of a copy of the original string.Rawlings
P
15

So far I used the one in Boost, but I needed something that doesn't depends on it, so I came to this:

static void Split(std::vector<std::string>& lst, const std::string& input, const std::string& separators, bool remove_empty = true)
{
    std::ostringstream word;
    for (size_t n = 0; n < input.size(); ++n)
    {
        if (std::string::npos == separators.find(input[n]))
            word << input[n];
        else
        {
            if (!word.str().empty() || !remove_empty)
                lst.push_back(word.str());
            word.str("");
        }
    }
    if (!word.str().empty() || !remove_empty)
        lst.push_back(word.str());
}

A good point is that in separators you can pass more than one character.

Poppy answered 25/10, 2008 at 8:58 Comment(0)
E
14

I've rolled my own using strtok and used boost to split a string. The best method I have found is the C++ String Toolkit Library. It is incredibly flexible and fast.

#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp>

const char *whitespace  = " \t\r\n\f";
const char *whitespace_and_punctuation  = " \t\r\n\f;,=";

int main()
{
    {   // normal parsing of a string into a vector of strings
        std::string s("Somewhere down the road");
        std::vector<std::string> result;
        if( strtk::parse( s, whitespace, result ) )
        {
            for(size_t i = 0; i < result.size(); ++i )
                std::cout << result[i] << std::endl;
        }
    }

    {  // parsing a string into a vector of floats with other separators
        // besides spaces

        std::string s("3.0, 3.14; 4.0");
        std::vector<float> values;
        if( strtk::parse( s, whitespace_and_punctuation, values ) )
        {
            for(size_t i = 0; i < values.size(); ++i )
                std::cout << values[i] << std::endl;
        }
    }

    {  // parsing a string into specific variables

        std::string s("angle = 45; radius = 9.9");
        std::string w1, w2;
        float v1, v2;
        if( strtk::parse( s, whitespace_and_punctuation, w1, v1, w2, v2) )
        {
            std::cout << "word " << w1 << ", value " << v1 << std::endl;
            std::cout << "word " << w2 << ", value " << v2 << std::endl;
        }
    }

    return 0;
}

The toolkit has much more flexibility than this simple example shows but its utility in parsing a string into useful elements is incredible.

Entangle answered 25/10, 2008 at 8:58 Comment(0)
O
13

This answer takes the string and puts it into a vector of strings. It uses the boost library.

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
Overeat answered 25/10, 2008 at 8:58 Comment(0)
B
13

I made this because I needed an easy way to split strings and C-based strings. Hopefully someone else can find it useful as well. Also, it doesn't rely on tokens, and you can use fields as delimiters, which is another key I needed.

I'm sure there are improvements that can be made to even further improve its elegance, and please do by all means.

StringSplitter.hpp:

#include <vector>
#include <iostream>
#include <string.h>

using namespace std;

class StringSplit
{
private:
    void copy_fragment(char*, char*, char*);
    void copy_fragment(char*, char*, char);
    bool match_fragment(char*, char*, int);
    int untilnextdelim(char*, char);
    int untilnextdelim(char*, char*);
    void assimilate(char*, char);
    void assimilate(char*, char*);
    bool string_contains(char*, char*);
    long calc_string_size(char*);
    void copy_string(char*, char*);

public:
    vector<char*> split_cstr(char);
    vector<char*> split_cstr(char*);
    vector<string> split_string(char);
    vector<string> split_string(char*);
    char* String;
    bool do_string;
    bool keep_empty;
    vector<char*> Container;
    vector<string> ContainerS;

    StringSplit(char * in)
    {
        String = in;
    }

    StringSplit(string in)
    {
        size_t len = calc_string_size((char*)in.c_str());
        String = new char[len + 1];
        memset(String, 0, len + 1);
        copy_string(String, (char*)in.c_str());
        do_string = true;
    }

    ~StringSplit()
    {
        for (int i = 0; i < Container.size(); i++)
        {
            if (Container[i] != NULL)
            {
                delete[] Container[i];
            }
        }
        if (do_string)
        {
            delete[] String;
        }
    }
};

StringSplitter.cpp:

#include <string.h>
#include <iostream>
#include <vector>
#include "StringSplit.hpp"

using namespace std;

void StringSplit::assimilate(char*src, char delim)
{
    int until = untilnextdelim(src, delim);
    if (until > 0)
    {
        char * temp = new char[until + 1];
        memset(temp, 0, until + 1);
        copy_fragment(temp, src, delim);
        if (keep_empty || *temp != 0)
        {
            if (!do_string)
            {
                Container.push_back(temp);
            }
            else
            {
                string x = temp;
                ContainerS.push_back(x);
            }

        }
        else
        {
            delete[] temp;
        }
    }
}

void StringSplit::assimilate(char*src, char* delim)
{
    int until = untilnextdelim(src, delim);
    if (until > 0)
    {
        char * temp = new char[until + 1];
        memset(temp, 0, until + 1);
        copy_fragment(temp, src, delim);
        if (keep_empty || *temp != 0)
        {
            if (!do_string)
            {
                Container.push_back(temp);
            }
            else
            {
                string x = temp;
                ContainerS.push_back(x);
            }
        }
        else
        {
            delete[] temp;
        }
    }
}

long StringSplit::calc_string_size(char* _in)
{
    long i = 0;
    while (*_in++)
    {
        i++;
    }
    return i;
}

bool StringSplit::string_contains(char* haystack, char* needle)
{
    size_t len = calc_string_size(needle);
    size_t lenh = calc_string_size(haystack);
    while (lenh--)
    {
        if (match_fragment(haystack + lenh, needle, len))
        {
            return true;
        }
    }
    return false;
}

bool StringSplit::match_fragment(char* _src, char* cmp, int len)
{
    while (len--)
    {
        if (*(_src + len) != *(cmp + len))
        {
            return false;
        }
    }
    return true;
}

int StringSplit::untilnextdelim(char* _in, char delim)
{
    size_t len = calc_string_size(_in);
    if (*_in == delim)
    {
        _in += 1;
        return len - 1;
    }

    int c = 0;
    while (*(_in + c) != delim && c < len)
    {
        c++;
    }

    return c;
}

int StringSplit::untilnextdelim(char* _in, char* delim)
{
    int s = calc_string_size(delim);
    int c = 1 + s;

    if (!string_contains(_in, delim))
    {
        return calc_string_size(_in);
    }
    else if (match_fragment(_in, delim, s))
    {
        _in += s;
        return calc_string_size(_in);
    }

    while (!match_fragment(_in + c, delim, s))
    {
        c++;
    }

    return c;
}

void StringSplit::copy_fragment(char* dest, char* src, char delim)
{
    if (*src == delim)
    {
        src++;
    }
        
    int c = 0;
    while (*(src + c) != delim && *(src + c))
    {
        *(dest + c) = *(src + c);
        c++;
    }
    *(dest + c) = 0;
}

void StringSplit::copy_string(char* dest, char* src)
{
    int i = 0;
    while (*(src + i))
    {
        *(dest + i) = *(src + i);
        i++;
    }
}

void StringSplit::copy_fragment(char* dest, char* src, char* delim)
{
    size_t len = calc_string_size(delim);
    size_t lens = calc_string_size(src);
    
    if (match_fragment(src, delim, len))
    {
        src += len;
        lens -= len;
    }
    
    int c = 0;
    while (!match_fragment(src + c, delim, len) && (c < lens))
    {
        *(dest + c) = *(src + c);
        c++;
    }
    *(dest + c) = 0;
}

vector<char*> StringSplit::split_cstr(char Delimiter)
{
    int i = 0;
    while (*String)
    {
        if (*String != Delimiter && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (*String == Delimiter)
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return Container;
}

vector<string> StringSplit::split_string(char Delimiter)
{
    do_string = true;
    
    int i = 0;
    while (*String)
    {
        if (*String != Delimiter && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (*String == Delimiter)
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return ContainerS;
}

vector<char*> StringSplit::split_cstr(char* Delimiter)
{
    int i = 0;
    size_t LenDelim = calc_string_size(Delimiter);

    while(*String)
    {
        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (match_fragment(String, Delimiter, LenDelim))
        {
            assimilate(String,Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return Container;
}

vector<string> StringSplit::split_string(char* Delimiter)
{
    do_string = true;
    int i = 0;
    size_t LenDelim = calc_string_size(Delimiter);

    while (*String)
    {
        if (!match_fragment(String, Delimiter, LenDelim) && i == 0)
        {
            assimilate(String, Delimiter);
        }
        if (match_fragment(String, Delimiter, LenDelim))
        {
            assimilate(String, Delimiter);
        }
        i++;
        String++;
    }

    String -= i;
    delete[] String;

    return ContainerS;
}

Examples:

int main(int argc, char*argv[])
{
    StringSplit ss = "This:CUT:is:CUT:an:CUT:example:CUT:cstring";
    vector<char*> Split = ss.split_cstr(":CUT:");

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

Will output:

This
is
an
example
cstring

int main(int argc, char*argv[])
{
    StringSplit ss = "This:is:an:example:cstring";
    vector<char*> Split = ss.split_cstr(':');

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

int main(int argc, char*argv[])
{
    string mystring = "This[SPLIT]is[SPLIT]an[SPLIT]example[SPLIT]string";
    StringSplit ss = mystring;
    vector<string> Split = ss.split_string("[SPLIT]");

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

int main(int argc, char*argv[])
{
    string mystring = "This|is|an|example|string";
    StringSplit ss = mystring;
    vector<string> Split = ss.split_string('|');

    for (int i = 0; i < Split.size(); i++)
    {
        cout << Split[i] << endl;
    }

    return 0;
}

To keep empty entries (by default empties will be excluded):

StringSplit ss = mystring;
ss.keep_empty = true;
vector<string> Split = ss.split_string(":DELIM:");

The goal was to make it similar to C#'s Split() method where splitting a string is as easy as:

String[] Split = 
    "Hey:cut:what's:cut:your:cut:name?".Split(new[]{":cut:"}, StringSplitOptions.None);

foreach(String X in Split)
{
    Console.Write(X);
}

I hope someone else can find this as useful as I do.

Burdock answered 25/10, 2008 at 8:58 Comment(0)
A
11

I cannot believe how overly complicated most of these answers were. Why didnt someone suggest something as simple as this?

#include <iostream>
#include <sstream>

std::string input = "This is a sentence to read";
std::istringstream ss(input);
std::string token;

while(std::getline(ss, token, ' ')) {
    std::cout << token << endl;
}
Aurist answered 25/10, 2008 at 8:58 Comment(1)
Maybe it is not "optimized" enough (string is copied...) ? But thanks for this answer!Hypogynous
A
11

Recently I had to split a camel-cased word into subwords. There are no delimiters, just upper characters.

#include <string>
#include <list>
#include <locale> // std::isupper

template<class String>
const std::list<String> split_camel_case_string(const String &s)
{
    std::list<String> R;
    String w;

    for (String::const_iterator i = s.begin(); i < s.end(); ++i) {  {
        if (std::isupper(*i)) {
            if (w.length()) {
                R.push_back(w);
                w.clear();
            }
        }
        w += *i;
    }

    if (w.length())
        R.push_back(w);
    return R;
}

For example, this splits "AQueryTrades" into "A", "Query" and "Trades". The function works with narrow and wide strings. Because it respects the current locale it splits "RaumfahrtÜberwachungsVerordnung" into "Raumfahrt", "Überwachungs" and "Verordnung".

Note std::upper should be really passed as function template argument. Then the more generalized from of this function can split at delimiters like ",", ";" or " " too.

Amaryl answered 25/10, 2008 at 8:58 Comment(2)
There have been 2 revs. That's nice. Seems as if my English had to much of a "German". However, the revisionist did not fixed two minor bugs maybe because they were obvious anyway: std::isupper could be passed as argument, not std::upper. Second put a typename before the String::const_iterator.Amaryl
std::isupper is guaranteed to be defined only in <cctype> header (the C++ version of the C <ctype.h> header), so you must include that. This is like relying we can use std::string by using the <iostream> header instead of the <string> header.Tautology
C
11

What about this:

#include <string>
#include <vector>

using namespace std;

vector<string> split(string str, const char delim) {
    vector<string> v;
    string tmp;

    for(string::const_iterator i; i = str.begin(); i <= str.end(); ++i) {
        if(*i != delim && i != str.end()) {
            tmp += *i; 
        } else {
            v.push_back(tmp);
            tmp = ""; 
        }   
    }   

    return v;
}
Chic answered 25/10, 2008 at 8:58 Comment(1)
This is the best answer here, if you only want to split on a single delimiter character. The original question wanted to split on whitespace though, meaning any combination of one or more consecutive spaces or tabs. You have actually answered stackoverflow.com/questions/53849Autodidact
N
11

Here's another way of doing it..

void split_string(string text,vector<string>& words)
{
  int i=0;
  char ch;
  string word;

  while(ch=text[i++])
  {
    if (isspace(ch))
    {
      if (!word.empty())
      {
        words.push_back(word);
      }
      word = "";
    }
    else
    {
      word += ch;
    }
  }
  if (!word.empty())
  {
    words.push_back(word);
  }
}
Noisemaker answered 25/10, 2008 at 8:58 Comment(1)
I believe this could be optimized a bit by using word.clear() instead of word = "". Calling the clear method will empty the string but keep the already allocated buffer, which will be reused upon further concatenations. Right now a new buffer is created for every word, resulting in extra allocations.Stash
G
10

I like to use the boost/regex methods for this task since they provide maximum flexibility for specifying the splitting criteria.

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    std::string line("A:::line::to:split");
    const boost::regex re(":+"); // one or more colons

    // -1 means find inverse matches aka split
    boost::sregex_token_iterator tokens(line.begin(),line.end(),re,-1);
    boost::sregex_token_iterator end;

    for (; tokens != end; ++tokens)
        std::cout << *tokens << std::endl;
}
Gullah answered 25/10, 2008 at 8:58 Comment(0)
P
9
#include<iostream>
#include<string>
#include<sstream>
#include<vector>
using namespace std;

    vector<string> split(const string &s, char delim) {
        vector<string> elems;
        stringstream ss(s);
        string item;
        while (getline(ss, item, delim)) {
            elems.push_back(item);
        }
        return elems;
    }

int main() {

        vector<string> x = split("thi is an sample test",' ');
        unsigned int i;
        for(i=0;i<x.size();i++)
            cout<<i<<":"<<x[i]<<endl;
        return 0;
}
Positive answered 25/10, 2008 at 8:58 Comment(0)
V
8
#include <iostream>
#include <regex>

using namespace std;

int main() {
   string s = "foo bar  baz";
   regex e("\\s+");
   regex_token_iterator<string::iterator> i(s.begin(), s.end(), e, -1);
   regex_token_iterator<string::iterator> end;
   while (i != end)
      cout << " [" << *i++ << "]";
}

IMO, this is the closest thing to python's re.split(). See cplusplus.com for more information about regex_token_iterator. The -1 (4th argument in regex_token_iterator ctor) is the section of the sequence that is not matched, using the match as separator.

Valarievalda answered 25/10, 2008 at 8:58 Comment(0)
S
8

Get Boost ! : -)

#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
#include <vector>

using namespace std;
using namespace boost;

int main(int argc, char**argv) {
    typedef vector < string > list_type;

    list_type list;
    string line;

    line = "Somewhere down the road";
    split(list, line, is_any_of(" "));

    for(int i = 0; i < list.size(); i++)
    {
        cout << list[i] << endl;
    }

    return 0;
}

This example gives the output -

Somewhere
down
the
road
Solothurn answered 25/10, 2008 at 8:58 Comment(0)
R
8

The code below uses strtok() to split a string into tokens and stores the tokens in a vector.

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>

using namespace std;


char one_line_string[] = "hello hi how are you nice weather we are having ok then bye";
char seps[]   = " ,\t\n";
char *token;



int main()
{
   vector<string> vec_String_Lines;
   token = strtok( one_line_string, seps );

   cout << "Extracting and storing data in a vector..\n\n\n";

   while( token != NULL )
   {
      vec_String_Lines.push_back(token);
      token = strtok( NULL, seps );
   }
     cout << "Displaying end result in vector line storage..\n\n";

    for ( int i = 0; i < vec_String_Lines.size(); ++i)
    cout << vec_String_Lines[i] << "\n";
    cout << "\n\n\n";


return 0;
}
Readability answered 25/10, 2008 at 8:58 Comment(0)
E
7

Although there was some answer providing C++20 solution, since it was posted there were some changes made and applied to C++20 as Defect Reports. Because of that the solution is a little bit shorter and nicer:

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;
using str = std::string_view;

constexpr str text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto splitByWords(str input) {
    return input
    | views::split(' ')
    | views::transform([](auto &&r) -> str {
        return {r.begin(), r.end()};
    });
}

auto main() -> int {
    for (str &&word : splitByWords(text)) {
        std::cout << word << '\n';
    }
}

As of today it is still available only on the trunk branch of GCC (Godbolt link). It is based on two changes: P1391 iterator constructor for std::string_view and P2210 DR fixing std::views::split to preserve range type.

In C++23 there won't be any transform boilerplate needed, since P1989 adds a range constructor to std::string_view:

#include <iostream>
#include <ranges>
#include <string_view>

namespace views = std::views;

constexpr std::string_view text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

auto main() -> int {
    for (std::string_view&& word : text | views::split(' ')) {
        std::cout << word << '\n';
    }
}

(Godbolt link)

Edge answered 25/10, 2008 at 8:58 Comment(0)
S
7

I use this simpleton because we got our String class "special" (i.e. not standard):

void splitString(const String &s, const String &delim, std::vector<String> &result) {
    const int l = delim.length();
    int f = 0;
    int i = s.indexOf(delim,f);
    while (i>=0) {
        String token( i-f > 0 ? s.substring(f,i-f) : "");
        result.push_back(token);
        f=i+l;
        i = s.indexOf(delim,f);
    }
    String token = s.substring(f);
    result.push_back(token);
}
Stadia answered 25/10, 2008 at 8:58 Comment(0)
T
6

Everyone answered for predefined string input. I think this answer will help someone for scanned input.

I used tokens vector for holding string tokens. It's optional.

#include <bits/stdc++.h>

using namespace std ;
int main()
{
    string str, token ;
    getline(cin, str) ; // get the string as input
    istringstream ss(str); // insert the string into tokenizer

    vector<string> tokens; // vector tokens holds the tokens

    while (ss >> token) tokens.push_back(token); // splits the tokens
    for(auto x : tokens) cout << x << endl ; // prints the tokens

    return 0;
}


sample input:

port city international university

sample output:

port
city
international
university

Note that by default this will work for only space as the delimiter. you can use custom delimiter. For that, you have customized the code. let the delimiter be ','. so use

char delimiter = ',' ;
while(getline(ss, token, delimiter)) tokens.push_back(token) ;

instead of

while (ss >> token) tokens.push_back(token);
Technic answered 25/10, 2008 at 8:58 Comment(0)
D
6

The following is a much better way to do this. It can take any character and doesn't split lines unless you want. No special libraries are needed (well, besides std, but who really considers that an extra library) No pointers or references are needed, and it's static. Just simple plain C++.

#pragma once
#include <vector>
#include <sstream>
using namespace std;
class Helpers
{
    public:
        static vector<string> split(string s, char delim)
        {
            stringstream temp (stringstream::in | stringstream::out);
            vector<string> elems(0);
            if (s.size() == 0 || delim == 0)
                return elems;
            for(char c : s)
            {
                if(c == delim)
                {
                    elems.push_back(temp.str());
                    temp = stringstream(stringstream::in | stringstream::out);
                }
                else
                    temp << c;
            }
            if (temp.str().size() > 0)
                elems.push_back(temp.str());
                return elems;
            }

        //Splits string s with a list of delimiters in delims (it's just a list, like if we wanted to
        //split at the following letters, a, b, c we would make delims="abc".
        static vector<string> split(string s, string delims)
        {
            stringstream temp (stringstream::in | stringstream::out);
            vector<string> elems(0);
            bool found;
            if(s.size() == 0 || delims.size() == 0)
                return elems;
            for(char c : s)
            {
                found = false;
                for(char d : delims)
                {
                    if (c == d)
                    {
                        elems.push_back(temp.str());
                        temp = stringstream(stringstream::in | stringstream::out);
                        found = true;
                        break;
                    }
                }
                if(!found)
                    temp << c;
            }
            if(temp.str().size() > 0)
                elems.push_back(temp.str());
            return elems;
        }
};
Dispensation answered 25/10, 2008 at 8:58 Comment(0)
T
5

As a hobbyist, this is the first solution that came to my mind. I'm kind of curious why I haven't seen a similar solution here yet, is there something fundamentally wrong with how I did it?

#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string &s, const std::string &delims)
{
    std::vector<std::string> result;
    std::string::size_type pos = 0;
    while (std::string::npos != (pos = s.find_first_not_of(delims, pos))) {
        auto pos2 = s.find_first_of(delims, pos);
        result.emplace_back(s.substr(pos, std::string::npos == pos2 ? pos2 : pos2 - pos));
        pos = pos2;
    }
    return result;
}

int main()
{
    std::string text{"And then I said: \"I don't get it, why would you even do that!?\""};
    std::string delims{" :;\".,?!"};
    auto words = split(text, delims);
    std::cout << "\nSentence:\n  " << text << "\n\nWords:";
    for (const auto &w : words) {
        std::cout << "\n  " << w;
    }
    return 0;
}

http://cpp.sh/7wmzy

Took answered 25/10, 2008 at 8:58 Comment(0)
R
5

When dealing with whitespace as separator, the obvious answer of using std::istream_iterator<T> is already given and voted up a lot. Of course, elements may not be separated by whitespace but by some separator instead. I didn't spot any answer which just redefines the meaning of whitespace to be said separator and then uses the conventional approach.

The way to change what streams consider whitespace, you'd simply change the stream's std::locale using (std::istream::imbue()) with a std::ctype<char> facet with its own definition of what whitespace means (it can be done for std::ctype<wchar_t>, too, but its is actually slightly different because std::ctype<char> is table-driven while std::ctype<wchar_t> is driven by virtual functions).

#include <iostream>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <locale>

struct whitespace_mask {
    std::ctype_base::mask mask_table[std::ctype<char>::table_size];
    whitespace_mask(std::string const& spaces) {
        std::ctype_base::mask* table = this->mask_table;
        std::ctype_base::mask const* tab
            = std::use_facet<std::ctype<char>>(std::locale()).table();
        for (std::size_t i(0); i != std::ctype<char>::table_size; ++i) {
            table[i] = tab[i] & ~std::ctype_base::space;
        }
        std::for_each(spaces.begin(), spaces.end(), [=](unsigned char c) {
            table[c] |= std::ctype_base::space;
        });
    }
};
class whitespace_facet
    : private whitespace_mask
    , public std::ctype<char> {
public:
    whitespace_facet(std::string const& spaces)
        : whitespace_mask(spaces)
        , std::ctype<char>(this->mask_table) {
    }
};

struct whitespace {
    std::string spaces;
    whitespace(std::string const& spaces): spaces(spaces) {}
};
std::istream& operator>>(std::istream& in, whitespace const& ws) {
    std::locale loc(in.getloc(), new whitespace_facet(ws.spaces));
    in.imbue(loc);
    return in;
}
// everything above would probably go into a utility library...

int main() {
    std::istringstream in("a, b, c, d, e");
    std::copy(std::istream_iterator<std::string>(in >> whitespace(", ")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    std::istringstream pipes("a b c|  d |e     e");
    std::copy(std::istream_iterator<std::string>(pipes >> whitespace("|")),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));   
}

Most of the code is for packaging up a general purpose tool providing soft delimiters: multiple delimiters in a row are merged. There is no way to produce an empty sequence. When different delimiters are needed within a stream, you'd probably use differently set up streams using a shared stream buffer:

void f(std::istream& in) {
    std::istream pipes(in.rdbuf());
    pipes >> whitespace("|");
    std::istream comma(in.rdbuf());
    comma >> whitespace(",");

    std::string s0, s1;
    if (pipes >> s0 >> std::ws   // read up to first pipe and ignore sequence of pipes
        && comma >> s1 >> std::ws) { // read up to first comma and ignore commas
        // ...
    }
}
Ribwort answered 25/10, 2008 at 8:58 Comment(0)
F
5

Here is my solution using C++11 and the STL. It should be reasonably efficient:

#include <vector>
#include <string>
#include <cstring>
#include <iostream>
#include <algorithm>
#include <functional>

std::vector<std::string> split(const std::string& s)
{
    std::vector<std::string> v;

    const auto end = s.end();
    auto to = s.begin();
    decltype(to) from;

    while((from = std::find_if(to, end,
        [](char c){ return !std::isspace(c); })) != end)
    {
        to = std::find_if(from, end, [](char c){ return std::isspace(c); });
        v.emplace_back(from, to);
    }

    return v;
}

int main()
{
    std::string s = "this is the string  to  split";

    auto v = split(s);

    for(auto&& s: v)
        std::cout << s << '\n';
}

Output:

this
is
the
string
to
split
Freberg answered 25/10, 2008 at 8:58 Comment(6)
This is quite nice. I feel like the code could be clearer though, e.g. end unexpectedly isn't s.end().Satiety
@Satiety Out of curiosity what would you suggest for pos, end and done?Freberg
Also you can make it a bit simpler with find_first_of and find_first_not_of.Satiety
@Satiety Well I shouldn't be using ptr_fun but using std::isspace makes the code more easily modifiable to accommodate different locales. Having said that my current working version uses find_first_of. That makes it more efficient and able to split on any character not just whitespace. In fact I also have a version that splits on a supplied string too , that uses std::search (the possibilities for this function are multifold it seems).Freberg
Yeah, I rewrote it like this. Thanks for the code!Satiety
@Satiety looks good. I'm not going to post my current version(s) here because they are monstrously templated to accommodate different string and container types and look horrendous (I'm overdue revisiting that code). But I will get rid of that std::ptr_fun heheFreberg
T
5

I wrote the following piece of code. You can specify delimiter, which can be a string. The result is similar to Java's String.split, with empty string in the result.

For example, if we call split("ABCPICKABCANYABCTWO:ABC", "ABC"), the result is as follows:

0  <len:0>
1 PICK <len:4>
2 ANY <len:3>
3 TWO: <len:4>
4  <len:0>

Code:

vector <string> split(const string& str, const string& delimiter = " ") {
    vector <string> tokens;

    string::size_type lastPos = 0;
    string::size_type pos = str.find(delimiter, lastPos);

    while (string::npos != pos) {
        // Found a token, add it to the vector.
        cout << str.substr(lastPos, pos - lastPos) << endl;
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = pos + delimiter.size();
        pos = str.find(delimiter, lastPos);
    }

    tokens.push_back(str.substr(lastPos, str.size() - lastPos));
    return tokens;
}
Tartuffery answered 25/10, 2008 at 8:58 Comment(0)
S
4

This is my versión taken the source of Kev:

#include <string>
#include <vector>
void split(vector<string> &result, string str, char delim ) {
  string tmp;
  string::iterator i;
  result.clear();

  for(i = str.begin(); i <= str.end(); ++i) {
    if((const char)*i != delim  && i != str.end()) {
      tmp += *i;
    } else {
      result.push_back(tmp);
      tmp = "";
    }
  }
}

After, call the function and do something with it:

vector<string> hosts;
split(hosts, "192.168.1.2,192.168.1.3", ',');
for( size_t i = 0; i < hosts.size(); i++){
  cout <<  "Connecting host : " << hosts.at(i) << "..." << endl;
}
Sow answered 25/10, 2008 at 8:58 Comment(0)
S
3

Based on Galik's answer I made this. This is mostly here so I don't have to keep writing it again and again. It's crazy that C++ still doesn't have a native split function. Features:

  • Should be very fast.
  • Easy to understand (I think).
  • Merges empty sections.
  • Trivial to use several delimiters (e.g. "\r\n")
#include <string>
#include <vector>
#include <algorithm>

std::vector<std::string> split(const std::string& s, const std::string& delims)
{
    using namespace std;

    vector<string> v;

    // Start of an element.
    size_t elemStart = 0;

    // We start searching from the end of the previous element, which
    // initially is the start of the string.
    size_t elemEnd = 0;

    // Find the first non-delim, i.e. the start of an element, after the end of the previous element.
    while((elemStart = s.find_first_not_of(delims, elemEnd)) != string::npos)
    {
        // Find the first delem, i.e. the end of the element (or if this fails it is the end of the string).
        elemEnd = s.find_first_of(delims, elemStart);
        // Add it.
        v.emplace_back(s, elemStart, elemEnd == string::npos ? string::npos : elemEnd - elemStart);
    }
    // When there are no more non-spaces, we are done.

    return v;
}
Satiety answered 25/10, 2008 at 8:58 Comment(0)
K
3

This is my solution to this problem:

vector<string> get_tokens(string str) {
    vector<string> dt;
    stringstream ss;
    string tmp; 
    ss << str;
    for (size_t i; !ss.eof(); ++i) {
        ss >> tmp;
        dt.push_back(tmp);
    }
    return dt;
}

This function returns a vector of strings.

Kraus answered 25/10, 2008 at 8:58 Comment(0)
V
3

We can use strtok in c++ ,

#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char str[]="Mickey M;12034;911416313;M;01a;9001;NULL;0;13;12;0;CPP,C;MSC,3D;FEND,BEND,SEC;";
    char *pch = strtok (str,";,");
    while (pch != NULL)
    {
        cout<<pch<<"\n";
        pch = strtok (NULL, ";,");
    }
    return 0;
}
Vassily answered 25/10, 2008 at 8:58 Comment(0)
D
3

I use the following code:

namespace Core
{
    typedef std::wstring String;

    void SplitString(const Core::String& input, const Core::String& splitter, std::list<Core::String>& output)
    {
        if (splitter.empty())
        {
            throw std::invalid_argument(); // for example
        }

        std::list<Core::String> lines;

        Core::String::size_type offset = 0;

        for (;;)
        {
            Core::String::size_type splitterPos = input.find(splitter, offset);

            if (splitterPos != Core::String::npos)
            {
                lines.push_back(input.substr(offset, splitterPos - offset));
                offset = splitterPos + splitter.size();
            }
            else
            {
                lines.push_back(input.substr(offset));
                break;
            }
        }

        lines.swap(output);
    }
}

// gtest:

class SplitStringTest: public testing::Test
{
};

TEST_F(SplitStringTest, EmptyStringAndSplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(Core::String(), Core::String(), result));
}

TEST_F(SplitStringTest, NonEmptyStringAndEmptySplitter)
{
    std::list<Core::String> result;
    ASSERT_ANY_THROW(Core::SplitString(L"xy", Core::String(), result));
}

TEST_F(SplitStringTest, EmptyStringAndNonEmptySplitter)
{
    std::list<Core::String> result;
    Core::SplitString(Core::String(), Core::String(L","), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
}

TEST_F(SplitStringTest, OneCharSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,y", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());

    Core::SplitString(L",xy", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L"xy", *result.rbegin());

    Core::SplitString(L"xy,", L",", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"xy", *result.begin());
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, TwoCharsSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L"x,.y,z", L",.", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());

    Core::SplitString(L"x,,y,z", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y,z", *result.rbegin());
}

TEST_F(SplitStringTest, RecursiveSplitter)
{
    std::list<Core::String> result;

    Core::SplitString(L",,,", L",,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L",", *result.rbegin());

    Core::SplitString(L",.,.,", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(L".,", *result.rbegin());

    Core::SplitString(L"x,.,.,y", L",.,", result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L".,y", *result.rbegin());

    Core::SplitString(L",.,,.,", L",.,", result);
    ASSERT_EQ(3, result.size());
    ASSERT_EQ(Core::String(), *result.begin());
    ASSERT_EQ(Core::String(), *(++result.begin()));
    ASSERT_EQ(Core::String(), *result.rbegin());
}

TEST_F(SplitStringTest, NullTerminators)
{
    std::list<Core::String> result;

    Core::SplitString(L"xy", Core::String(L"\0", 1), result);
    ASSERT_EQ(1, result.size());
    ASSERT_EQ(L"xy", *result.begin());

    Core::SplitString(Core::String(L"x\0y", 3), Core::String(L"\0", 1), result);
    ASSERT_EQ(2, result.size());
    ASSERT_EQ(L"x", *result.begin());
    ASSERT_EQ(L"y", *result.rbegin());
}
Driblet answered 25/10, 2008 at 8:58 Comment(0)
P
2

C++17 version without any memory allocation (except may be for std::function)

void iter_words(const std::string_view& input, const std::function<void(std::string_view)>& process_word) {

    auto itr = input.begin();

    auto consume_whitespace = [&]() {
        for(; itr != input.end(); ++itr) {
            if(!isspace(*itr))
                return;
        }
    };

    auto consume_letters = [&]() {
        for(; itr != input.end(); ++itr) {
            if(isspace(*itr))
                return;
        }
    };

    while(true) {
        consume_whitespace();
        if(itr == input.end())
            return;
        auto word_start = itr - input.begin();
        consume_letters();
        auto word_end = itr - input.begin();
        process_word(input.substr(word_start, word_end - word_start));
    }
}

int main() {
    iter_words("foo bar", [](std::string_view sv) {
        std::cout << "Got word: " <<  sv << '\n';
    });
    return 0;
}
Pestiferous answered 25/10, 2008 at 8:58 Comment(1)
Change the std function to a generic function object; the overhead there is ickySebiferous
O
2

Yes, I looked through all 30 examples.

I couldn't find a version of split that works for multi-char delimiters, so here's mine:

#include <string>
#include <vector>

using namespace std;

vector<string> split(const string &str, const string &delim)
{   
    const auto delim_pos = str.find(delim);

    if (delim_pos == string::npos)
        return {str};

    vector<string> ret{str.substr(0, delim_pos)};
    auto tail = split(str.substr(delim_pos + delim.size(), string::npos), delim);

    ret.insert(ret.end(), tail.begin(), tail.end());

    return ret;
}

Probably not the most efficient of implementations, but it's a very straightforward recursive solution, using only <string> and <vector>.

Ah, it's written in C++11, but there's nothing special about this code, so you could easily adapt it to C++98.

Owner answered 25/10, 2008 at 8:58 Comment(0)
P
2

Here's my take on this. I had to process the input string word by word, which could have been done by using space to count words but I felt it would be tedious and I should split the words into vectors.

#include<iostream>
#include<vector>
#include<string>
#include<stdio.h>
using namespace std;
int main()
{
    char x = '\0';
    string s = "";
    vector<string> q;
    x = getchar();
    while(x != '\n')
    {
        if(x == ' ')
        {
            q.push_back(s);
            s = "";
            x = getchar();
            continue;
        }
        s = s + x;
        x = getchar();
    }
    q.push_back(s);
    for(int i = 0; i<q.size(); i++)
        cout<<q[i]<<" ";
    return 0;
}
  1. Doesn't take care of multiple spaces.
  2. If the last word is not immediately followed by newline character, it includes the whitespace between the last word's last character and newline character.
Pizarro answered 25/10, 2008 at 8:58 Comment(0)
F
2

For those who need alternative in splitting string with a string delimiter, perhaps you can try my following solution.

std::vector<size_t> str_pos(const std::string &search, const std::string &target)
{
    std::vector<size_t> founds;

    if(!search.empty())
    {
        size_t start_pos = 0;

        while (true)
        {
            size_t found_pos = target.find(search, start_pos);

            if(found_pos != std::string::npos)
            {
                size_t found = found_pos;

                founds.push_back(found);

                start_pos = (found_pos + 1);
            }
            else
            {
                break;
            }
        }
    }

    return founds;
}

std::string str_sub_index(size_t begin_index, size_t end_index, const std::string &target)
{
    std::string sub;

    size_t size = target.length();

    const char* copy = target.c_str();

    for(size_t i = begin_index; i <= end_index; i++)
    {
        if(i >= size)
        {
            break;
        }
        else
        {
            char c = copy[i];

            sub += c;
        }
    }

    return sub;
}

std::vector<std::string> str_split(const std::string &delimiter, const std::string &target)
{
    std::vector<std::string> splits;

    if(!delimiter.empty())
    {
        std::vector<size_t> founds = str_pos(delimiter, target);

        size_t founds_size = founds.size();

        if(founds_size > 0)
        {
            size_t search_len = delimiter.length();

            size_t begin_index = 0;

            for(int i = 0; i <= founds_size; i++)
            {
                std::string sub;

                if(i != founds_size)
                {
                    size_t pos  = founds.at(i);

                    sub = str_sub_index(begin_index, pos - 1, target);

                    begin_index = (pos + search_len);
                }
                else
                {
                    sub = str_sub_index(begin_index, (target.length() - 1), target);
                }

                splits.push_back(sub);
            }
        }
    }

    return splits;
}

Those snippets consist of 3 function. The bad news is to use the str_split function you will need the other two functions. Yes it is a huge chunk of code. But the good news is that those additional two functions are able to work independently and sometimes can be useful too.. :)

Tested the function in main() block like this:

int main()
{
    std::string s = "Hello, world! We need to make the world a better place. Because your world is also my world, and our children's world.";

    std::vector<std::string> split = str_split("world", s);

    for(int i = 0; i < split.size(); i++)
    {
        std::cout << split[i] << std::endl;
    }
}

And it would produce:

Hello, 
! We need to make the 
 a better place. Because your 
 is also my 
, and our children's 
.

I believe that's not the most efficient code, but at least it works. Hope it helps.

Firebox answered 25/10, 2008 at 8:58 Comment(0)
U
2

Here's my entry:

template <typename Container, typename InputIter, typename ForwardIter>
Container
split(InputIter first, InputIter last,
      ForwardIter s_first, ForwardIter s_last)
{
    Container output;

    while (true) {
        auto pos = std::find_first_of(first, last, s_first, s_last);
        output.emplace_back(first, pos);
        if (pos == last) {
            break;
        }

        first = ++pos;
    }

    return output;
}

template <typename Output = std::vector<std::string>,
          typename Input = std::string,
          typename Delims = std::string>
Output
split(const Input& input, const Delims& delims = " ")
{
    using std::cbegin;
    using std::cend;
    return split<Output>(cbegin(input), cend(input),
                         cbegin(delims), cend(delims));
}

auto vec = split("Mary had a little lamb");

The first definition is an STL-style generic function taking two pair of iterators. The second is a convenience function to save you having to do all the begin()s and end()s yourself. You can also specify the output container type as a template parameter if you wanted to use a list, for example.

What makes it elegant (IMO) is that unlike most of the other answers, it's not restricted to strings but will work with any STL-compatible container. Without any change to the code above, you can say:

using vec_of_vecs_t = std::vector<std::vector<int>>;

std::vector<int> v{1, 2, 0, 3, 4, 5, 0, 7, 8, 0, 9};
auto r = split<vec_of_vecs_t>(v, std::initializer_list<int>{0, 2});

which will split the vector v into separate vectors every time a 0 or a 2 is encountered.

(There's also the added bonus that with strings, this implementation is faster than both strtok()- and getline()-based versions, at least on my system.)

Uprear answered 25/10, 2008 at 8:58 Comment(0)
Z
2

Here is my version

#include <vector>

inline std::vector<std::string> Split(const std::string &str, const std::string &delim = " ")
{
    std::vector<std::string> tokens;
    if (str.size() > 0)
    {
        if (delim.size() > 0)
        {
            std::string::size_type currPos = 0, prevPos = 0;
            while ((currPos = str.find(delim, prevPos)) != std::string::npos)
            {
                std::string item = str.substr(prevPos, currPos - prevPos);
                if (item.size() > 0)
                {
                    tokens.push_back(item);
                }
                prevPos = currPos + 1;
            }
            tokens.push_back(str.substr(prevPos));
        }
        else
        {
            tokens.push_back(str);
        }
    }
    return tokens;
}

It works with multi-character delimiters. It prevents empty tokens to get in your results. It uses a single header. It returns the string as one single token when you provide no delimiter. It also returns an empty result if the string is empty. It is unfortunately inefficient because of the huge std::vector copy UNLESS you are compiling using C++11, which should be using the move schematic. In C++11, this code should be fast.

Zecchino answered 25/10, 2008 at 8:58 Comment(0)
E
2

No Boost, no string streams, just the standard C library cooperating together with std::string and std::list: C library functions for easy analysis, C++ data types for easy memory management.

Whitespace is considered to be any combination of newlines, tabs and spaces. The set of whitespace characters is established by the wschars variable.

#include <string>
#include <list>
#include <iostream>
#include <cstring>

using namespace std;

const char *wschars = "\t\n ";

list<string> split(const string &str)
{
  const char *cstr = str.c_str();
  list<string> out;

  while (*cstr) {                     // while remaining string not empty
    size_t toklen;
    cstr += strspn(cstr, wschars);    // skip leading whitespace
    toklen = strcspn(cstr, wschars);  // figure out token length
    if (toklen)                       // if we have a token, add to list
      out.push_back(string(cstr, toklen));
    cstr += toklen;                   // skip over token
  }

  // ran out of string; return list

  return out;
}

int main(int argc, char **argv)
{
  list<string> li = split(argv[1]);
  for (list<string>::iterator i = li.begin(); i != li.end(); i++)
    cout << "{" << *i << "}" << endl;
  return 0;
}

Run:

$ ./split ""
$ ./split "a"
{a}
$ ./split " a "
{a}
$ ./split " a b"
{a}
{b}
$ ./split " a b c"
{a}
{b}
{c}
$ ./split " a b c d  "
{a}
{b}
{c}
{d}

Tail-recursive version of split (itself split into two functions). All destructive manipulation of variables is gone, except for the pushing of strings into the list!

void split_rec(const char *cstr, list<string> &li)
{
  if (*cstr) {
    const size_t leadsp = strspn(cstr, wschars);
    const size_t toklen = strcspn(cstr + leadsp, wschars);

    if (toklen)
      li.push_back(string(cstr + leadsp, toklen));

    split_rec(cstr + leadsp + toklen, li);
  }
}

list<string> split(const string &str)
{
  list<string> out;
  split_rec(str.c_str(), out);
  return out;
}
Enclasp answered 25/10, 2008 at 8:58 Comment(2)
please use std::vector instead of listMolliemollify
@Molliemollify There is no requirement in the question for a specific representation to use for the pieces of the string, hence there is no need to incorporate your suggestion into the answer.Enclasp
M
2

I've been searching for a way to split a string by a separator of any length, so I started writing it from scratch, as existing solutions didn't suit me.

Here is my little algorithm, using only STL:

//use like this
//std::vector<std::wstring> vec = Split<std::wstring> (L"Hello##world##!", L"##");

template <typename valueType>
static std::vector <valueType> Split (valueType text, const valueType& delimiter)
{
    std::vector <valueType> tokens;
    size_t pos = 0;
    valueType token;

    while ((pos = text.find(delimiter)) != valueType::npos) 
    {
        token = text.substr(0, pos);
        tokens.push_back (token);
        text.erase(0, pos + delimiter.length());
    }
    tokens.push_back (text);

    return tokens;
}

It can be used with separator of any length and form, as far as I've tested. Instantiate with either string or wstring type.

All the algorithm does is it searches for the delimiter, gets the part of the string that is up to the delimiter, deletes the delimiter and searches again until it finds it no more.

Of course, you can use any number of whitespaces for the delimiter.

I hope it helps.

Mountaineer answered 25/10, 2008 at 8:58 Comment(1)
that's actually quite nice. although I don't think erasing is the most efficient way and (2) what about keeping empty tokens?Molliemollify
S
2

LazyStringSplitter:

#include <string>
#include <algorithm>
#include <unordered_set>

using namespace std;

class LazyStringSplitter
{
    string::const_iterator start, finish;
    unordered_set<char> chop;

public:

    // Empty Constructor
    explicit LazyStringSplitter()
    {}

    explicit LazyStringSplitter (const string cstr, const string delims)
        : start(cstr.begin())
        , finish(cstr.end())
        , chop(delims.begin(), delims.end())
    {}

    void operator () (const string cstr, const string delims)
    {
        chop.insert(delims.begin(), delims.end());
        start = cstr.begin();
        finish = cstr.end();
    }

    bool empty() const { return (start >= finish); }

    string next()
    {
        // return empty string
        // if ran out of characters
        if (empty())
            return string("");

        auto runner = find_if(start, finish, [&](char c) {
            return chop.count(c) == 1;
        });

        // construct next string
        string ret(start, runner);
        start = runner + 1;

        // Never return empty string
        // + tail recursion makes this method efficient
        return !ret.empty() ? ret : next();
    }
};
  • I call this method the LazyStringSplitter because of one reason - It does not split the string in one go.
  • In essence it behaves like a python generator
  • It exposes a method called next which returns the next string that is split from the original
  • I made use of the unordered_set from c++11 STL, so that look up of delimiters is that much faster
  • And here is how it works

TEST PROGRAM

#include <iostream>
using namespace std;

int main()
{
    LazyStringSplitter splitter;

    // split at the characters ' ', '!', '.', ','
    splitter("This, is a string. And here is another string! Let's test and see how well this does.", " !.,");

    while (!splitter.empty())
        cout << splitter.next() << endl;
    return 0;
}

OUTPUT

This
is
a
string
And
here
is
another
string
Let's
test
and
see
how
well
this
does

Next plan to improve this is to implement begin and end methods so that one can do something like:

vector<string> split_string(splitter.begin(), splitter.end());
Staunch answered 25/10, 2008 at 8:58 Comment(1)
Many questionable implemetation details aside this answer is the only one which does it lazily. I am really disappointed in C++ world here. Well, streamiterator kind of does it too, but then everyone puts result into vector<string> killing all the benefits...Lied
A
2

This is a function I wrote that helps me do a lot. It helped me when doing protocol for WebSockets.

using namespace std;
#include <iostream>
#include <vector>
#include <sstream>
#include <string>

vector<string> split ( string input , string split_id ) {
  vector<string> result;
  int i = 0;
  bool add;
  string temp;
  stringstream ss;
  size_t found;
  string real;
  int r = 0;
    while ( i != input.length() ) {
        add = false;
        ss << input.at(i);
        temp = ss.str();
        found = temp.find(split_id);
        if ( found != string::npos ) {
            add = true;
            real.append ( temp , 0 , found );
        } else if ( r > 0 &&  ( i+1 ) == input.length() ) {
            add = true;
            real.append ( temp , 0 , found );
        }
        if ( add ) {
            result.push_back(real);
            ss.str(string());
            ss.clear();
            temp.clear();
            real.clear();
            r = 0;
        }
        i++;
        r++;
    }
  return result;
}

int main() {
    string s = "S,o,m,e,w,h,e,r,e, down the road \n In a really big C++ house.  \n  Lives a little old lady.   \n   That no one ever knew.    \n    She comes outside.     \n     In the very hot sun.      \n\n\n\n\n\n\n\n   And throws C++ at us.    \n    The End.  FIN.";
    vector < string > Token;
    Token = split ( s , "," );
    for ( int i = 0 ; i < Token.size(); i++)    cout << Token.at(i) << endl;
    cout << endl << Token.size();
    int a;
    cin >> a;
    return a;
}
Attention answered 25/10, 2008 at 8:58 Comment(0)
T
2

I use the following

void split(string in, vector<string>& parts, char separator) {
    string::iterator  ts, curr;
    ts = curr = in.begin();
    for(; curr <= in.end(); curr++ ) {
        if( (curr == in.end() || *curr == separator) && curr > ts )
               parts.push_back( string( ts, curr ));
        if( curr == in.end() )
               break;
        if( *curr == separator ) ts = curr + 1; 
    }
}

PlasmaHH, I forgot to include the extra check( curr > ts) for removing tokens with whitespace.

Trice answered 25/10, 2008 at 8:58 Comment(1)
That doesnt split by arbitray whitespace and produces empty strings for subsequent whitespace.Ramsden
S
2

Quick version which uses vector as the base class, giving full access to all of its operators:

    // Split string into parts.
    class Split : public std::vector<std::string>
    {
        public:
            Split(const std::string& str, char* delimList)
            {
               size_t lastPos = 0;
               size_t pos = str.find_first_of(delimList);

               while (pos != std::string::npos)
               {
                    if (pos != lastPos)
                        push_back(str.substr(lastPos, pos-lastPos));
                    lastPos = pos + 1;
                    pos = str.find_first_of(delimList, lastPos);
               }
               if (lastPos < str.length())
                   push_back(str.substr(lastPos, pos-lastPos));
            }
    };

Example used to populate an STL set:

std::set<std::string> words;
Split split("Hello,World", ",");
words.insert(split.begin(), split.end());
Sculpsit answered 25/10, 2008 at 8:58 Comment(1)
Inefficient and you're deriving from an STL container - possibly one of the worst things you could do.Voguish
F
1

Some C++20 compilers and most of the C++23 compilers (ranges and string_view)

for (auto word : std::views::split("Somewhere down the road", ' '))
        std::cout << std::string_view{ word.begin(), word.end() } << std::endl;
Feodora answered 25/10, 2008 at 8:58 Comment(0)
S
1

Yet another way -- continuation passing style, zero allocation, function based delimiting.

 void split( auto&& data, auto&& splitter, auto&& operation ) {
   using std::begin; using std::end;
   auto prev = begin(data);
   while (prev != end(data) ) {
     auto&&[prev,next] = splitter( prev, end(data) );
     operation(prev,next);
     prev = next;
   }
 }

Now we can write specific split functions based off this.

 auto anyOfSplitter(auto delimiters) {
   return [delimiters](auto begin, auto end) {
     while( begin != end && 0 == std::string_view(begin, end).find_first_of(delimiters) ) {
       ++begin;
     }
     auto view = std::string_view(begin, end);
     auto next = view.find_first_of(delimiters);
     if (next != view.npos)
       return std::make_pair( begin, begin + next );
     else
       return std::make_pair( begin, end );
   };
 }

we can now produce a traditional std string split like this:

 template<class C>
 auto traditional_any_of_split( std::string_view<C> str, std::string_view<C> delim ) {
   std::vector<std::basic_string<C>> retval;
   split( str, anyOfSplitter(delim), [&](auto s, auto f) {
     retval.emplace_back(s,f);
   });
   return retval;
 }

or we can use find instead

 auto findSplitter(auto delimiter) {
   return [delimiter](auto begin, auto end) {
     while( begin != end && 0 == std::string_view(begin, end).find(delimiter) ) {
       begin += delimiter.size();
     }
     auto view = std::string_view(begin, end);
     auto next = view.find(delimiter);
     if (next != view.npos)
       return std::make_pair( begin, begin + next );
     else
       return std::make_pair( begin, end );
   };
 }

 template<class C>
 auto traditional_find_split( std::string_view<C> str, std::string_view<C> delim ) {
   std::vector<std::basic_string<C>> retval;
   split( str, findSplitter(delim), [&](auto s, auto f) {
     retval.emplace_back(s,f);
   });
   return retval;
 }

by replacing the splitter portion.

Both of these allocate a buffer of return values. We can swap the return values to string views at the cost of manually managing lifetime.

We can also take a continuation that will get passed the string views one at a time, avoiding even allocating the vector of views.

This can be extended with an abort option, so that we can abort after reading a few prefix strings.

Sebiferous answered 25/10, 2008 at 8:58 Comment(0)
B
1

A minimal solution is a function which takes as input a std::string and a set of delimiter characters (as a std::string), and returns a std::vector of std::strings.

#include <string>
#include <vector>

std::vector<std::string>
tokenize(const std::string& str, const std::string& delimiters)
{
  using ssize_t = std::string::size_type;
  const ssize_t str_ln = str.length();
  ssize_t last_pos = 0;

  // container for the extracted tokens
  std::vector<std::string> tokens;

  while (last_pos < str_ln) {
      // find the position of the next delimiter
      ssize_t pos = str.find_first_of(delimiters, last_pos);

      // if no delimiters found, set the position to the length of string
      if (pos == std::string::npos)
         pos = str_ln;

      // if the substring is nonempty, store it in the container
      if (pos != last_pos)
         tokens.emplace_back(str.substr(last_pos, pos - last_pos));

      // scan past the previous substring
      last_pos = pos + 1;
  }

  return tokens;
}

A usage example:

#include <iostream>

int main()
{
    std::string input_str = "one + two * (three - four)!!---! ";
    const char* delimiters = "! +- (*)";
    std::vector<std::string> tokens = tokenize(input_str, delimiters);

    std::cout << "input = '" << input_str << "'\n"
              << "delimiters = '" << delimiters << "'\n"
              << "nr of tokens found = " << tokens.size() << std::endl;
    for (const std::string& tk : tokens) {
        std::cout << "token = '" << tk << "'\n";
    }

  return 0;
}

Bitter answered 25/10, 2008 at 8:58 Comment(0)
P
1

I have a very different approach from the other solutions that offers a lot of value in ways that the other solutions are variously lacking, but of course also has its own down sides. Here is the working implementation, with the example of putting <tag></tag> around words.

For a start, this problem can be solved with one loop, no additional memory, and by considering merely four logical cases. Conceptually, we're interested in boundaries. Our code should reflect that: let's iterate through the string and look at two characters at a time, bearing in mind that we have special cases at the start and end of the string.

The downside is that we have to write the implementation, which is somewhat verbose, but mostly convenient boilerplate.

The upside is that we wrote the implementation, so it is very easy to customize it to specific needs, such as distinguishing left and write word boundaries, using any set of delimiters, or handling other cases such as non-boundary or erroneous positions.

using namespace std;

#include <iostream>
#include <string>

#include <cctype>

typedef enum boundary_type_e {
    E_BOUNDARY_TYPE_ERROR = -1,
    E_BOUNDARY_TYPE_NONE,
    E_BOUNDARY_TYPE_LEFT,
    E_BOUNDARY_TYPE_RIGHT,
} boundary_type_t;

typedef struct boundary_s {
    boundary_type_t type;
    int pos;
} boundary_t;

bool is_delim_char(int c) {
    return isspace(c); // also compare against any other chars you want to use as delimiters
}

bool is_word_char(int c) {
    return ' ' <= c && c <= '~' && !is_delim_char(c);
}

boundary_t maybe_word_boundary(string str, int pos) {
    int len = str.length();
    if (pos < 0 || pos >= len) {
        return (boundary_t){.type = E_BOUNDARY_TYPE_ERROR};
    } else {
        if (pos == 0 && is_word_char(str[pos])) {
            // if the first character is word-y, we have a left boundary at the beginning
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos};
        } else if (pos == len - 1 && is_word_char(str[pos])) {
            // if the last character is word-y, we have a right boundary left of the null terminator
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        } else if (!is_word_char(str[pos]) && is_word_char(str[pos + 1])) {
            // if we have a delimiter followed by a word char, we have a left boundary left of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_LEFT, .pos = pos + 1};
        } else if (is_word_char(str[pos]) && !is_word_char(str[pos + 1])) {
            // if we have a word char followed by a delimiter, we have a right boundary right of the word char
            return (boundary_t){.type = E_BOUNDARY_TYPE_RIGHT, .pos = pos + 1};
        }
        return (boundary_t){.type = E_BOUNDARY_TYPE_NONE};
    }
}

int main() {
    string str;
    getline(cin, str);

    int len = str.length();
    for (int i = 0; i < len; i++) {
        boundary_t boundary = maybe_word_boundary(str, i);
        if (boundary.type == E_BOUNDARY_TYPE_LEFT) {
            // whatever
        } else if (boundary.type == E_BOUNDARY_TYPE_RIGHT) {
            // whatever
        }
    }
}

As you can see, the code is very simple to understand and fine tune, and the actual usage of the code is very short and simple. Using C++ should not stop us from writing the simplest and most readily customized code possible, even if that means not using the STL. I would think this is an instance of what Linus Torvalds might call "taste", since we have eliminated all the logic we don't need while writing in a style that naturally allows more cases to be handled when and if the need to handle them arises.

What could improve this code might be the use of enum class, accepting a function pointer to is_word_char in maybe_word_boundary instead of invoking is_word_char directly, and passing a lambda.

Parisparish answered 25/10, 2008 at 8:58 Comment(0)
S
1

Just for convenience:

template<class V, typename T>
bool in(const V &v, const T &el) {
    return std::find(v.begin(), v.end(), el) != v.end();
}

The actual splitting based on multiple delimiters:

std::vector<std::string> split(const std::string &s,
                               const std::vector<char> &delims) {
    std::vector<std::string> res;
    auto stuff = [&delims](char c) { return !in(delims, c); };
    auto space = [&delims](char c) { return in(delims, c); };
    auto first = std::find_if(s.begin(), s.end(), stuff);
    while (first != s.end()) {
        auto last = std::find_if(first, s.end(), space);
        res.push_back(std::string(first, last));
        first = std::find_if(last + 1, s.end(), stuff);
    }
    return res;
}

The usage:

int main() {
    std::string s = "   aaa,  bb  cc ";
    for (auto el: split(s, {' ', ','}))
        std::cout << el << std::endl;
    return 0;
}
Swatch answered 25/10, 2008 at 8:58 Comment(0)
F
1
#include <iostream>
#include <vector>
using namespace std;

int main() {
  string str = "ABC AABCD CDDD RABC GHTTYU FR";
  str += " "; //dirty hack: adding extra space to the end
  vector<string> v;

  for (int i=0; i<(int)str.size(); i++) {
    int a, b;
    a = i;

    for (int j=i; j<(int)str.size(); j++) {
      if (str[j] == ' ') {
        b = j;
        i = j;
        break;
      }
    }
    v.push_back(str.substr(a, b-a));
  }

  for (int i=0; i<v.size(); i++) {
    cout<<v[i].size()<<" "<<v[i]<<endl;
  }
  return 0;
}
Feudalize answered 25/10, 2008 at 8:58 Comment(0)
Z
1
// adapted from a "regular" csv parse
std::string stringIn = "my csv  is 10233478 NOTseparated by commas";
std::vector<std::string> commaSeparated(1);
int commaCounter = 0;
for (int i=0; i<stringIn.size(); i++) {
    if (stringIn[i] == " ") {
        commaSeparated.push_back("");
        commaCounter++;
    } else {
        commaSeparated.at(commaCounter) += stringIn[i];
    }
}

in the end you will have a vector of strings with every element in the sentence separated by spaces. only non-standard resource is std::vector (but since an std::string is involved, i figured it would be acceptable).

empty strings are saved as a separate items.

Zeus answered 25/10, 2008 at 8:58 Comment(0)
R
1

I have just written a fine example of how to split a char by symbol, which then places each array of chars (words seperated by your symbol) into a vector. For simplicity i made the vector type of std string.

I hope this helps and is readable to you.

#include <vector>
#include <string>
#include <iostream>

void push(std::vector<std::string> &WORDS, std::string &TMP){
    WORDS.push_back(TMP);
    TMP = "";
}
std::vector<std::string> mySplit(char STRING[]){
        std::vector<std::string> words;
        std::string s;
        for(unsigned short i = 0; i < strlen(STRING); i++){
            if(STRING[i] != ' '){
                s += STRING[i];
            }else{
                push(words, s);
            }
        }
        push(words, s);//Used to get last split
        return words;
}

int main(){
    char string[] = "My awesome string.";
    std::cout << mySplit(string)[2];
    std::cin.get();
    return 0;
}
Rodriquez answered 25/10, 2008 at 8:58 Comment(0)
C
1

I believe no one has posted this solution yet. Instead of using delimiters directly, it basically does the same as boost::split(), i.e., it allows you to pass a predicate that returns true if a char is a delimiter, and false otherwise. I think this gives the programmer a lot more control, and the great thing is you don't need boost.

template <class Container, class String, class Predicate>
void split(Container& output, const String& input,
           const Predicate& pred, bool trimEmpty = false) {
    auto it = begin(input);
    auto itLast = it;
    while (it = find_if(it, end(input), pred), it != end(input)) {
        if (not (trimEmpty and it == itLast)) {
            output.emplace_back(itLast, it);
        }
        ++it;
        itLast = it;
    }
}

Then you can use it like this:

struct Delim {
    bool operator()(char c) {
        return not isalpha(c);
    }
};    

int main() {
    string s("#include<iostream>\n"
             "int main() { std::cout << \"Hello world!\" << std::endl; }");

    vector<string> v;

    split(v, s, Delim(), true);
    /* Which is also the same as */
    split(v, s, [](char c) { return not isalpha(c); }, true);

    for (const auto& i : v) {
        cout << i << endl;
    }
}
Cliffordclift answered 25/10, 2008 at 8:58 Comment(0)
O
1

Loop on getline with ' ' as the token.

Optometer answered 25/10, 2008 at 8:58 Comment(0)
G
0

There's a way easier method to do this!!

#include <vector>
#include <string>
std::vector<std::string> splitby(std::string string, char splitter) {
    int splits = 0;
    std::vector<std::string> result = {};
    std::string locresult = "";
    for (unsigned int i = 0; i < string.size(); i++) {
        if ((char)string.at(i) != splitter) {
            locresult += string.at(i);
        }
        else {
            result.push_back(locresult);
            locresult = "";
        }
    }
    if (splits == 0) {
        result.push_back(locresult);
    }
    return result;
}

void printvector(std::vector<std::string> v) {
    std::cout << '{';
    for (unsigned int i = 0; i < v.size(); i++) {
        if (i < v.size() - 1) {
            std::cout << '"' << v.at(i) << "\",";
        }
        else {
            std::cout << '"' << v.at(i) << "\"";
        }
    }
    std::cout << "}\n";
}
Gann answered 25/10, 2008 at 8:58 Comment(0)
K
0
#include <iostream>
#include <string>
#include <deque>

std::deque<std::string> split(
    const std::string& line, 
    std::string::value_type delimiter,
    bool skipEmpty = false
) {
    std::deque<std::string> parts{};

    if (!skipEmpty && !line.empty() && delimiter == line.at(0)) {
        parts.push_back({});
    }

    for (const std::string::value_type& c : line) {
        if (
            (
                c == delimiter 
                &&
                (skipEmpty ? (!parts.empty() && !parts.back().empty()) : true)
            )
            ||
            (c != delimiter && parts.empty())
        ) {
            parts.push_back({});
        }

        if (c != delimiter) {
            parts.back().push_back(c);
        }
    }

    if (skipEmpty && !parts.empty() && parts.back().empty()) {
        parts.pop_back();
    }

    return parts;
}

void test(const std::string& line) {
    std::cout << line << std::endl;

    std::cout << "skipEmpty=0 |";
    for (const std::string& part : split(line, ':')) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << "skipEmpty=1 |";
    for (const std::string& part : split(line, ':', true)) {
        std::cout << part << '|';
    }
    std::cout << std::endl;

    std::cout << std::endl;
}

int main() {
    test("foo:bar:::baz");
    test("");
    test("foo");
    test(":");
    test("::");
    test(":foo");
    test("::foo");
    test(":foo:");
    test(":foo::");

    return 0;
}

Output:

foo:bar:::baz
skipEmpty=0 |foo|bar|||baz|
skipEmpty=1 |foo|bar|baz|


skipEmpty=0 |
skipEmpty=1 |

foo
skipEmpty=0 |foo|
skipEmpty=1 |foo|

:
skipEmpty=0 |||
skipEmpty=1 |

::
skipEmpty=0 ||||
skipEmpty=1 |

:foo
skipEmpty=0 ||foo|
skipEmpty=1 |foo|

::foo
skipEmpty=0 |||foo|
skipEmpty=1 |foo|

:foo:
skipEmpty=0 ||foo||
skipEmpty=1 |foo|

:foo::
skipEmpty=0 ||foo|||
skipEmpty=1 |foo|
Koffler answered 25/10, 2008 at 8:58 Comment(1)
Adding some explanation would be helpful.Thorbert
G
0

my general implementation for string and u32string ~, using the boost::algorithm::split signature.

template<typename CharT, typename UnaryPredicate>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           UnaryPredicate predicate)
{
    using ST = std::basic_string<CharT>;
    using std::swap;
    std::vector<ST> tmp_result;
    auto iter = s.cbegin(),
         end_iter = s.cend();
    while (true)
    {
        /**
         * edge case: empty str -> push an empty str and exit.
         */
        auto find_iter = find_if(iter, end_iter, predicate);
        tmp_result.emplace_back(iter, find_iter);
        if (find_iter == end_iter) { break; }
        iter = ++find_iter; 
    }
    swap(tmp_result, split_result);
}


template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const std::basic_string<CharT>& char_candidate)
{
    std::unordered_set<CharT> candidate_set(char_candidate.cbegin(),
                                            char_candidate.cend());
    auto predicate = [&candidate_set](const CharT& c) {
        return candidate_set.count(c) > 0U;
    };
    return split(split_result, s, predicate);
}

template<typename CharT>
void split(std::vector<std::basic_string<CharT>>& split_result,
           const std::basic_string<CharT>& s,
           const CharT* literals)
{
    return split(split_result, s, std::basic_string<CharT>(literals));
}
Grandma answered 25/10, 2008 at 8:58 Comment(0)
B
0

This is an extension of one of the top answers. It now supports setting a max number of returned elements, N. The last bit of the string will end up in the Nth element. The MAXELEMENTS parameter is optional, if set at default 0 it will return an unlimited amount of elements. :-)

.h:

class Myneatclass {
public:
    static std::vector<std::string>& split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS = 0);
    static std::vector<std::string> split(const std::string &s, char delim, const size_t MAXELEMENTS = 0);
};

.cpp:

std::vector<std::string>& Myneatclass::split(const std::string &s, char delim, std::vector<std::string> &elems, const size_t MAXELEMENTS) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
        if (MAXELEMENTS > 0 && !ss.eof() && elems.size() + 1 >= MAXELEMENTS) {
            std::getline(ss, item);
            elems.push_back(item);
            break;
        }
    }
    return elems;
}
std::vector<std::string> Myneatclass::split(const std::string &s, char delim, const size_t MAXELEMENTS) {
    std::vector<std::string> elems;
    split(s, delim, elems, MAXELEMENTS);
    return elems;
}
Bulgar answered 25/10, 2008 at 8:58 Comment(0)
C
0

if you want split string by some chars you can use

#include<iostream>
#include<string>
#include<vector>
#include<iterator>
#include<sstream>
#include<string>

using namespace std;
void replaceOtherChars(string &input, vector<char> &dividers)
{
    const char divider = dividers.at(0);
    int replaceIndex = 0;
    vector<char>::iterator it_begin = dividers.begin()+1,
        it_end= dividers.end();
    for(;it_begin!=it_end;++it_begin)
    {
        replaceIndex = 0;
        while(true)
        {
            replaceIndex=input.find_first_of(*it_begin,replaceIndex);
            if(replaceIndex==-1)
                break;
            input.at(replaceIndex)=divider;
        }
    }
}
vector<string> split(string str, vector<char> chars, bool missEmptySpace =true )
{
    vector<string> result;
    const char divider = chars.at(0);
    replaceOtherChars(str,chars);
    stringstream stream;
    stream<<str;    
    string temp;
    while(getline(stream,temp,divider))
    {
        if(missEmptySpace && temp.empty())
            continue;
        result.push_back(temp);
    }
    return result;
}
int main()
{
    string str ="milk, pigs.... hot-dogs ";
    vector<char> arr;
    arr.push_back(' '); arr.push_back(','); arr.push_back('.');
    vector<string> result = split(str,arr);
    vector<string>::iterator it_begin= result.begin(),
        it_end= result.end();
    for(;it_begin!=it_end;++it_begin)
    {
        cout<<*it_begin<<endl;
    }
return 0;
}
Crumhorn answered 25/10, 2008 at 8:58 Comment(0)
O
0

Thank you @Jairo Abdiel Toribio Cisneros. It works for me but your function return some empty element. So for return without empty I have edited with the following:

std::vector<std::string> split(std::string str, const char* delim) {
    std::vector<std::string> v;
    std::string tmp;

    for(std::string::const_iterator i = str.begin(); i <= str.end(); ++i) {
        if(*i != *delim && i != str.end()) {
            tmp += *i;
        } else {
            if (tmp.length() > 0) {
                v.push_back(tmp);
            }
            tmp = "";
        }
    }

    return v;
}

Using:

std::string s = "one:two::three";
std::string delim = ":";
std::vector<std::string> vv = split(s, delim.c_str());
Outsert answered 25/10, 2008 at 8:58 Comment(0)
M
0

very late to the party here I know but I was thinking about the most elegant way of doing this if you were given a range of delimiters rather than whitespace, and using nothing more than the standard library.

Here are my thoughts:

To split words into a string vector by a sequence of delimiters:

template<class Container>
std::vector<std::string> split_by_delimiters(const std::string& input, const Container& delimiters)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), not_in(delimiters));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), is_in(delimiters));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}

to split the other way, by providing a sequence of valid characters:

template<class Container>
std::vector<std::string> split_by_valid_chars(const std::string& input, const Container& valid_chars)
{
    std::vector<std::string> result;

    for (auto current = begin(input) ; current != end(input) ; )
    {
        auto first = find_if(current, end(input), is_in(valid_chars));
        if (first == end(input)) break;
        auto last = find_if(first, end(input), not_in(valid_chars));
        result.emplace_back(first, last);
        current = last;
    }
    return result;
}

is_in and not_in are defined thus:

namespace detail {
    template<class Container>
    struct is_in {
        is_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) != end(_charset);
        }

        const Container& _charset;
    };

    template<class Container>
    struct not_in {
        not_in(const Container& charset)
        : _charset(charset)
        {}

        bool operator()(char c) const
        {
            return find(begin(_charset), end(_charset), c) == end(_charset);
        }

        const Container& _charset;
    };

}

template<class Container>
detail::not_in<Container> not_in(const Container& c)
{
    return detail::not_in<Container>(c);
}

template<class Container>
detail::is_in<Container> is_in(const Container& c)
{
    return detail::is_in<Container>(c);
}
Mab answered 25/10, 2008 at 8:58 Comment(0)
L
0

My implementation can be an alternative solution:

std::vector<std::wstring> SplitString(const std::wstring & String, const std::wstring & Seperator)
{
    std::vector<std::wstring> Lines;
    size_t stSearchPos = 0;
    size_t stFoundPos;
    while (stSearchPos < String.size() - 1)
    {
        stFoundPos = String.find(Seperator, stSearchPos);
        stFoundPos = (stFoundPos == std::string::npos) ? String.size() : stFoundPos;
        Lines.push_back(String.substr(stSearchPos, stFoundPos - stSearchPos));
        stSearchPos = stFoundPos + Seperator.size();
    }
    return Lines;
}

Test code:

std::wstring MyString(L"Part 1SEPsecond partSEPlast partSEPend");
std::vector<std::wstring> Parts = IniFile::SplitString(MyString, L"SEP");
std::wcout << L"The string: " << MyString << std::endl;
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}
std::wcout << std::endl;
MyString = L"this,time,a,comma separated,string";
std::wcout << L"The string: " << MyString << std::endl;
Parts = IniFile::SplitString(MyString, L",");
for (std::vector<std::wstring>::const_iterator it=Parts.begin(); it<Parts.end(); ++it)
{
    std::wcout << *it << L"<---" << std::endl;
}

Output of the test code:

The string: Part 1SEPsecond partSEPlast partSEPend
Part 1<---
second part<---
last part<---
end<---

The string: this,time,a,comma separated,string
this<---
time<---
a<---
comma separated<---
string<---
Low answered 25/10, 2008 at 8:58 Comment(0)
H
0

My code is:

#include <list>
#include <string>
template<class StringType = std::string, class ContainerType = std::list<StringType> >
class DSplitString:public ContainerType
{
public:
    explicit DSplitString(const StringType& strString, char cChar, bool bSkipEmptyParts = true)
    {
        size_t iPos = 0;
        size_t iPos_char = 0;
        while(StringType::npos != (iPos_char = strString.find(cChar, iPos)))
        {
            StringType strTemp = strString.substr(iPos, iPos_char - iPos);
            if((bSkipEmptyParts && !strTemp.empty()) || (!bSkipEmptyParts))
                push_back(strTemp);
            iPos = iPos_char + 1;
        }
    }
    explicit DSplitString(const StringType& strString, const StringType& strSub, bool bSkipEmptyParts = true)
    {
        size_t iPos = 0;
        size_t iPos_char = 0;
        while(StringType::npos != (iPos_char = strString.find(strSub, iPos)))
        {
            StringType strTemp = strString.substr(iPos, iPos_char - iPos);
            if((bSkipEmptyParts && !strTemp.empty()) || (!bSkipEmptyParts))
                push_back(strTemp);
            iPos = iPos_char + strSub.length();
        }
    }
};

Example:

#include <iostream>
#include <string>
int _tmain(int argc, _TCHAR* argv[])
{
    DSplitString<> aa("doicanhden1;doicanhden2;doicanhden3;", ';');
    for each (std::string var in aa)
    {
        std::cout << var << std::endl;
    }
    std::cin.get();
    return 0;
}
Historicism answered 25/10, 2008 at 8:58 Comment(1)
This is just a great big ugly wall of code. You should explain the logic behind it.Amative
C
-1

Not that we need more answers, but this is what I came up with after being inspired by Evan Teran.

std::vector <std::string> split(const string &input, auto delimiter, bool skipEmpty=true) {
  /*
  Splits a string at each delimiter and returns these strings as a string vector.
  If the delimiter is not found then nothing is returned.
  If skipEmpty is true then strings between delimiters that are 0 in length will be skipped.
  */
  bool delimiterFound = false;
  int pos=0, pPos=0;
  std::vector <std::string> result;
  while (true) {
    pos = input.find(delimiter,pPos);
    if (pos != std::string::npos) {
      if (skipEmpty==false or pos-pPos > 0) // if empty values are to be kept or not
        result.push_back(input.substr(pPos,pos-pPos));
      delimiterFound = true;
    } else {
      if (pPos < input.length() and delimiterFound) {
        if (skipEmpty==false or input.length()-pPos > 0) // if empty values are to be kept or not
          result.push_back(input.substr(pPos,input.length()-pPos));
      }
      break;
    }
    pPos = pos+1;
  }
  return result;
}
Cryptoanalysis answered 25/10, 2008 at 8:58 Comment(0)
F
-1

Here's my approach, cut and split:

string cut (string& str, const string& del)
{
    string f = str;

    if (in.find_first_of(del) != string::npos)
    {
        f = str.substr(0,str.find_first_of(del));
        str = str.substr(str.find_first_of(del)+del.length());
    }

    return f;
}

vector<string> split (const string& in, const string& del=" ")
{
    vector<string> out();
    string t = in;

    while (t.length() > del.length())
        out.push_back(cut(t,del));

    return out;
}

BTW, if there's something I can do to optimize this ..

Fougere answered 25/10, 2008 at 8:58 Comment(0)
F
-5
void splitString(string str, char delim, string array[], const int arraySize)
{
    int delimPosition, subStrSize, subStrStart = 0;

    for (int index = 0; delimPosition != -1; index++)
    {
        delimPosition = str.find(delim, subStrStart);
        subStrSize = delimPosition - subStrStart;
        array[index] = str.substr(subStrStart, subStrSize);
        subStrStart =+ (delimPosition + 1);
    }
}
Forename answered 25/10, 2008 at 8:58 Comment(1)
Welcome to StackOverflow. Your answer would be improved if you described the code a bit further. What differentiates it from the one (very high scoring) answers on this old question?Parvati
B
-10

For a ridiculously large and probably redundant version, try a lot of for loops.

string stringlist[10];
int count = 0;

for (int i = 0; i < sequence.length(); i++)
{
    if (sequence[i] == ' ')
    {
        stringlist[count] = sequence.substr(0, i);
        sequence.erase(0, i+1);
        i = 0;
        count++;
    }
    else if (i == sequence.length()-1)  // Last word
    {
        stringlist[count] = sequence.substr(0, i+1);
    }
}

It isn't pretty, but by and large (Barring punctuation and a slew of other bugs) it works!

Bjork answered 25/10, 2008 at 8:58 Comment(1)
I was tempted to +1 this answer for its simple, readable code (which I presume rubbed an elegantophile the wrong way, hence the -1), but then I saw that you allocated a fixed-size array of strings to hold the tokens. Come on, you know that's gonna break at the worst possible moment! :)Gusgusba

© 2022 - 2024 — McMap. All rights reserved.