Parse (split) a string in C++ using string delimiter (standard C++)
P

35

674

I am parsing a string in C++ using the following:

using namespace std;

string parsed,input="text to be parsed";
stringstream input_stringstream(input);

if (getline(input_stringstream,parsed,' '))
{
     // do some processing.
}

Parsing with a single char delimiter is fine. But what if I want to use a string as the delimiter?

Example: I want to split:

scott>=tiger

with >= as delimiter so that I can get scott and tiger.

Pizarro answered 10/1, 2013 at 19:16 Comment(3)
stackoverflow.blog/2019/10/11/… scroll down to #5.Mateya
see this question implement reading files and splitting strings with c++20.Titania
@WaisKamal: you could have linked to #236629 directlyCrossroad
I
984

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or up to the end of the string if n is npos or exceeds the remaining length).
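As a small sketch of how the two calls combine (the helper name first_token is made up for this illustration): when find() returns npos, substr(0, npos) simply returns the whole string, so the "delimiter not found" case needs no special handling.

```cpp
#include <string>

// Hypothetical helper: returns everything before the first occurrence of
// `delimiter`, or the whole string when the delimiter is absent.
std::string first_token(const std::string& s, const std::string& delimiter) {
    // find() yields std::string::npos when there is no match, and
    // substr(0, npos) then copies the entire string.
    return s.substr(0, s.find(delimiter));
}
```

For example, first_token("scott>=tiger", ">=") gives "scott", and first_token("scott", ">=") also gives "scott".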


If the delimiter occurs more than once, you can remove each token (delimiter included) after extracting it and then proceed with the next extraction; s = s.substr(s.find(delimiter) + delimiter.length()); is an equivalent alternative. Both forms modify s, so work on a copy if you need to keep the original string:

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom
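Several comments ask for a version that returns the tokens and does not touch the input. A minimal sketch of that, using the same find/substr idea with a moving start offset instead of erase (the name split_by and the empty-delimiter guard are my own choices, not part of the answer above):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: splits `s` on every occurrence of `delimiter` without modifying `s`.
std::vector<std::string> split_by(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {      // find("") matches everywhere; avoid looping forever
        tokens.push_back(s);
        return tokens;
    }
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();  // skip the whole delimiter, not just one char
    }
    tokens.push_back(s.substr(start));     // the last token, after the final delimiter
    return tokens;
}
```

With this sketch, split_by("scott>=tiger>=mushroom", ">=") yields scott, tiger and mushroom, and a trailing delimiter produces a trailing empty token.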
Indign answered 10/1, 2013 at 19:53 Comment(10)
For those who don't want to modify the input string, do size_t last = 0; size_t next = 0; while ((next = s.find(delimiter, last)) != string::npos) { cout << s.substr(last, next-last) << endl; last = next + 1; } cout << s.substr(last) << endl;Justice
NOTE: mushroom outputs outside of the loop, i.e. s = mushroomBlende
Those samples do not extract the last token from the string. A sample of mine extracting an IPv4 address from one string: size_t last = 0; size_t next = 0; int index = 0; while (index<4) { next = str.find(delimiter, last); auto number = str.substr(last, next - last); IPv4[index++] = atoi(number.c_str()); last = next + 1; }Solecism
@hayk.mart Just a note, that would be the following, you need add 2 not 1 due to the size of the delimiter which is 2 characters :) : std::string s = "scott>=tiger>=mushroom"; std::string delimiter = ">="; size_t last = 0; size_t next = 0; while ((next = s.find(delimiter, last)) != std::string::npos) { std::cout << s.substr(last, next-last) << std::endl; last = next + 2; } std::cout << s.substr(last) << std::endl;Julianejuliann
In order to get "tiger", use std::string token = s.substr(s.find(delimiter) + 1);, if you are sure that it exists (I use +1 in the length)...Insurgency
Hi, I am using the code posted by @Vincenzo Pii, and it works fine, the only problem I have, is that I cant get the last word of my sentence. Anyone that resolved this problem?Cassity
This answer is wrong; it fails to handle the last one.Zoologist
Wondering how many of the 615 upvoters missed the last line and are running hidden bugs in their production code. Judging from the comments, I'd wager at least a handful. IMO this answer would be much better suited if it didn't use cout and instead showed it as a function.Forestay
FYI, npos in the while-loop means no position (or end of the string)Bargain
For performance: I thinkDropwort
T
161

For string delimiter

Split string based on a string delimiter. Such as splitting string "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" based on string delimiter "-+", output will be {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}

#include <iostream>
#include <string>
#include <vector>

// for string delimiter
std::vector<std::string> split(std::string s, std::string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    std::string token;
    std::vector<std::string> res;

    while ((pos_end = s.find(delimiter, pos_start)) != std::string::npos) {
        token = s.substr (pos_start, pos_end - pos_start);
        pos_start = pos_end + delim_len;
        res.push_back (token);
    }

    res.push_back (s.substr (pos_start));
    return res;
}

int main() {
    std::string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
    std::string delimiter = "-+";
    std::vector<std::string> v = split (str, delimiter);

    for (auto i : v) std::cout << i << std::endl;

    return 0;
}

**Output**
adsf
qwret
nvfkbdsj
orthdfjgh
dfjrleih




For single character delimiter

Split string based on a character delimiter. For example, splitting string "adsf+qwer+poui+fdgh" with delimiter "+" will output {"adsf", "qwer", "poui", "fdgh"}

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> split (const std::string &s, char delim) {
    std::vector<std::string> result;
    std::stringstream ss (s);
    std::string item;

    while (getline (ss, item, delim)) {
        result.push_back (item);
    }

    return result;
}

int main() {
    std::string str = "adsf+qwer+poui+fdgh";
    std::vector<std::string> v = split (str, '+');

    for (auto i : v) std::cout << i << std::endl;

    return 0;
}

**Output**
adsf
qwer
poui
fdgh
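One caveat with the getline approach: when the input ends with the delimiter, getline reports end-of-input straight away, so the final empty field is not returned. If you need it (for example, an empty last CSV column), here is a hedged sketch of one way to add it back; the name split_keep_trailing is only illustrative:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Sketch: getline-based split that also reports an empty field after a
// trailing delimiter, e.g. "A,B,C," -> {"A", "B", "C", ""}.
std::vector<std::string> split_keep_trailing(const std::string& s, char delim) {
    std::vector<std::string> result;
    std::stringstream ss(s);
    std::string item;

    while (std::getline(ss, item, delim)) {
        result.push_back(item);
    }
    // After a final delimiter there is nothing left to read, so the loop
    // above stops; the empty last field has to be added by hand.
    if (!s.empty() && s.back() == delim) {
        result.push_back("");
    }
    return result;
}
```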
Thiazine answered 25/10, 2017 at 11:54 Comment(6)
You are returning vector<string> I think it'll call copy constructor.Lauber
Every reference I've seen shows that the call to the copy constructor is eliminated in that context.Trotter
With "modern" (C++03?) compilers I believe this is correct, RVO and/or move semantics will eliminate the copy constructor.Lavaliere
I tried the one for single character delimiter, and if the string ends in a delimiter (i.e., an empty csv column at the end of the line), it does not return the empty string. It simply returns one fewer string. For example: 1,2,3,4\nA,B,C,Haft
I also tried the one for string delimiter, and if the string ends in a delimiter, the last delimiter becomes part of the last string extracted.Haft
In your first example the arguments could be const references.Hillman
L
135

This method uses std::string::find without mutating the original string by remembering the beginning and end of the previous substring token.

#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";

    std::string::size_type start = 0;
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }

    std::cout << s.substr(start, end);
}
Liken answered 10/1, 2013 at 21:20 Comment(6)
How do I perform this operation on vector<string> where both strings in the vector are of same form and have same delimiters. I just want to output both strings parsed out in the same way as this works for one string. My "string delim" will remain same ofcourseMoorefield
Shouldn't the last line rather be s.substr(start, end - start) ? I guess this only works as start + end > size() and as such it always takes the rest of the string ...Unsophisticated
Since end == std::string::npos, it means we want to return the final token.Liken
The last line can be further simplified to s.substr(start) with no need to specify the length because it will extract the entire trailing substring if we omit the length.Bluefield
You could move end = s.find(delim, start) into the while condition.Sweeting
It seems to me that you can replace substr with std::string_view(s.begin()+start, s.begin() + end - start) for performance, if you use a supporting version of C++.Dropwort
A
65

You can use the following function to split a string:

vector<string> split(const string& str, const string& delim)
{
    vector<string> tokens;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}
Accumbent answered 26/5, 2016 at 7:25 Comment(4)
IMO it doesn't work as expected: split("abc","a") will return a vector of a single string, "bc", where I think it would make more sense if it had returned a vector of elements ["", "bc"]. Using str.split() in Python, it was intuitive to me that it should return an empty string in case delim was found either at the beginning or at the end, but that's just my opinion. Anyway, I just think it should be mentionedCrusty
Would strongly recommend removing the if (!token.empty()) to prevent the issue mentioned by @Crusty as well as other issues related to consecutive delimiters.Yorgen
I would remove my upvote if I could, but SO won't let me. The issue brought up by @Crusty is a problem, and removing if (!token.empty()) does not seem to suffice to fix it.Histrionism
@Histrionism this snippet was designed exactly to skip empty fragments. If you need to keep empty ones, I'm afraid you need to write another split implementation. I kindly suggest you post it here for the good of the community.Accumbent
W
53

A way of doing it with C++20:

#include <iostream>
#include <ranges>
#include <string>
#include <string_view>

int main()
{
    std::string hello = "text to be parsed";
    auto split = hello
        | std::ranges::views::split(' ')
        | std::ranges::views::transform([](auto&& str) { return std::string_view(&*str.begin(), std::ranges::distance(str)); });

    for (auto&& word : split)
    {
        std::cout << word << std::endl;
    }
}

See:
https://mcmap.net/q/64855/-how-to-split-a-std-string-into-a-range-v3-of-std-string_views
https://en.cppreference.com/w/cpp/ranges/split_view

Wages answered 24/6, 2021 at 19:24 Comment(4)
oh wow. That's a bit complicated.Fountainhead
For gcc you seem to need version 10 or newer, for Clang it does not even work with the latest release as of now (15), however the current trunk version works. See godbolt.org/z/a6fEGYo16 . Might be this issue in clang: github.com/llvm/llvm-project/issues/52696.Earvin
Simple things done complicated :) Common Lisp (with cl-ppcre library): (defvar *delimiters* (cl-ppcre:create-scanner " ")) (cl-ppcre:split *delimiters* "lets see if it works"). But some like it complicated :)Anchovy
Thankfully C++23 adds a constructor overload for std::string_view to make this simpler.Pay
P
46

You can also use regex for this:

std::vector<std::string> split(const std::string str, const std::string regex_str)
{
    std::regex regexz(regex_str);
    std::vector<std::string> list(std::sregex_token_iterator(str.begin(), str.end(), regexz, -1),
                                  std::sregex_token_iterator());
    return list;
}

which is equivalent to :

std::vector<std::string> split(const std::string str, const std::string regex_str)
{
    std::regex regexz(regex_str);
    std::sregex_token_iterator token_iter(str.begin(), str.end(), regexz, -1);
    std::sregex_token_iterator end;
    std::vector<std::string> list;
    while (token_iter != end)
    {
        list.emplace_back(*token_iter++);
    }
    return list;
}

and use it like this :

#include <iostream>
#include <string>
#include <regex>
#include <vector>

std::vector<std::string> split(const std::string str,
                               const std::string regex_str) {
    std::regex regexz(regex_str);
    return {std::sregex_token_iterator(str.begin(), str.end(), regexz, -1),
            std::sregex_token_iterator()};
}

int main()
{
    std::string input_str = "lets split this";
    std::string regex_str = " "; 
    auto tokens = split(input_str, regex_str);
    for (auto& item: tokens)
    {
        std::cout<<item <<std::endl;
    }
}

play with it online!

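The delimiter is interpreted as a regular expression, so you can split on plain substrings and single characters as well as on real patterns; just remember to escape regex metacharacters such as + or . when you mean them literally.
It's also concise and only needs C++11!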
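One thing to watch with the regex approach: a delimiter such as "-+" or "." is a pattern, not a literal string ("-+" means "one or more dashes"). Below is a rough sketch of a wrapper that escapes the delimiter first so it is matched literally. Both escape_regex and split_literal are hypothetical helper names, and the escaping assumes the default ECMAScript grammar of std::regex.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: backslash-escape every character that is special in
// the default ECMAScript grammar, so the text matches itself literally.
std::string escape_regex(const std::string& text) {
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : text) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Sketch: split `str` on the literal string `delimiter` using the regex
// token iterator (-1 selects the parts between matches).
std::vector<std::string> split_literal(const std::string& str, const std::string& delimiter) {
    const std::regex re(escape_regex(delimiter));  // named object: must outlive the iterators
    std::sregex_token_iterator first(str.begin(), str.end(), re, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

With this, split_literal("adsf-+qwret-+nvfk", "-+") treats "-+" as two plain characters rather than as a quantified dash.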

Peele answered 18/11, 2020 at 3:46 Comment(5)
This should be the correct answer, provided C++11 is on the table, which if it isn't...you should be using C++>=11, it's a game-changer!Oppression
Please can you explain the return statement in the function split()? I am trying to figure how the tokens are pushed into the std::vector container. Thanks.Unpopular
@DeusXMachina: a fine solution, certainly. One caveat: the "yet more concise form!" in the last code segment will not compile with _LIBCPP_STD_VER > 11, as the method is marked as "delete"... but the earlier code segments that don't implicitly require rvalue reference && compile and run fine under C++2a.Political
This seems to be slow for large cases. Very nice otherwise.Lampkin
I would recommend std::string regex_str= "\\s+" to avoid empty strings when encountering multiple spaces in a sequence.Cider
P
23

The answer is already there, but the selected answer uses erase(), which is very costly for big strings (think megabytes). Therefore I use the function below.

vector<string> split(const string& str, const string& delim)
{
    vector<string> result;
    size_t start = 0;

    for (size_t found = str.find(delim); found != string::npos; found = str.find(delim, start))
    {
        result.emplace_back(str.begin() + start, str.begin() + found);
        start = found + delim.size();
    }
    if (start != str.size())
        result.emplace_back(str.begin() + start, str.end());
    return result;      
}
Predecessor answered 4/8, 2019 at 13:17 Comment(5)
I tested this, and it works. Thanks! In my opinion, this is the best answer because as the original answer-er states, this solution reduces the memory overhead, and the result is conveniently stored in a vector. (replicates the Python string.split() method.)Naraka
A nice improvement would be to use emplace_back() rather than push_back(string(...))Sayles
@Sayles Honoured.Predecessor
You can remove the explicit call to the string constructor too. emplace_back() forwards its arguments to the constructor so you can just write result.emplace_back(i_str.begin()+startIndex, i_str.begin()+found);.Sayles
You could move found = i_str.find(i_delim, startIndex) to within the while condition to avoid calling find in 2 places.Sweeting
C
22

This code splits text into lines and adds each one to a vector.

vector<string> split(char *phrase, string delimiter){
    vector<string> list;
    string s = string(phrase);
    size_t pos = 0;
    string token;
    while ((pos = s.find(delimiter)) != string::npos) {
        token = s.substr(0, pos);
        list.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    list.push_back(s);
    return list;
}

Called by:

vector<string> listFilesMax = split(buffer, "\n");
Countercurrent answered 12/6, 2017 at 8:54 Comment(6)
it's working great! I've added list.push_back(s); because it was missing.Drongo
it misses out the last part of the string. After the while loop ends, we need to add the remaining of s as a new token.Impatiens
I've made an edit to the code sample to fix the missing push_back.Echolocation
It would be nicer as vector<string> split(char *phrase, const string delimiter="\n")Lauber
I know kinda late but, it would work much better if this if statement was added before push if (token != "") list.push_back(token); to prevent appending empty strings.Ejection
@OliverTworkowski A lot of the time, what is viewed as being the "correct" behaviour involves leaving the empty strings in. Of course, this may be undesirable in your use case, in which case your suggestion is completely valid.Unstrained
J
20

strtok allows you to pass in multiple chars as delimiters. I bet if you passed in ">=" your example string would be split correctly (even though the > and = are counted as individual delimiters).

EDIT if you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.

string token, mystring("scott>=tiger");
while(token != mystring){
  token = mystring.substr(0,mystring.find_first_of(">="));
  mystring = mystring.substr(mystring.find_first_of(">=") + 1);
  printf("%s ",token.c_str());
}
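To make the difference concrete: strtok and find_first_of treat ">=" as a set of characters ('>' or '='), while find looks for the two-character sequence. A tiny sketch, with helper names invented just for the comparison:

```cpp
#include <cstddef>
#include <string>

// Position of the first character that is '>' OR '=' (set semantics,
// the same way strtok interprets its delimiter argument).
std::size_t first_of_set(const std::string& s) {
    return s.find_first_of(">=");
}

// Position of the first occurrence of the exact sequence ">=".
std::size_t first_of_sequence(const std::string& s) {
    return s.find(">=");
}
```

For the input "a=b>=c" the first helper returns 1 (the lone '='), while the second returns 3, so the two approaches only agree when the individual delimiter characters never appear on their own.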
Jea answered 10/1, 2013 at 19:18 Comment(3)
Thanks. But I want to use only C++ and not any C functions like strtok() as it would require me to use char array instead of string.Pizarro
@Pizarro So? If a C function does what you need, use it. This isn't a world where C functions aren't available in C++ (in fact, they have to be). .c_str() is cheap and easy, too.Forestay
The check for if(token != mystring) gives wrong results if you have repeating elements in your string. I used your code to make a version that does not have this. It has many changes that change the answer fundamentally, so I wrote my own answer instead of editing. Check it below.Humpbacked
S
5

I would use boost::tokenizer. Here's documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm

Here's one that works for your case.

struct my_tokenizer_func
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);
        return true;
    }

    void reset() {}
};

int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}
Sapient answered 10/1, 2013 at 19:40 Comment(2)
Thanks. But I want to use only standard C++ and not a third party library.Pizarro
@TheCrazyProgrammer: Okay, when I read "Standard C++", I thought that meant no non-standard extensions, not that you couldn't use standards conforming third party libraries.Sapient
L
5

Here's my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.

bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool removeEmptyEntries = false)
{
    std::vector<std::string> tokens;

    for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
    {
         size_t position = s.find(delimiter, start);
         end = position != std::string::npos ? position : s.length();

         std::string token = s.substr(start, end - start);
         if (!removeEmptyEntries || !token.empty())
         {
             tokens.push_back(token);
         }
    }

    if (!removeEmptyEntries &&
        (s.empty() || endsWith(s, delimiter)))
    {
        tokens.push_back("");
    }

    return tokens;
}

Examples

split("a-b-c", "-"); // [3]("a","b","c")

split("a--c", "-"); // [3]("a","","c")

split("-b-", "-"); // [3]("","b","")

split("--c--", "-"); // [5]("","","c","","")

split("--c--", "-", true); // [1]("c")

split("a", "-"); // [1]("a")

split("", "-"); // [1]("")

split("", "-", true); // [0]()
Limon answered 24/5, 2017 at 10:22 Comment(1)
C++ 20 added the ends_with member function to std::string. en.cppreference.com/w/cpp/string/basic_string/ends_withSweeting
M
5

This should work perfectly for string (or single character) delimiters. Don't forget to #include <sstream>.

std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta";
std::string delimiter = "=,+"; 
std::istringstream ss(input);
std::string token;
std::string::iterator it;

while(std::getline(ss, token, *(it = delimiter.begin()))) {
    std::cout << token << std::endl; // Token is extracted using '='
    it++;
    // Skip the rest of delimiter if exists ",+"
    while(it != delimiter.end() and ss.peek() == *(it)) { 
        it++; ss.get(); 
    }
}

The first while loop extracts a token using the first character of the string delimiter. The second while loop skips the rest of the delimiter and stops at the beginning of the next token.

Mckeever answered 6/11, 2019 at 22:0 Comment(2)
This is incorrect. If the input is modified as below, it would split using the first =, when it is not supposed to: std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta=Echo";Jacalynjacamar
@Jacalynjacamar Good catch. I revised my answer to even cover inputs with malformed delimiters.Mckeever
A
5

A very simple/naive approach:

vector<string> words_seperate(string s){
    vector<string> ans;
    string w="";
    for(auto i:s){
        if(i==' '){
           ans.push_back(w);
           w="";
        }
        else{
           w+=i;
        }
    }
    ans.push_back(w);
    return ans;
}

Or you can use boost library split function:

vector<string> result; 
boost::split(result, input, boost::is_any_of("\t"));

Or You can try TOKEN or strtok:

char str[] = "DELIMIT-ME-C++"; 
char *token = strtok(str, "-"); 
while (token) 
{ 
    cout<<token; 
    token = strtok(NULL, "-"); 
} 

Or You can do this:

char split_with=' ';
vector<string> words;
string token; 
stringstream ss(our_string);
while(getline(ss , token , split_with)) words.push_back(token);
Ari answered 7/9, 2020 at 9:0 Comment(0)
S
5

In case someone in the future wants an out-of-the-box function version of Vincenzo Pii's answer:

#include <vector>
#include <string>


std::vector<std::string> SplitString(
    std::string str,
    std::string delimeter)
{
    std::vector<std::string> splittedStrings = {};
    size_t pos = 0;

    while ((pos = str.find(delimeter)) != std::string::npos)
    {
        std::string token = str.substr(0, pos);
        if (token.length() > 0)
            splittedStrings.push_back(token);
        str.erase(0, pos + delimeter.length());
    }

    if (str.length() > 0)
        splittedStrings.push_back(str);
    return splittedStrings;
}

I also fixed some bugs so that the function won't return an empty string if there is a delimiter at the start or the end of the string

Shuma answered 16/9, 2021 at 16:16 Comment(0)
H
3

This is a complete method that splits the string on any delimiter and returns a vector of the chopped up strings.

It is an adaptation from the answer from ryanbwork. However, his check for: if(token != mystring) gives wrong results if you have repeating elements in your string. This is my solution to that problem.

vector<string> Split(string mystring, string delimiter)
{
    vector<string> subStringList;
    string token;
    while (true)
    {
        size_t findfirst = mystring.find(delimiter);
        if (findfirst == string::npos) //find returns npos if it couldn't find the delimiter anymore
        {
            subStringList.push_back(mystring); //push back the final piece of mystring
            return subStringList;
        }
        token = mystring.substr(0, mystring.find(delimiter));
        mystring = mystring.substr(mystring.find(delimiter) + delimiter.size());
        subStringList.push_back(token);
    }
    return subStringList;
}
Humpbacked answered 11/7, 2019 at 14:48 Comment(4)
Something like while (true) is usually scary to see in a piece of code like this. Personally I'd recommend rewriting this so that the comparison to std::string::npos (or respectively a check against mystring.size()) makes the while (true) obsolete.Ecesis
It's inefficient to repeatedly assign mystring. You can pass a starting index to find_first_of. Also, you're calling find_first_of 3 times every iteration.Sweeting
I'm not sure it works properly with multicharacter delimiters. Because find_first_of : "Finds the first character equal to one of the characters in the given character sequence."Dropwort
I think you are right @КоеКто. I adjusted and tested the code with some basic examples to work for multiple delimiters.Humpbacked
T
3

I made this solution. It is very simple: all the prints/values are handled inside the loop (no need to check after the loop).

#include <iostream>
#include <string>

using std::cout;
using std::string;

int main() {
    string s = "it-+is-+working!";
    string d = "-+";

    string::size_type firstFindI = 0;
    string::size_type secendFindI = 0;
    while (secendFindI != string::npos)
    {
        secendFindI = s.find(d, firstFindI);
        cout << s.substr(firstFindI, secendFindI - firstFindI) << "\n"; // print sliced part
        firstFindI = secendFindI + d.size(); // add to the search index
    }
}

Thanks to @SteveWard for improving this answer.

Thundery answered 8/3, 2021 at 18:52 Comment(1)
If you use a do/while loop, you won't need to call s.find twice.Sweeting
S
3

This is similar to other answers, but it uses string_view, so the results are just views into the original string. It is similar to the C++20 example, though this one only needs C++17. (Edited to skip empty matches.)

#include <algorithm>
#include <iostream>
#include <string_view>
#include <type_traits>
#include <vector>
std::vector<std::string_view> split(std::string_view buffer,
                                    const std::string_view delimeter = " ") {
  std::vector<std::string_view> ret{};
  std::decay_t<decltype(std::string_view::npos)> pos{};
  while ((pos = buffer.find(delimeter)) != std::string_view::npos) {
    const auto match = buffer.substr(0, pos);
    if (!match.empty()) ret.push_back(match);
    buffer = buffer.substr(pos + delimeter.size());
  }
  if (!buffer.empty()) ret.push_back(buffer);
  return ret;
}
int main() {
  const auto split_values = split("1 2 3 4 5 6 7 8 9     10 ");
  std::for_each(split_values.begin(), split_values.end(),
                [](const auto& str) { std::cout << str << '\n'; });
  return split_values.size();
}
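Because the returned string_views do not own their characters, the string they point into has to stay alive for as long as the views are used. A hedged C++17 sketch of the same idea that keeps empty tokens (split_views is an illustrative name, not a standard function):

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Sketch: returns views into `text`; no characters are copied.
// The caller must keep the underlying string alive while using the result.
std::vector<std::string_view> split_views(std::string_view text, std::string_view delim) {
    std::vector<std::string_view> parts;
    if (delim.empty()) {          // an empty delimiter would never advance
        parts.push_back(text);
        return parts;
    }
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));   // the remainder is the last token
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + delim.size();
    }
    return parts;
}
```

Calling it on a named std::string (or a string literal) is fine; calling it on a temporary such as split_views(std::string("a,b"), ",") would leave the views dangling once the statement ends.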
Spew answered 9/8, 2021 at 22:12 Comment(3)
You passed buffer by value, so the string_views in the vector refer to a temporary object.Sweeting
Yea that is the point of string_views. If you let the original value go poof then it points to garbage. In the example I'm using a string literal. So that will always exist for the life of the program. You would use std::string instead if you want to make permanent copies. I'm only using std::vector here because I don't know how many results we're gonna get. Maybe there is a std view we could use in c++23 so we can get the result in a more lazy fashion.Spew
@SteveWard. I think it's OK , but only for string literals, as in this example. Because such literals actually are not temporary but pretty static constant. I agree that temporary strings for string_view are suspicious for dangling.Dropwort
F
2

Since this is the top-rated Stack Overflow Google search result for C++ split string or similar, I'll post a complete, copy/paste runnable example that shows both methods.

splitString uses stringstream (probably the better and easier option in most cases)

splitString2 uses find and substr (a more manual approach)

// SplitString.cpp

#include <iostream>
#include <vector>
#include <string>
#include <sstream>

// function prototypes
std::vector<std::string> splitString(const std::string& str, char delim);
std::vector<std::string> splitString2(const std::string& str, char delim);
std::string getSubstring(const std::string& str, int leftIdx, int rightIdx);


int main(void)
{
  // Test cases - all will pass
  
  std::string str = "ab,cd,ef";
  //std::string str = "abcdef";
  //std::string str = "";
  //std::string str = ",cd,ef";
  //std::string str = "ab,cd,";   // behavior of splitString and splitString2 is different for this final case only, if this case matters to you choose which one you need as applicable
  
  
  std::vector<std::string> tokens = splitString(str, ',');
  
  std::cout << "tokens: " << "\n";
  
  if (tokens.empty())
  {
    std::cout << "(tokens is empty)" << "\n";
  }
  else
  {
    for (auto& token : tokens)
    {
      if (token == "") std::cout << "(empty string)" << "\n";
      else std::cout << token << "\n";
    }
  }
    
  return 0;
}

std::vector<std::string> splitString(const std::string& str, char delim)
{
  std::vector<std::string> tokens;
  
  if (str == "") return tokens;
  
  std::string currentToken;
  
  std::stringstream ss(str);
  
  while (std::getline(ss, currentToken, delim))
  {
    tokens.push_back(currentToken);
  }
  
  return tokens;
}

std::vector<std::string> splitString2(const std::string& str, char delim)
{
  std::vector<std::string> tokens;
  
  if (str == "") return tokens;
  
  int leftIdx = 0;
  
  int delimIdx = str.find(delim);
  
  int rightIdx;
  
  while (delimIdx != std::string::npos)
  {
    rightIdx = delimIdx - 1;
    
    std::string token = getSubstring(str, leftIdx, rightIdx);
    tokens.push_back(token);
    
    // prep for next time around
    leftIdx = delimIdx + 1;
    
    delimIdx = str.find(delim, delimIdx + 1);
  }
  
  rightIdx = str.size() - 1;
  
  std::string token = getSubstring(str, leftIdx, rightIdx);
  tokens.push_back(token);
  
  return tokens;
}

std::string getSubstring(const std::string& str, int leftIdx, int rightIdx)
{
  return str.substr(leftIdx, rightIdx - leftIdx + 1);
}
Foehn answered 10/10, 2020 at 3:34 Comment(0)
H
2

Yet another answer: Here I'm using find_first_not_of string function which returns the position of the first character that does not match any of the characters specified in the delim.

size_t find_first_not_of(const string& delim, size_t pos = 0) const noexcept;

Example:

int main()
{
    size_t start = 0, end = 0;
    std::string str = "scott>=tiger>=cat";
    std::string delim = ">=";
    while ((start = str.find_first_not_of(delim, end)) != std::string::npos)
    {
        end = str.find(delim, start); // finds the first occurrence at or after 'start'
        std::cout << str.substr(start, end - start)<<std::endl; // extract substring
    }
    return 0;
}

Output:

    scott
    tiger
    cat
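A word of caution, as far as I can tell: find_first_not_of treats delim as a set of characters, so this loop skips empty tokens and also drops any leading '>' or '=' that belongs to a token. The sketch below is the same loop collected into a vector so the effect is easy to check (the function name is made up):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The loop from the example above, returning the tokens instead of printing.
std::vector<std::string> split_skipping_set(const std::string& str, const std::string& delim) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    // Skip every character that appears anywhere in `delim` (set semantics).
    while ((start = str.find_first_not_of(delim, end)) != std::string::npos) {
        end = str.find(delim, start);                      // full-sequence search
        tokens.push_back(str.substr(start, end - start));  // npos length means "to the end"
    }
    return tokens;
}
```

For "a>==b" with delimiter ">=", this returns a and b, whereas a strict substring split would return a and =b.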
Hadrian answered 3/3, 2021 at 12:25 Comment(0)
I
1

If you do not want to modify the string (as in the answer by Vincenzo Pii) and want to output the last token as well, you may want to use this approach:

inline std::vector<std::string> splitString( const std::string &s, const std::string &delimiter ){
    std::vector<std::string> ret;
    size_t start = 0;
    size_t end = 0;
    size_t len = 0;
    std::string token;
    do{ end = s.find(delimiter,start); 
        len = end - start;
        token = s.substr(start, len);
        ret.emplace_back( token );
        start += len + delimiter.length();
        std::cout << token << std::endl;
    }while ( end != std::string::npos );
    return ret;
}
Informal answered 23/5, 2017 at 9:37 Comment(0)
A
1

Here's a concise split function. I decided to have back-to-back delimiters produce an empty string, but you could easily check whether the substring is empty and skip adding it to the vector if it is.

#include <vector>
#include <string>
using namespace std;



vector<string> split(string to_split, string delimiter) {
    size_t pos = 0;
    vector<string> matches{};
    do {
        pos = to_split.find(delimiter);
        int change_end;
        if (pos == string::npos) {
            pos = to_split.length() - 1;
            change_end = 1;
        }
        else {
            change_end = 0;
        }
        matches.push_back(to_split.substr(0, pos+change_end));
        
        to_split.erase(0, pos + delimiter.length()); // drop the token and the whole delimiter (assumes a non-empty delimiter)

    }
    while (!to_split.empty());
    return matches;

}
Audy answered 8/6, 2021 at 4:38 Comment(0)
G
1

This method uses std::string::find and std::string::substr:

// assumes delim is not empty
vector<string> split(const string& str, const string& delim){
    vector<string> vtokens;
    size_t start = 0;
    size_t end = 0;
    while((end = str.find(delim,start))!=string::npos){
        vtokens.push_back(str.substr(start,end-start));
        // skip the whole delimiter; end + 1 is only right for one-character delimiters
        start = end + delim.size();
    }
    vtokens.push_back(str.substr(start));
    return vtokens;
}
Geoff answered 22/8, 2022 at 14:33 Comment(2)
It would be more efficient to do start = end + delim.size(); in case delim is longer than 1 character.Sweeting
@SteveWard I think it would be even incorrect, because start is used for beginning of the string.Dropwort
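
The point raised in these comments is how far start should move after a match. As a rough illustration, the sketch below takes the step as a parameter (split_with_step and step are hypothetical names, not part of the answer) so that both choices can be compared. It assumes a non-empty delimiter and a step of at least 1.

```cpp
#include <string>
#include <vector>

// Sketch: find/substr loop with the advance after each match made explicit.
// Assumes delim is non-empty and step >= 1, otherwise the loop cannot progress.
std::vector<std::string> split_with_step(const std::string& str,
                                         const std::string& delim,
                                         std::size_t step) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    std::size_t end = 0;
    while ((end = str.find(delim, start)) != std::string::npos) {
        tokens.push_back(str.substr(start, end - start));
        start = end + step;  // where the next token begins
    }
    tokens.push_back(str.substr(start));  // text after the last delimiter
    return tokens;
}
```

For "scott>=tiger" with delimiter ">=", a step of delim.size() gives "scott" and "tiger", while a step of 1 gives "scott" and "=tiger". So this is a correctness issue for multi-character delimiters, not only an efficiency one.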
T
0
#include<iostream>
#include<string>
#include<algorithm>
using namespace std;

int split_count(string str,char delimit){
    return count(str.begin(),str.end(),delimit);
}

void split(string str,char delimit,string res[]){
    size_t a=0;
    int i=0;
    while(a<str.size()){
        size_t end=str.find(delimit,a); // search from the current position, not from the start
        if(end==string::npos) end=str.size();
        res[i]=str.substr(a,end-a);
        a=end+1;
        i++;
    }
}

int main(){

    string a="abc.xyz.mno.def";
    int x=split_count(a,'.')+1;
    string res[x];
    split(a,'.',res);

    for(int i=0;i<x;i++)
        cout<<res[i]<<endl;
    return 0;
}

P.S: res must have room for split_count(str, delimit) + 1 strings.

Tilly answered 29/1, 2018 at 8:15 Comment(1)
This use GCC extension -- variable length array.Incendiary
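
One way to avoid the variable-length array is to return a std::vector, which also removes the need to count delimiters beforehand. This is only a sketch; split_chars is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Sketch: single-character split that returns a vector instead of filling
// a caller-provided array, so no variable-length array is needed.
std::vector<std::string> split_chars(const std::string& str, char delimit) {
    std::vector<std::string> res;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = str.find(delimit, start);
        if (end == std::string::npos) {
            res.push_back(str.substr(start));  // last token
            break;
        }
        res.push_back(str.substr(start, end - start));
        start = end + 1;  // one-character delimiter, so step over one char
    }
    return res;
}
```

Calling split_chars("abc.xyz.mno.def", '.') returns the four tokens abc, xyz, mno and def.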
H
0

Function:

std::vector<std::string> WSJCppCore::split(const std::string& sWhat, const std::string& sDelim) {
    std::vector<std::string> vRet;
    size_t nPos = 0;
    size_t nLen = sWhat.length();
    size_t nDelimLen = sDelim.length();
    while (nPos < nLen) {
        std::size_t nFoundPos = sWhat.find(sDelim, nPos);
        if (nFoundPos != std::string::npos) {
            std::string sToken = sWhat.substr(nPos, nFoundPos - nPos);
            vRet.push_back(sToken);
            nPos = nFoundPos + nDelimLen;
            if (nFoundPos + nDelimLen == nLen) { // last delimiter
                vRet.push_back("");
            }
        } else {
            std::string sToken = sWhat.substr(nPos, nLen - nPos);
            vRet.push_back(sToken);
            break;
        }
    }
    return vRet;
}

Unit-tests:

bool UnitTestSplit::run() {
bool bTestSuccess = true;

    struct LTest {
        LTest(
            const std::string &sStr,
            const std::string &sDelim,
            const std::vector<std::string> &vExpectedVector
        ) {
            this->sStr = sStr;
            this->sDelim = sDelim;
            this->vExpectedVector = vExpectedVector;
        };
        std::string sStr;
        std::string sDelim;
        std::vector<std::string> vExpectedVector;
    };
    std::vector<LTest> tests;
    tests.push_back(LTest("1 2 3 4 5", " ", {"1", "2", "3", "4", "5"}));
    tests.push_back(LTest("|1f|2п|3%^|44354|5kdasjfdre|2", "|", {"", "1f", "2п", "3%^", "44354", "5kdasjfdre", "2"}));
    tests.push_back(LTest("|1f|2п|3%^|44354|5kdasjfdre|", "|", {"", "1f", "2п", "3%^", "44354", "5kdasjfdre", ""}));
    tests.push_back(LTest("some1 => some2 => some3", "=>", {"some1 ", " some2 ", " some3"}));
    tests.push_back(LTest("some1 => some2 => some3 =>", "=>", {"some1 ", " some2 ", " some3 ", ""}));

    for (int i = 0; i < tests.size(); i++) {
        LTest test = tests[i];
        std::string sPrefix = "test" + std::to_string(i) + "(\"" + test.sStr + "\")";
        std::vector<std::string> vSplitted = WSJCppCore::split(test.sStr, test.sDelim);
        compareN(bTestSuccess, sPrefix + ": size", vSplitted.size(), test.vExpectedVector.size());
        int nMin = std::min(vSplitted.size(), test.vExpectedVector.size());
        for (int n = 0; n < nMin; n++) {
            compareS(bTestSuccess, sPrefix + ", element: " + std::to_string(n), vSplitted[n], test.vExpectedVector[n]);
        }
    }

    return bTestSuccess;
}
Hodge answered 13/3, 2020 at 17:19 Comment(0)
B
0
std::vector<std::string> parse(std::string str,std::string delim){
    std::vector<std::string> tokens;
    char *str_c = strdup(str.c_str()); 
    char* token = NULL;

    token = strtok(str_c, delim.c_str()); 
    while (token != NULL) { 
        tokens.push_back(std::string(token));  
        token = strtok(NULL, delim.c_str()); 
    }

    free(str_c); // strdup allocates with malloc, so release with free (from <cstdlib>), not delete[]

    return tokens;
}
Belda answered 27/5, 2020 at 17:34 Comment(2)
Since you passed str by value, there's no need to call strdup. This is also the right use-case for emplace_back() rather than push_back.Sayles
strdup obtains memory with malloc. But you're freeing it with delete[].Sweeting
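
Following the first comment, a version without strdup could tokenize the by-value copy directly, so there is nothing to free. The sketch below uses the hypothetical name parse_any_of to stress that strtok treats every character of the delimiter argument as a separator and skips empty tokens. It does not split on the delimiter as a whole string.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Sketch: str is passed by value, so strtok may modify this private copy.
// Each character in delims acts as a separator; empty tokens are skipped.
std::vector<std::string> parse_any_of(std::string str, const std::string& delims) {
    std::vector<std::string> tokens;
    for (char* token = std::strtok(&str[0], delims.c_str());
         token != nullptr;
         token = std::strtok(nullptr, delims.c_str())) {
        tokens.emplace_back(token);  // constructs the std::string in place
    }
    return tokens;
}
```

Note that strtok keeps internal state between calls, so this sketch is not safe to use from several threads at once.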
E
0
template<typename C, typename T>
auto insert_in_container(C& c, T&& t) -> decltype(c.push_back(std::forward<T>(t)), void()) {
    c.push_back(std::forward<T>(t));
}
template<typename C, typename T>
auto insert_in_container(C& c, T&& t) -> decltype(c.insert(std::forward<T>(t)), void()) {
    c.insert(std::forward<T>(t));
}
template<typename Container>
Container splitR(const std::string& input, const std::string& delims) {
    Container out;
    size_t delims_len = delims.size();
    size_t begIdx = 0; // size_t, so positions returned by find are not narrowed to unsigned int
    auto endIdx = input.find(delims, begIdx);
    if (endIdx == std::string::npos && input.size() != 0u) {
        insert_in_container(out, input);
    }
    else {
        size_t w = 0;
        while (endIdx != std::string::npos) {
            w = endIdx - begIdx;
            if (w != 0) insert_in_container(out, input.substr(begIdx, w));
            begIdx = endIdx + delims_len;
            endIdx = input.find(delims, begIdx);
        }
        w = input.length() - begIdx;
        if (w != 0) insert_in_container(out, input.substr(begIdx, w));
    }
    return out;
}
Earwitness answered 24/2, 2021 at 18:35 Comment(0)
T
0

I use pointer arithmetic. The inner while loop handles a string delimiter; if a single-character delimiter is enough for you, simply remove the inner loop. I hope it is correct. If you notice a mistake or a possible improvement, please leave a comment.

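std::vector<std::string> split(std::string s, std::string delim)
{
    std::vector<std::string> res = {""};
    if (delim.empty())
    {
        // nothing to split on
        res[0] = s;
        return res;
    }
    const char *p = s.c_str();
    const char *d = delim.c_str();

    while (*p)
    {
        // check whether the delimiter starts at p
        bool is_delim = true;
        const char *pp = p;
        const char *dd = d;
        while (*dd && is_delim)
            if (*pp++ != *dd++)
                is_delim = false;

        if (is_delim)
        {
            p = pp; // skip the whole delimiter
            res.push_back("");
        }
        else
            res.back() += *p++; // stops before the terminating '\0', so it is never appended
    }

    return res;
}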
Truitt answered 8/5, 2021 at 0:58 Comment(1)
Welcome to Stack Overflow. While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.Buxom
E
0

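A simpler solution would be:

You can use strtok, but note that it treats its delimiter argument as a set of single characters rather than as one multi-character string: ">=" splits on either '>' or '=', which happens to give the right result for scott>=tiger. Remember to use strdup so that the original string isn't mutated, and to free the copy afterwards.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *str = "scott>=tiger";
    char *copy = strdup(str);           // strtok modifies its input, so work on a copy
    char *token = strtok(copy, ">=");
    while (token != NULL)
    {
        printf("%s\n", token);
        token = strtok(NULL, ">=");
    }
    free(copy);                         // strdup allocates with malloc
    return 0;
}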
Ellerey answered 29/7, 2022 at 10:51 Comment(1)
If you have a string (which OP does) then copying it using its copy constructor will avoid a potentially expensive call to strlen. Surely we can also switch to using nullptr instead of NULL now that C++11 has been around for 11 years?Sayles
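
If the input is already a std::string, the same "split on any of these characters" behaviour can be had without strdup or strtok, using find_first_not_of and find_first_of. This is a sketch with a hypothetical name, tokenize_any_of. Like strtok, it skips empty tokens and does not treat ">=" as a single unit.

```cpp
#include <string>
#include <vector>

// Sketch: strtok-like tokenizing on a set of delimiter characters,
// without modifying or copying the input buffer.
std::vector<std::string> tokenize_any_of(const std::string& text,
                                         const std::string& delims) {
    std::vector<std::string> tokens;
    // skip any leading delimiter characters
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // the token runs up to the next delimiter character (or the end)
        const std::size_t end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        // move to the start of the next token, if there is one
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

For input such as "a>b=c" this returns a, b and c, which shows the difference from a true string delimiter: a whole-string split on ">=" would leave that input in one piece.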
C
0

I looked through the answers and haven't seen an iterator based approach that can be fed into a range loop, so I made one.

This uses C++17 string_views, so the tokens are views into the stored string rather than newly allocated copies.

struct StringSplit
{
    struct Iterator
    {
        size_t tokenStart_ = 0;
        size_t tokenEnd_ = 0;
        std::string str_;
        std::string_view view_;
        std::string delimiter_;
        bool done_ = false;

        Iterator()
        {
            // End iterator.
            done_ = true;
        }

        Iterator(std::string str, std::string delimiter)
            : str_{std::move(str)}, view_{str_}, delimiter_{
                                                     std::move(delimiter)}
        {
            tokenEnd_ = view_.find(delimiter_, tokenStart_);
        }

        std::string_view operator*()
        {
            return view_.substr(tokenStart_, tokenEnd_ - tokenStart_);
        }

        Iterator &operator++()
        {
            if (tokenEnd_ == std::string::npos)
            {
                done_ = true;
                return *this;
            }

            tokenStart_ = tokenEnd_ + delimiter_.size();
            tokenEnd_ = view_.find(delimiter_, tokenStart_);
            return *this;
        }

        bool operator!=(Iterator &other)
        {
            // We only check if both points to the end.
            if (done_ && other.done_)
            {
                return false;
            }

            return true;
        }
    };

    Iterator beginIter_;

    StringSplit(std::string str, std::string delim)
        : beginIter_{std::move(str), std::move(delim)}
    {
    }

    Iterator begin()
    {
        return beginIter_;
    }

    Iterator end()
    {
        return Iterator{};
    }
};

An example usage would be:

int main()
{
    for (auto token : StringSplit{"<>foo<>bar<><>bar<><>baz<><>", "<>"})
    {
        std::cout << "TOKEN: '" << token << "'" << std::endl;
    }
}

Which prints:

TOKEN: ''
TOKEN: 'foo'
TOKEN: 'bar'
TOKEN: ''
TOKEN: 'bar'
TOKEN: ''
TOKEN: 'baz'
TOKEN: ''
TOKEN: ''

It properly handles empty entries at the beginning and end of the string.
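
When an iterator is not required, a plain function returning std::string_view tokens gives the same token sequence with less machinery. The sketch below is an assumption-laden alternative, not the answer's code: split_views is a hypothetical name, and the caller must keep the underlying character storage alive for as long as the views are used.

```cpp
#include <string_view>
#include <vector>

// Sketch: split into views. The views point into the storage behind `text`,
// which must outlive the returned vector.
std::vector<std::string_view> split_views(std::string_view text,
                                          std::string_view delimiter) {
    std::vector<std::string_view> tokens;
    if (delimiter.empty()) {  // assumption: an empty delimiter means "do not split"
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = text.find(delimiter, start);
        if (end == std::string_view::npos) {
            tokens.push_back(text.substr(start));  // last token, possibly empty
            return tokens;
        }
        tokens.push_back(text.substr(start, end - start));
        start = end + delimiter.size();
    }
}
```

For the input used above, "<>foo<>bar<><>bar<><>baz<><>" with "<>", this produces the same nine tokens, including the empty ones at both ends.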

Christenechristening answered 15/9, 2022 at 20:55 Comment(0)
F
0

Here is an example of splitting a string with another string using the Boost String Algorithms library and the Boost Range library. The solution is inspired by a (modest) suggestion in the StringAlgo library documentation; see the Split section.

Below is a complete program with the split_with_string function as well as a comprehensive test - try it with godbolt:

#include <iostream>
#include <string>
#include <string_view>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/range/iterator_range.hpp>

std::vector<std::string> split_with_string(std::string_view s, std::string_view search) 
{
    if (search.empty()) return {std::string{s}};

    std::vector<boost::iterator_range<std::string_view::iterator>> found;
    boost::algorithm::ifind_all(found, s, search);
    if (found.empty()) return {};

    std::vector<std::string> parts;
    parts.reserve(found.size() + 2); // a bit more

    std::string_view::iterator part_begin = s.cbegin(), part_end;
    for (auto& split_found : found)
    {
        // do not skip empty extracts
        part_end = split_found.begin();
        parts.emplace_back(part_begin, part_end);
        part_begin = split_found.end();
    }
    if (part_end != s.end())
        parts.emplace_back(part_begin, s.end());

    return parts;
}

#define TEST(expr) std::cout << ((!(expr)) ? "FAIL" : "PASS") << ": " #expr "\t" << std::endl

int main()
{
    auto s0 = split_with_string("adsf-+qwret-+nvfkbdsj", "");
    TEST(s0.size() == 1);
    TEST(s0.front() == "adsf-+qwret-+nvfkbdsj");
    auto s1 = split_with_string("adsf-+qwret-+nvfkbdsj", "-+");
    TEST(s1.size() == 3);
    TEST(s1.front() == "adsf");
    TEST(s1.back() == "nvfkbdsj");
    auto s2 = split_with_string("-+adsf-+qwret-+nvfkbdsj-+", "-+");
    TEST(s2.size() == 5);
    TEST(s2.front() == "");
    TEST(s2.back() == "");
    auto s3 = split_with_string("-+adsf-+qwret-+nvfkbdsj", "-+");
    TEST(s3.size() == 4);
    TEST(s3.front() == "");
    TEST(s3.back() == "nvfkbdsj");
    auto s4 = split_with_string("adsf-+qwret-+nvfkbdsj-+", "-+");
    TEST(s4.size() == 4);
    TEST(s4.front() == "adsf");
    TEST(s4.back() == "");
    auto s5 = split_with_string("dbo.abc", "dbo.");
    TEST(s5.size() == 2);
    TEST(s5.front() == "");
    TEST(s5.back() == "abc");
    auto s6 = split_with_string("dbo.abc", ".");
    TEST(s6.size() == 2);
    TEST(s6.front() == "dbo");
    TEST(s6.back() == "abc");
}

Tests output:

PASS: s0.size() == 1    
PASS: s0.front() == "adsf-+qwret-+nvfkbdsj" 
PASS: s1.size() == 3    
PASS: s1.front() == "adsf"  
PASS: s1.back() == "nvfkbdsj"   
PASS: s2.size() == 5    
PASS: s2.front() == ""  
PASS: s2.back() == ""   
PASS: s3.size() == 4    
PASS: s3.front() == ""  
PASS: s3.back() == "nvfkbdsj"   
PASS: s4.size() == 4    
PASS: s4.front() == "adsf"  
PASS: s4.back() == ""   
PASS: s5.size() == 2    
PASS: s5.front() == ""  
PASS: s5.back() == "abc"    
PASS: s6.size() == 2    
PASS: s6.front() == "dbo"   
PASS: s6.back() == "abc"    
Framboise answered 3/2, 2023 at 18:56 Comment(0)
E
0

Some answers miss a special case. If you read a CSV and expect the same number of columns in every row, they fail for input like this:

Row1: a,b,c,d
Row2: g,e,,

For Row2, only 3 items are read.

A special treatment at end of loop adds an empty string:

if (startIndex != str.size())
    result.emplace_back(str.begin() + startIndex, str.end());  
else if (result.size())     // min 1 separator found before. 
    result.emplace_back();

However, it will not add a string when there is only one column (so no delimiter at all) that holds data in some rows and is empty in others.
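
If every row should yield exactly one field more than it has separators, including a single empty field for an empty row, one possible approach is to always append whatever follows the last separator, even when it is empty. The sketch below assumes that this is the desired behaviour; split_fields is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Sketch: always emit (number of separators + 1) fields, so "g,e,," gives
// four fields and an empty row gives one empty field.
std::vector<std::string> split_fields(const std::string& row,
                                      const std::string& delim) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    if (!delim.empty()) {  // assumption: an empty delimiter means "do not split"
        std::size_t end = 0;
        while ((end = row.find(delim, start)) != std::string::npos) {
            fields.push_back(row.substr(start, end - start));
            start = end + delim.size();
        }
    }
    fields.push_back(row.substr(start));  // remainder, kept even when empty
    return fields;
}
```

With this, "a,b,c,d" and "g,e,," both produce four fields, and a single-column file produces one field per row whether or not the row has data.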

Entoblast answered 24/5, 2023 at 9:55 Comment(0)
D
0

Yet another.... This one should be easy to add features to over time without changing the function signature since I used "flags" rather than separate bool options.

utils.h

#include <string>
#include <vector>

namespace utils
{
    void ltrim( std::string &s );
    void rtrim( std::string &s );
    void trim(  std::string &s );
    
    enum SplitFlags
    {
        SPLIT_TRIMMED  = 0x01
    ,   SPLIT_NO_EMPTY = 0x02
    };
    std::vector<std::string> split(
        const std::string &s, const char delimiter, const int flags=0 );
}

utils.cpp

#include <sstream>
#include <algorithm>
#include <cctype>
#include <locale>

#include "utils.h"

void utils::ltrim( std::string &s )
{
    s.erase( s.begin(), std::find_if( s.begin(), s.end(),
        []( unsigned char ch ) { return !std::isspace( ch ); } ) );
}

void utils::rtrim( std::string &s )
{
    s.erase( std::find_if( s.rbegin(), s.rend(),
        []( unsigned char ch ) { return !std::isspace( ch ); } ).base(), s.end() );
}

void utils::trim( std::string &s )
{
    rtrim( s );
    ltrim( s );
}
    
std::vector<std::string> utils::split(
    const std::string &s, const char delimiter, const int flags )
{
    const bool trimmed( flags & SPLIT_TRIMMED  )
             , noEmpty( flags & SPLIT_NO_EMPTY )
    ;
    std::vector<std::string> tokens;
    std::stringstream ss( s );
    for( std::string t; getline( ss, t, delimiter ); )
    {
        if( trimmed ) trim( t );
        if( noEmpty && t.empty() ) continue;
        tokens.push_back( t );
    }
    return tokens;
}

Example use:

const auto parts( utils::split( 
    " , a g , b, c, ", ',', utils::SPLIT_TRIMMED | utils::SPLIT_NO_EMPTY ) );
Discouragement answered 26/6, 2023 at 15:45 Comment(0)
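
Since the question asks about a string delimiter, the same flags idea can be combined with std::string::find instead of getline. The following is a rough, self-contained sketch: split_str and trimmed_copy are hypothetical names, and the flag names are repeated here only so the block compiles on its own. Unlike a getline-based loop, it also reports an empty token after a trailing delimiter unless SPLIT_NO_EMPTY is set.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical flags, mirroring the idea of passing options as bit flags.
enum SplitFlags { SPLIT_TRIMMED = 0x01, SPLIT_NO_EMPTY = 0x02 };

// Returns a copy of s without leading and trailing whitespace.
static std::string trimmed_copy(const std::string& s) {
    auto not_space = [](unsigned char ch) { return !std::isspace(ch); };
    auto first = std::find_if(s.begin(), s.end(), not_space);
    auto last = std::find_if(s.rbegin(), s.rend(), not_space).base();
    return first < last ? std::string(first, last) : std::string();
}

// Sketch: split on a whole-string delimiter, with optional trimming and
// optional removal of empty tokens.
std::vector<std::string> split_str(const std::string& s,
                                   const std::string& delimiter,
                                   int flags = 0) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    for (;;) {
        // assumption: an empty delimiter means "do not split"
        const std::size_t end =
            delimiter.empty() ? std::string::npos : s.find(delimiter, start);
        std::string t = s.substr(start, end - start);
        if (flags & SPLIT_TRIMMED) t = trimmed_copy(t);
        if (!((flags & SPLIT_NO_EMPTY) && t.empty())) tokens.push_back(t);
        if (end == std::string::npos) break;
        start = end + delimiter.size();
    }
    return tokens;
}
```

For example, split_str(" scott >= tiger >= ", ">=", SPLIT_TRIMMED | SPLIT_NO_EMPTY) returns scott and tiger.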
B
-1

As a bonus, here is a split function and macro that are easy to use and let you choose the container type:

#include <iostream>
#include <vector>
#include <string>

#define split(str, delim, type) (split_fn<type<std::string>>(str, delim))
 
template <typename Container>
Container split_fn(const std::string& str, char delim = ' ') {
    Container cont{};
    std::size_t current, previous = 0;
    current = str.find(delim);
    while (current != std::string::npos) {
        cont.push_back(str.substr(previous, current - previous));
        previous = current + 1;
        current = str.find(delim, previous);
    }
    cont.push_back(str.substr(previous, current - previous));
    
    return cont;
}

int main() {
    
    auto test = std::string{"This is a great test"};
    auto res = split(test, ' ', std::vector);
    
    for(auto &i : res) {
        std::cout << i << ", "; // prints "This, is, a, great, test, "
    }
    
    
    return 0;
}
Burmese answered 12/11, 2020 at 15:38 Comment(0)
B
-1

Since C++11 it can be done like this:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> splitString(const std::string& str,
                                     const std::regex& regex)
{
  // submatch index -1 selects the text between the regex matches
  return {std::sregex_token_iterator{str.begin(), str.end(), regex, -1},
          std::sregex_token_iterator() };
}

// usually we have a predefined set of regular expressions: then
// let's build those only once and re-use them multiple times
static const std::regex regex1(R"(some-reg-exp1)", std::regex::optimize);
static const std::regex regex2(R"(some-reg-exp2)", std::regex::optimize);
static const std::regex regex3(R"(some-reg-exp3)", std::regex::optimize);

std::string str = "some string to split";
std::vector<std::string> tokens( splitString(str, regex1) );

Notes:

Bogosian answered 10/2, 2021 at 12:44 Comment(1)
This is an incomplete answer, not really doing or explaining anything.Stenson
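
To make the regex idea concrete for the question's ">=" delimiter, here is a small sketch. split_regex is a hypothetical name, and the patterns are illustrative only. A delimiter that contains regex metacharacters, such as "|" or ".", would have to be escaped before being used as a pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: split str wherever delimiter matches. Passing -1 as the submatch
// index makes the token iterator return the text between matches.
std::vector<std::string> split_regex(const std::string& str,
                                     const std::regex& delimiter) {
    std::sregex_token_iterator first(str.begin(), str.end(), delimiter, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

With std::regex(">=") as the delimiter, "scott>=tiger>=mushroom" splits into scott, tiger and mushroom. A pattern such as "\\s*>=\\s*" additionally swallows whitespace around the delimiter.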
K
-5
std::vector<std::string> split(const std::string& s, char c) {
  std::vector<std::string> v;
  std::string::size_type i = 0;
  std::string::size_type j = s.find(c);
  while (j != std::string::npos) {
    v.push_back(s.substr(i, j - i));
    i = ++j;
    j = s.find(c, j);
    if (j == std::string::npos) {
      // no further delimiter: the rest of the string is the last token
      v.push_back(s.substr(i, s.length()));
      break;
    }
  }
  return v;
}
Kegler answered 27/2, 2017 at 20:45 Comment(1)
Please be more accurate. Your code will not compile. See declaration of "i" and the comma instead of a dot.Vina
