boost::split pushes an empty string to the vector even with token_compress_on

When the input string is empty, boost::split returns a vector containing one empty string.

Is it possible to have boost::split return an empty vector instead?

MCVE:

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

int main() {
    std::vector<std::string> result;
    boost::split(result, "", boost::is_any_of(","), boost::algorithm::token_compress_on);
    std::cout << result.size();
}

Output:

1

Desired output:

0
Guess answered 3/10, 2017 at 8:27 Comment(0)

Compression merges adjacent delimiters into a single separator; it does not suppress empty tokens.

If you consider the following, you can see why this works consistently:

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>

int main() {
    for (std::string const& test : {
            "", "token", 
            ",", "token,", ",token", 
            ",,", ",token,", ",,token", "token,,"
        })
    {
        std::vector<std::string> result;
        boost::split(result, test, boost::is_any_of(","), boost::algorithm::token_compress_on);
        std::cout << "\n=== TEST: " << std::left << std::setw(8) << test << " === ";
        for (auto& tok : result)
            std::cout << std::quoted(tok, '\'') << " ";
    }
}

Prints:

=== TEST:          === '' 
=== TEST: token    === 'token' 
=== TEST: ,        === '' '' 
=== TEST: token,   === 'token' '' 
=== TEST: ,token   === '' 'token' 
=== TEST: ,,       === '' '' 
=== TEST: ,token,  === '' 'token' '' 
=== TEST: ,,token  === '' 'token' 
=== TEST: token,,  === 'token' '' 
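
One way to read this table: with token_compress_on, each maximal run of delimiters acts as a single separator, and a separator always has a token on each side, so the result holds (number of delimiter runs + 1) tokens. The following is a minimal standard-library-only sketch of that rule. It is not Boost's actual implementation, and split_compressed_model is a hypothetical name introduced here for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost: a standard-library-only model of
// the behaviour shown in the table. Each maximal run of delimiter characters
// counts as ONE separator, and every separator has a token on both sides,
// even when that token is empty.
std::vector<std::string> split_compressed_model(const std::string& input,
                                                const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;
    for (;;) {
        // The current token runs up to the next delimiter (or end of input).
        const auto end = input.find_first_of(delims, pos);
        if (end == std::string::npos) {
            tokens.push_back(input.substr(pos));  // final token, possibly empty
            return tokens;
        }
        tokens.push_back(input.substr(pos, end - pos));
        // Skip the whole run of adjacent delimiters: this is the "compression".
        pos = input.find_first_not_of(delims, end);
        if (pos == std::string::npos) {
            tokens.emplace_back();  // input ended with a delimiter run: trailing empty token
            return tokens;
        }
    }
}
```

Under this model an empty input contains zero delimiter runs and therefore yields exactly one (empty) token, which is the result the question observed.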

So, you might fix it by trimming delimiters from front and end and checking that the remaining input is non-empty:

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>

int main() {
    auto const delim = boost::is_any_of(",");

    for (std::string test : {
            "", "token", 
            ",", "token,", ",token", 
            ",,", ",token,", ",,token", "token,,"
        })
    {
        std::cout << "\n=== TEST: " << std::left << std::setw(8) << test << " === ";

        std::vector<std::string> result;

        boost::trim_if(test, delim);
        if (!test.empty())
            boost::split(result, test, delim, boost::algorithm::token_compress_on);

        for (auto& tok : result)
            std::cout << std::quoted(tok, '\'') << " ";
    }
}

Printing:

=== TEST:          === 
=== TEST: token    === 'token' 
=== TEST: ,        === 
=== TEST: token,   === 'token' 
=== TEST: ,token   === 'token' 
=== TEST: ,,       === 
=== TEST: ,token,  === 'token' 
=== TEST: ,,token  === 'token' 
=== TEST: token,,  === 'token' 

BONUS: Boost Spirit

Using Spirit X3 instead seems to me more flexible and potentially more efficient:

#include <boost/spirit/home/x3.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <vector>

int main() {
    static auto const delim = boost::spirit::x3::char_(",");

    for (std::string test : {
            "", "token", 
            ",", "token,", ",token", 
            ",,", ",token,", ",,token", "token,,"
        })
    {
        std::cout << "\n=== TEST: " << std::left << std::setw(8) << test << " === ";

        std::vector<std::string> result;
        parse(test.begin(), test.end(), -(+~delim) % delim, result);

        for (auto& tok : result)
            std::cout << std::quoted(tok, '\'') << " ";
    }
}
Illdisposed answered 3/10, 2017 at 8:42 Comment(8)
Excellent example, but I don't understand why you consider this to work consistently. Even more confusing is the fact that the Boost docs say "This function is equivalent to C strtok", yet strtok returns NULL for an empty string. - Peplos
@Peplos They must have meant functionally equivalent (for one thing, it doesn't modify its input :)). Also, it's consistent in that all tokens are always returned. Compress just means that a new token starts after all adjacent delimiters. I guess they should have named it delimiter_compression_on instead. - Illdisposed
Added a proof of concept of the workaround. - Illdisposed
Maybe, but I would still expect two empty strings for the input "," and an empty vector for an empty string (but the "workaround" for this is obvious). - Peplos
Added an alternative using Spirit X3 that looks more elegant to me. - Illdisposed
@Peplos I cannot think of a single consistent explanation for the behaviour you describe (if empty input results in no tokens, then surely one of the sides of "," must also get that treatment, so you'd get at most 1 token). Boost Split is consistent (which is probably also clear from looking at the implementation). - Illdisposed
Thanks for the Spirit X3 tip. Is that the successor or predecessor of Spirit Qi? Gotta marvel at the expressiveness of -(+~delim) % delim... What does ~ do, and is that documented anywhere? - Guess
It is. Docs: boost.org/doc/libs/1_65_1/libs/spirit/doc/html/spirit/qi/… (same for X3, although that bit is strangely missing from the docs at ciere.com/cppnow15/x3_docs/spirit/quick_reference/char.html). - Illdisposed
