How do I check that stream extraction has consumed all input?

Asked 7/11, 2012 at 16:56 Answered 10/4, 2015 at 14:31

In the following function, I try to determine whether a string s is convertible to type T by reading a T from it and then checking that the input has been completely consumed:

template <class T>
bool can_be_converted_to(const std::string& s, T& t) 
{ 
  std::istringstream i(s);
  i>>std::boolalpha;
  i>>t;
  if (i and i.eof())
    return true;
  else
    return false;
}

However, can_be_converted_to<bool>("true") evaluates to false, because i.eof() is false at the end of the function.

This is correct, even though the function has read the entire string, because it hasn't attempted to read past the end of the string. (So, apparently this function works for int and double because istringstream reads past the end when reading these.)

So, assuming that I should indeed be checking (i and <input completely consumed>):

Q: How do I check that the input was completely consumed w/o using eof()?

Nissa answered 7/11, 2012 at 16:56 Comment(2)

Not an answer, just a note: consider using a temporary variable of type T, as you would otherwise overwrite t even when i.eof() is false. – Ptolemy 7/11, 2012 at 17:0

Do you get EOF returned if you call i.peek()? – Lanlana 7/11, 2012 at 17:0

Use peek() or get() to check what's next in the stream:

return (i >> std::boolalpha >> t && i.peek() == EOF);

Your version doesn't work for integers, either. Consider this input: 123 45. It'll read 123 and report true, even though there are still some characters left in the stream.
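
For reference, here is a minimal sketch of the question's function rewritten around the peek() check. It keeps the question's name and signature; spelling EOF as traits_type::eof() is just a stylistic choice on my part.

```cpp
#include <sstream>
#include <string>

// Sketch: true only if a T can be read from s and nothing is left afterwards.
template <class T>
bool can_be_converted_to(const std::string& s, T& t)
{
    std::istringstream i(s);
    // The extraction must succeed, and the next character must be end-of-input.
    return (i >> std::boolalpha >> t)
        && i.peek() == std::istringstream::traits_type::eof();
}
```

With this version "true" (read into a bool) and "123" (read into an int) are accepted, while "123 45" is rejected because peek() sees the space. Trailing white space is also rejected here, since nothing skips it.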

Excalibur answered 7/11, 2012 at 17:1 Comment(0)

In many implementations of the standard library, eofbit is only set after you have tried to read beyond the end. You can verify that in your code by doing:

char _;
if (i && !(i >> _)) { // i is in a valid state, but
                      // reading a single extra char fails
    // the input was completely consumed
}
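
Put into the question's function, the same idea might look like the sketch below. The variable name extra is mine. Note that i >> extra is a formatted read and skips leading white space, so this variant accepts trailing blanks.

```cpp
#include <sstream>
#include <string>

// Sketch: succeed only if reading t works and one more formatted read fails.
template <class T>
bool can_be_converted_to(const std::string& s, T& t)
{
    std::istringstream i(s);
    i >> std::boolalpha >> t;
    char extra;
    // i must still be in a valid state after reading t, and trying to
    // extract one additional non-white-space character must fail.
    return i && !(i >> extra);
}
```

So "123" and "123  " are both accepted for an int, while "123 45" is not.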

Metamorphism answered 7/11, 2012 at 17:1 Comment(0)

Extending on jrok's answer, you can use i.get() just as easily as i.peek(), at least in this case. (I don't know if there is any reason to prefer one to the other.)

Also, following the convention that white space is never anything but a separator, you might want to extract it before checking for the end. Something like:

return i >> std::ws && i.get() == std::istream::traits_type::eof();

Some older implementations of std::ws were buggy, and would put the stream in an error state. In that case, you'd have to invert the test, and do something like:

return !(i >> std::ws) || i.get() == std::istream::traits_type::eof();

Or just read the std::ws before the condition, and rely solely on the i.get().

(I don't know if buggy std::ws is still a problem. I developed a version of it that worked back when it was, and I've just continued to use it.)
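
As a rough sketch of that last variant (extract std::ws first, then rely only on i.get()), applied to the question's function: the result of std::ws is deliberately ignored here, which is my choice so that the outcome does not depend on whether std::ws leaves the stream in an error state.

```cpp
#include <sstream>
#include <string>

// Sketch: read a T, discard trailing white space, then require end-of-input.
template <class T>
bool can_be_converted_to(const std::string& s, T& t)
{
    std::istringstream i(s);
    if (!(i >> std::boolalpha >> t))
        return false;               // the conversion itself failed
    i >> std::ws;                   // skip trailing white space; result not tested
    // get() returns eof() both at end-of-input and when the stream is
    // already in an eof/fail state, so it is the only test needed.
    return i.get() == std::istringstream::traits_type::eof();
}
```

This accepts "123" and "123  " for an int but still rejects "123 45".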

Eggbeater answered 7/11, 2012 at 17:13 Comment(0)

I would like to offer a completely different approach: Take your input string, tokenise it yourself, and then convert the individual fields using boost::lexical_cast<T>.

Reason: I wasted an afternoon on parsing a string containing 2 int and 2 double fields, separated by spaces. Doing the following:

int i, j;
double x, y;
std::istringstream ins{str};

ins >> i >> j >> x >> y;
// how to check errors???...

parses the correct input such as

`"5 3 9.9e+01 5.5e+02"`

correctly, but does not detect the problem with this:

`"5 9.6e+01 5.5e+02"`

What happens is that i will be set to 5 (OK), j will be set to 9 (??), x to 6.0 (=0.6e+01), y to 550 (OK). I was quite surprised to see failbit not being set... (platform info: OS X 10.9, Apple Clang++ 6.0, C++11 mode).
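
A small sketch that reproduces this; the Fields struct and read_fields helper are hypothetical names used only for illustration. The int extraction stops at the '.', so the double extraction that follows picks up ".6e+01".

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the four fields plus the stream's final state.
struct Fields {
    int i, j;
    double x, y;
    bool ok;   // true if failbit/badbit were not set after the extractions
};

// Hypothetical helper: read two ints and two doubles straight from the stream.
Fields read_fields(const std::string& str)
{
    Fields f{};
    std::istringstream ins{str};
    ins >> f.i >> f.j >> f.x >> f.y;
    f.ok = !ins.fail();
    return f;
}
```

For "5 9.6e+01 5.5e+02" the ok flag is still true and the whole input is consumed, so neither a failbit test nor an end-of-input test catches the missing field.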

Of course you can say now, "But wait, the Standard states that it should be so", and you may be right, but knowing that it is a feature rather than a bug does not reduce the pain if you want to do proper error checking without writing miles of code.

OTOH, if you use "Marius"'s excellent tokeniser function and split str first on whitespace then suddenly everything becomes very easy. Here is a slightly modified version of the tokeniser. I re-wrote it to return a vector of strings; the original is a template that puts the tokens in a container with elements convertible to strings. (For those who need such a generic approach please consult the original link above.)

// \param str: the input string to be tokenized
// \param delimiters: string of delimiter characters
// \param trimEmpty: if true then empty tokens will be trimmed
// \return a vector of strings containing the tokens
std::vector<std::string> tokenizer(
    const std::string& str,
    const std::string& delimiters = " ",
    const bool trimEmpty = false
) {
    std::vector<std::string> tokens;
    std::string::size_type pos, lastPos = 0;
    const char* strdata = str.data();
    while(true) {
        pos = str.find_first_of(delimiters, lastPos);
        if(pos == std::string::npos) {
            // no more delimiters
            pos = str.length();
            if(pos != lastPos || !trimEmpty) {
                tokens.emplace_back(strdata + lastPos, pos - lastPos);
            }
            break;
        } else {
            if(pos != lastPos || !trimEmpty) {
                tokens.emplace_back(strdata + lastPos, pos - lastPos);
            }
        }
        lastPos = pos + 1;
    }
    return tokens;
}

and then just use it like this (ParseError is some exception object):

std::vector<std::string> tokens = tokenizer(str, " \t", true);
if (tokens.size() < 4)
    throw ParseError{"Too few fields in " + str};

try {
    unsigned int i{ boost::lexical_cast<unsigned int>(tokens[0]) },
        j{ boost::lexical_cast<unsigned int>(tokens[1]) };
    double x{ boost::lexical_cast<double>(tokens[2]) },
        y{ boost::lexical_cast<double>(tokens[3]) };
    // print or process i, j, x, y ...
} catch(const boost::bad_lexical_cast& error) {
    throw ParseError{"Could not parse " + str};
}

Note: you can use the Boost split or the tokenizer if you wish, but they were slower than Marius' tokeniser (at least in my environment).

Update: Instead of boost::lexical_cast<T> you can use the C++11 "std::sto*" functions (e.g. stoi to convert a string token to an int). These throw two kinds of exceptions: std::invalid_argument if the conversion could not be performed and std::out_of_range if the converted value cannot be represented. You could either catch these separately or catch their common parent, std::logic_error. Modifying the example code above is left as an exercise to the reader :-)
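
One caveat if you go that way: the std::sto* functions stop at the first character they cannot use and do not complain about what follows, so std::stoi("9.6e+01") quietly returns 9. A possible sketch that uses the optional position argument to insist that the whole token was consumed (to_int and to_double are hypothetical helper names, and reusing std::invalid_argument for the trailing-characters case is my own choice):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a whole token to int, rejecting trailing characters.
int to_int(const std::string& token)
{
    std::size_t used = 0;
    const int value = std::stoi(token, &used);   // may throw invalid_argument / out_of_range
    if (used != token.size())
        throw std::invalid_argument("trailing characters in \"" + token + "\"");
    return value;
}

// Hypothetical helper: same idea for double.
double to_double(const std::string& token)
{
    std::size_t used = 0;
    const double value = std::stod(token, &used);
    if (used != token.size())
        throw std::invalid_argument("trailing characters in \"" + token + "\"");
    return value;
}
```

In the example above, tokens[0] and tokens[1] would then go through to_int and tokens[2] and tokens[3] through to_double, inside the same try block.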

Stoecker answered 10/4, 2015 at 14:31 Comment(0)
