Ways std::stringstream can set fail/bad bit?
Asked Answered
U

1

9

A common piece of code I use for simple string splitting looks like this:

inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

Someone mentioned that this will silently "swallow" errors occurring in std::getline. And of course I agree that's the case. But it occurred to me, what could possibly go wrong here in practice that I would need to worry about. basically it all boils down to this:

inline std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }

    if(/* what error can I catch here? */) {
        // *** How did we get here!? ***
    }

    return elems;
}

A stringstream is backed by a string, so we don't have to worry about any of the issues associated with reading from a file. There is no type conversion going on here since getline simply reads until it sees the line delimeter or EOF. So we can't get any of the errors that something like boost::lexical_cast has to worry about.

I simply can't think of something besides failing to allocate enough memory that could go wrong, but that'll just throw a std::bad_alloc well before the std::getline even takes place. What am I missing?

Unfamiliar answered 1/4, 2010 at 19:3 Comment(3)
What is wrong is returning a reference to a local.Presume
Good catch, though I didn't mean to return a reference to a local, this is a cut down example to demonstrate the basics of the questionUnfamiliar
A stringstream is backed by a string only if you haven't called rdbuf(otherstreambuf).Luminosity
S
6

I can't imagine what errors this person thinks might happen, and you should ask them to explain. Nothing can go wrong except allocation errors, as you mentioned, which are thrown and not swallowed.

The only thing I see that you're directly missing is that ss.fail() is guaranteed to be true after the while loop, because that's the condition being tested. (bool(stream) is equivalent to !stream.fail(), not stream.good().) As expected, ss.eof() will also be true, indicating the failure was due to EOF.

However, there might be some confusion over what is actually happening. Because getline uses delim-terminated fields rather than delim-separated fields, input data such as "a\nb\n" has two instead of three fields, and this might be surprising. For lines this makes complete sense (and is POSIX standard), but how many fields, with a delim of '-', would you expect to find in "a-b-" after splitting?


Incidentally, here's how I'd write split:

template<class OutIter>
OutIter split(std::string const& s, char delim, OutIter dest) {
  std::string::size_type begin = 0, end;
  while ((end = s.find(delim, begin)) != s.npos) {
    *dest++ = s.substr(begin, end - begin);
    begin = end + 1;
  }
  *dest++ = s.substr(begin);
  return dest;
}

This avoids all of the problems with iostreams in the first place, avoids extra copies (the stringstream's backing string; plus the temp returned by substr can even use a C++0x rvalue reference for move semantics if supported, as written), has the behavior I expect from split (different from yours), and works with any container.

deque<string> c;
split("a-b-", '-', back_inserter(c));
// c == {"a", "b", ""}
Scoot answered 1/4, 2010 at 20:43 Comment(7)
good point about the use s.fail(), i suppose s.bad() would be a better choice? or perhaps !s.eof()? (it should end due to EOF, so if it isn't EOF, then it failed right?)Unfamiliar
Also, good point about terminated vs separated fields. I've never had an issue with that before, but I could see it being surprising. All the more reason to test the number of fields you got before extracting your data from the result.Unfamiliar
@Evan: First identify the condition you're trying to check. There's no need to check fail, bad, eof, or anything else on ss after the loop, but you might want to check elems, as you said.Scoot
well that check is currently hypothetical at the moment ;). Someone said this swallows an error, the question is, how would i catch this error? It seems that the error check simply isn't necessary, I just need to check that I got the amount i expected.Unfamiliar
@Evan: What error is it supposed to be swallowing? I can't find one, and am inclined to say they're flat-out wrong. (Unless they mean about -terminated vs -separated, but that's not an error in getline, it's a misunderstanding.)Scoot
It was in reference to this post I made: #2553339 The first comment claims that it silently swallows and error, so far no specifics have come from the discussion though.Unfamiliar
@Evan & @Roger: I stand corrected. I just looked at the code, saw the usage of streams and no error checking, an pavlov'ed on that. Now that I'm thinking more about it, I too see that extracting strings from a stream shouldn't fail (except for memory allocation errors). Paranoid as I am, I'd probably put assert() there (assert(ss.eof())?), but I wouldn#t blame others for not doing this. I'm sorry for the confusion. Anyway, it seems we all learned something from it and it earned you a few up-votes, so I guess in the end it's been a positive confusion. :)Laconia

© 2022 - 2024 — McMap. All rights reserved.