C++/Boost: Writing a more powerful sscanf replacement

I want to write a function in C++ to replace C's sscanf that assigns the matches to an iterator.

Basically, I want something like:

string s = "0.5 6 hello";
std::vector<boost::any> any_vector;
sscanv(s, "%f %i %s", any_vector);
cout << "float: " << any_cast<float>(any_vector[0]) << '\n';
cout << "integer: " << any_cast<int>(any_vector[1]) << '\n';
cout << "string: " << any_cast<string>(any_vector[2]) << '\n';

The exact details may vary, but you get the idea. Any ideas for implementation?

Options so far along with problems so far:

  • std::istringstream: there's no manipulator for matching constant expressions
  • Boost.Regex: not sure if this will work and it seems much more complicated than necessary for this
  • Boost.Spirit: don't think this will work for dynamically generated format strings and it also seems more complicated than necessary
  • sscanf: it would work, but is non-standard, etc, and using it would require a lot of overhead since the number of arguments is determined at compile time
Showman answered 17/2, 2011 at 22:4 Comment(11)
Would you be open to using c++0x concepts? – Lecher
@Bradley: I thought they were removed from the draft for this standard? – Redouble
@Bradley: definitely. I'm using gcc 4.5.1 and am compiling with -std=c++0x, so as long as it works with that I'm open to it. I don't want to wait for something that isn't yet available though. Of course, I'm still interested in hearing about it even if it's not available yet, but I need a solution for now. – Showman
I guess variadic templates might be more convenient than vector<any> (something like in en.wikipedia.org/wiki/C%2B%2B0x#Variadic_templates), although parsing the format string is still the biggest question. – Nephew
@UncleBens: Thanks, I'll look into those, but you're right in that they won't solve my primary problem. – Showman
You could write a manipulator that matches constant expressions perhaps. – Subdual
@Logan Capaldo: thanks, if I go that route, I'll do that, but I'd prefer something more intuitive and powerful. – Showman
@Showman my suggestion was going to be variadic templates as well. Something like sscanv<float,int,string>(s, atuple) and suggest that maybe you could split the string and use boost::lexical_cast. – Lecher
I think it should be made to work in a symmetrical fashion to boost::format, since boost::format is a type-safe replacement for printf. – Renvoi
@Bradley: variadic templates look awesome, but I'd prefer something more dynamic (i.e. format string can be given at run time). – Showman
@Emile: I agree and will try to look into whether something like that could be added to boost (of course, it would probably be a while). – Showman
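
For reference, the manipulator idea from the comments could look roughly like the sketch below. It is only an illustration: `literal` and `parse_point` are hypothetical names, not part of the standard library or Boost, and treating a space in the literal as "skip any whitespace" is an assumption borrowed from how scanf formats behave.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: a fixed piece of text to match in an input stream.
struct literal {
    std::string text;
    explicit literal(const std::string& t) : text(t) {}
};

// Consume lit.text from the stream and set failbit on the first mismatch.
// A whitespace character in the literal skips any run of whitespace
// (an assumption modelled on scanf formats).
std::istream& operator>>(std::istream& in, const literal& lit) {
    for (std::string::size_type k = 0; k < lit.text.size() && in; ++k) {
        const char expected = lit.text[k];
        if (std::isspace(static_cast<unsigned char>(expected))) {
            in >> std::ws;
        } else if (in.peek() == std::char_traits<char>::to_int_type(expected)) {
            in.get();
        } else {
            in.setstate(std::ios_base::failbit);
        }
    }
    return in;
}

// Example use: parse "x=<int>, y=<int>"; returns true on success.
bool parse_point(const std::string& s, int& x, int& y) {
    std::istringstream in(s);
    in >> literal("x=") >> x >> literal(", y=") >> y;
    return !in.fail();
}
```

With this, fixed text and ordinary `>>` extractions can be chained in one expression, which covers the "matching constant expressions" gap listed for std::istringstream in the question.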

What about this?

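#include <stdexcept>
#include <string>
#include <vector>
#include <boost/any.hpp>
#include <boost/lexical_cast.hpp>

// Named sscanv (as in the question) so it cannot be confused with C's sscanf.
void sscanv(std::string str,
            const std::string& format,
            std::vector<boost::any>& result)
{
  std::string::const_iterator i = format.begin();
  while (i != format.end())
  {
    if (*i == '%')
    {
      ++i; // now *i is the conversion specifier
      if (i == format.end())
        throw std::runtime_error("missing conversion specifier");
      char specifier = *i;

      ++i; // now i points at the next separator, or at the end of the format
      // Take the input up to the separator; at the end of the format,
      // take everything that is left (avoids dereferencing format.end()).
      std::string extract = (i != format.end())
          ? str.substr(0, str.find(*i))
          : str;

      switch (specifier)
      {
        // matching an integer
        case 'i':
          result.push_back(boost::lexical_cast<int>(extract));
          break;
        // matching a floating point number
        case 'a': case 'e': case 'f': case 'g':
          result.push_back(boost::lexical_cast<float>(extract));
          break;
        // matching a single character
        case 'c':
          result.push_back(boost::lexical_cast<char>(extract));
          break;
        // matching a string
        case 's':
          result.push_back(extract);
          break;
        // invalid conversion specifier, throwing an exception
        default:
          throw std::runtime_error("invalid conversion specifier");
      }
    }
    else
    {
      // not a %: discard the input up to and including this separator
      str.erase(0, str.find(*i)+1);
      ++i;
    }
  }
}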

Some conversion specifiers are missing, but in principle it works.

Brewington answered 18/2, 2011 at 16:7 Comment(2)
Thanks, this looks really good. How do you think this compares to scanf in terms of speed? – Showman
@Showman: lexical_cast, string::find, string::erase, etc. are probably rather expensive; I think it will be much slower. – Brewington

If your format string is determined at compile time, there are already some variadic-template printf replacements. Inverting those should work reasonably well.

You could then use istream's >> operator for reading, or the c-stdlib functions.
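
A minimal sketch of that inversion, assuming whitespace-separated fields and C++0x variadic templates; `scan` and `scan_stream` are made-up names for illustration, and there is no format string at all here because the field types come from the arguments:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Base case: nothing left to read; report whether the stream is still good.
inline bool scan_stream(std::istream& in) {
    return !in.fail();
}

// Recursive case: read one value with operator>>, then the remaining ones.
template <typename T, typename... Rest>
bool scan_stream(std::istream& in, T& first, Rest&... rest) {
    if (!(in >> first)) return false;
    return scan_stream(in, rest...);
}

// Hypothetical entry point: the field types are fixed at compile time
// by the types of the target variables.
template <typename... Ts>
bool scan(const std::string& input, Ts&... targets) {
    std::istringstream in(input);
    return scan_stream(in, targets...);
}
```

Called as `scan("0.5 6 hello", f, n, s)` with a float, an int, and a std::string, it fills the three variables and returns true. It is type-safe, but like sscanf the layout is fixed when the code is compiled.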

Stribling answered 17/2, 2011 at 23:26 Comment(5)
That may be the best solution, but ideally I'd prefer something that could work on dynamic format strings (i.e. loaded at runtime). I think this would only work if the format was specified at compile time, which is essentially the same limitation as sscanf. – Showman
I see. Well, that shouldn't be too hard either. If your parsing needs are basic, I don't see why you can't use istream to read int/float/string out of the buffer, then stuff them in your any-vector (or store it in an output iterator, if you want to be fancy). – Stribling
OTOH, if your format string is dynamic, the types and length of your vector will be dynamic too, so you'll have to be very generic with what you handle there, but perhaps your use-case and other code matches that. (I'm a bit curious on what you're trying to do... :) – Stribling
I'd say my parsing needs are more than basic but less than advanced. I basically want to replicate the functionality of sscanf, but make it more dynamic. I may end up using the istream option if no one has a better recommendation. Thanks. – Showman
Oops, didn't see your last comment... yes, my vector will have to be dynamic as well. The use case necessitates it (at least to get a generic solution). I plan on using Boost.Any or maybe making my own version based on that concept that fits my needs a little better. In short, there are a lot of "text files" of various formats that I want to parse and from which I want to create trend data. I'd prefer to write and compile once and then use configuration files to add support for new file formats rather than always having to add more code. – Showman
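
The istream route discussed in these comments might be sketched as follows for a format string that is only known at run time. This is a sketch under assumptions, not a full sscanf replacement: `Field` is a small tagged struct standing in for boost::any so the example has no dependencies, `scan_fields` is a hypothetical name, only %i, %f and %s are handled, and literal characters in the format are ignored rather than matched.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for boost::any (assumption): one parsed value plus its kind.
struct Field {
    char kind;          // 'i', 'f' or 's'
    long integer;       // valid when kind == 'i'
    double real;        // valid when kind == 'f'
    std::string text;   // valid when kind == 's'
    Field() : kind('s'), integer(0), real(0.0) {}
};

// Read whitespace-separated fields from `input`, one per %-specifier in the
// run-time `format`. Other characters in the format are skipped, not matched.
std::vector<Field> scan_fields(const std::string& input,
                               const std::string& format) {
    std::istringstream in(input);
    std::vector<Field> out;
    for (std::string::size_type k = 0; k < format.size(); ++k) {
        if (format[k] != '%') continue;
        if (++k == format.size())
            throw std::runtime_error("missing conversion specifier");
        Field f;
        f.kind = format[k];
        switch (f.kind) {
            case 'i': in >> f.integer; break;
            case 'f': in >> f.real;    break;
            case 's': in >> f.text;    break;
            default:  throw std::runtime_error("invalid conversion specifier");
        }
        if (in.fail())
            throw std::runtime_error("input does not match format");
        out.push_back(f);
    }
    return out;
}
```

Because the format is an ordinary string, it could be loaded from a configuration file, which is the use case described above; the caller then inspects `kind` to decide how to use each field.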