C++ stringstream is too slow, how to speed it up? [duplicate]

Possible Duplicate:
Fastest way to read numerical values from text file in C++ (double in this case)

#include <ctime>
#include <cstdlib>
#include <string>
#include <sstream>
#include <iostream>
#include <limits>

using namespace std;

static const double NAN_D = numeric_limits<double>::quiet_NaN();

void die(const char *msg, const char *info)
{
    cerr << "** error: " << msg << " \"" << info << '\"';
    exit(1);
}

double str2dou1(const string &str)
{
    if (str.empty() || str[0]=='?') return NAN_D;
    const char *c_str = str.c_str();
    char *err;
    double x = strtod(c_str, &err);
    if (*err != 0) die("unrecognized numeric data", c_str);
    return x;
}

static istringstream string_to_type_stream;

double str2dou2(const string &str)
{
    if (str.empty() || str[0]=='?') return NAN_D;
    string_to_type_stream.clear();
    string_to_type_stream.str(str);
    double x = 0.0;
    if ((string_to_type_stream >> x).fail())
        die("unrecognized numeric data", str.c_str());
    return x;
}

int main()
{
    string str("12345.6789");

    clock_t tStart, tEnd;

    cout << "strtod: ";
    tStart=clock();

    for (int i=0; i<1000000; ++i)
        double x = str2dou1(str);

    tEnd=clock();
    cout << tEnd-tStart << endl;

    cout << "sstream: ";
    tStart=clock();

    for (int i=0; i<1000000; ++i)
        double x = str2dou2(str);

    tEnd=clock();
    cout << tEnd-tStart << endl;

    return 0;
}

strtod: 405
sstream: 1389

Update: removed the double underscores from the names. Environment: Windows 7 + VC10.
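A note on the timing loops: the result `x` is never used, so an optimizing build may do less work than the loop suggests. A minimal sketch of a harness that sums the results so every conversion has an observable effect is shown here. It assumes C++11 (`std::chrono`), and `time_conversions` and `strtod_wrapper` are hypothetical names, not part of the program above.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Hypothetical helper: calls `convert` on `text` `iterations` times and
// returns the elapsed time in milliseconds. The sum of all results is written
// to `checksum` so the conversions cannot be treated as dead code.
template <typename Convert>
double time_conversions(Convert convert, const std::string &text,
                        int iterations, double &checksum)
{
    typedef std::chrono::steady_clock clock_type;
    checksum = 0.0;
    const clock_type::time_point start = clock_type::now();
    for (int i = 0; i < iterations; ++i)
        checksum += convert(text);
    const clock_type::time_point stop = clock_type::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Thin wrapper giving strtod the same shape as str2dou1/str2dou2.
double strtod_wrapper(const std::string &text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

Passing `str2dou1` or `str2dou2` instead of `strtod_wrapper` would time the two functions from the question in the same way.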

Donaldson answered 29/4, 2011 at 10:26 Comment(9)
Try boost::spirit instead.Michaelamichaele
Those double-underscore names are illegal in user-written code. And if stringstreams are too slow for you, you have the answer: use strtod. stringstreams are primarily there for convenience and type-safety, not speed.Benedikta
The stream will collect the input and then eventually call strtold for the conversion. Makes it hard to be any faster!Sudorific
Which compiler is it? Maybe the stlport implementation of the STL would be faster than the one that comes with it (do not expect to beat strtod though, it's not possible).Forbidden
@unapersson the double-underscore names were copied from elsewhere; I was too lazy to change them.Donaldson
@hjbreg: just because a function is slower than another does not explain why you think it is "too slow" - do you really need it to be faster?Truscott
@Doc Brown I do; almost 100Mb of raw data has to be converted.Donaldson
@hjbreg: what running time do you have now, what time are you trying to achieve, and what percentage of the running time is spent in the function above?Truscott
@Doc Brown the conversion will be performed only once, so I will leave it as it is, but I wonder if there are any better solutions.Donaldson
O
13

C/C++ text-to-number parsing is very slow. Streams are horribly slow, but even C number parsing is slow because it's quite difficult to get it correct down to the last bit of precision.

In a production application where reading speed was important, and where the data was known to have at most three decimal digits and no scientific notation, I got a vast improvement by hand-coding a floating-point parsing function that handles only the sign, the integer part and any number of decimals (by "vast" I mean 10x faster compared to strtod).

If you don't need exponents and the precision of this function is enough, here is the code of a parser similar to the one I wrote back then. On my PC it's now 6.8 times faster than strtod and 22.6 times faster than sstream.

double parseFloat(const std::string& input)
{
    const char *p = input.c_str();
    if (!*p || *p == '?')
        return NAN_D;
    int s = 1;
    while (*p == ' ') p++;

    if (*p == '-') {
        s = -1; p++;
    }

    double acc = 0;
    while (*p >= '0' && *p <= '9')
        acc = acc * 10 + *p++ - '0';

    if (*p == '.') {
        double k = 0.1;
        p++;
        while (*p >= '0' && *p <= '9') {
            acc += (*p++ - '0') * k;
            k *= 0.1;
        }
    }
    if (*p) die("Invalid numeric format", input.c_str());
    return s * acc;
}
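
Whether "the precision of this function is enough" can be checked against strtod on your own data. The following is only a sketch: `parse_simple_double` is a hypothetical variant of the parser above that returns NaN on malformed input instead of calling die(), so that it compiles on its own, and `parse_error_vs_strtod` is a hypothetical helper name.

```cpp
#include <cmath>
#include <cstdlib>
#include <limits>
#include <string>

// Variant of the hand-written parser: sign, integer part, optional decimals.
// No exponent support. Returns NaN for empty, '?' or malformed input.
double parse_simple_double(const std::string &input)
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const char *p = input.c_str();
    if (!*p || *p == '?')
        return nan;
    while (*p == ' ') ++p;

    double sign = 1.0;
    if (*p == '-') { sign = -1.0; ++p; }

    double acc = 0.0;
    while (*p >= '0' && *p <= '9')
        acc = acc * 10 + (*p++ - '0');

    if (*p == '.') {
        double k = 0.1;
        ++p;
        while (*p >= '0' && *p <= '9') {
            acc += (*p++ - '0') * k;
            k *= 0.1;
        }
    }
    return *p ? nan : sign * acc; // leftover characters mean bad input
}

// Absolute difference between the simple parser and strtod for one input.
double parse_error_vs_strtod(const std::string &input)
{
    return std::fabs(parse_simple_double(input)
                     - std::strtod(input.c_str(), nullptr));
}
```

Running `parse_error_vs_strtod` over a sample of the real input gives a rough idea of how much accuracy the faster parser gives up.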
Oversubscribe answered 29/4, 2011 at 13:2 Comment(0)
R
7

String streams are slow. Very slow. If you are writing anything performance-critical that acts on large data sets (say, loading assets after a level change during a game), do not use string streams. I recommend using the old-school C library parsing functions for performance, although I cannot say how they compare to something like boost spirit.
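
As an illustration of the C library route, here is a minimal sketch that walks a buffer of whitespace-separated numbers with strtod, using its end pointer to advance. The function name `parse_all_doubles` is made up for this example, and the sketch simply stops at the first token that strtod cannot convert.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Parses every whitespace-separated number in `text` with strtod.
// strtod skips leading whitespace itself and reports where it stopped
// through `end`, so no separate tokenizing step is needed.
std::vector<double> parse_all_doubles(const std::string &text)
{
    std::vector<double> values;
    const char *cursor = text.c_str();
    for (;;) {
        char *end = nullptr;
        const double value = std::strtod(cursor, &end);
        if (end == cursor)
            break; // nothing converted: end of input or an invalid token
        values.push_back(value);
        cursor = end;
    }
    return values;
}
```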

However, compared to C library functions, string streams are very elegant, readable and reliable, so if what you are doing is not performance-critical I recommend sticking to streams.

Roarke answered 29/4, 2011 at 12:16 Comment(0)
O
5

In general, if you need speed, consider this library:

http://www.fastformat.org/

(I'm not sure if it contains functions for converting strings or streams to other types, though, so it may not answer your current example).

For the record, please note you're comparing apples to oranges here. strtod() is a simple function that has a single purpose (converting strings to double), while stringstream is a much more complex formatting mechanism, which is far from being optimized to that specific purpose. A fairer comparison would be comparing stringstream to the sprintf/sscanf line of functions, which would be slower than strtod() but still faster than stringstream. I'm not exactly sure what makes stringstream's design slower than sprintf/sscanf, but it seems like that's the case.
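
For anyone who wants to add the sscanf case to the benchmark, a conversion function in the same spirit as the ones in the question might look like the sketch below. `str2dou_sscanf` is a hypothetical name, and it reports failure through its return value rather than calling die().

```cpp
#include <cstdio>
#include <string>

// Converts `str` to a double with sscanf. Returns true and stores the value
// in `out` only if a number was read and it spans the whole string.
bool str2dou_sscanf(const std::string &str, double &out)
{
    int consumed = 0;
    // %lf reads a double; %n records how many characters were consumed
    // and does not count towards the return value.
    if (std::sscanf(str.c_str(), "%lf%n", &out, &consumed) != 1)
        return false;
    return static_cast<std::size_t>(consumed) == str.size();
}
```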

Optometer answered 29/4, 2011 at 10:30 Comment(5)
Why is the STL slower than fastformat?Donaldson
@hjbreg, because it has to support locales.Neolamarckism
@hjbreg: There are several reasons for that. Part of it may be related to the design of streams and to unoptimized implementations. Another reason is the STL's flexibility, which probably includes locale support and support for IO manipulators (I'm not sure if fastformat has these or something equivalent).Optometer
strtod is not particularly simple. It handles locales (for example thousand separator vs decimal point, and nitpicking the thousands separators), and rounds very very carefully. What makes iostreams slow is an insane level of virtual function dispatching, and a difficulty-of-correctness issue that leads implementers to forgo any optimization completely.Reverse
@hjbreg: fastformat has locale support (in addition to an awful lot of screaming propaganda)Standoffish
M
2

Have you considered using lexical_cast from boost?

http://www.boost.org/doc/libs/1_46_1/libs/conversion/lexical_cast.htm

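Edit: btw, the clear() is needed here: str() replaces the buffer contents but does not reset the stream state, and the previous extraction leaves eofbit set after reading to the end of the string.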

Malayoindonesian answered 29/4, 2011 at 10:29 Comment(14)
thanks, but I think STL is not designed to be slowDonaldson
+1 for lexical_cast but since it’s using a string stream underneath, it won’t be any faster.Supraliminal
translate.google.com/…Michaelamichaele
In my test lexical_cast is sooo slowMichaelamichaele
@Donaldson Uh? It isn't. And stringstream isn't a replacement for strtod. And there isn't an equivalent for strtod in STL. That's why lexical_cast was added to boost (which is basically extended STL).Minette
@w55tkqburu28q4xv Where's the code?Minette
@Let_Me_Be It is also very slow when splitting strings, such as with stream >> str. So what is stringstream designed for?Donaldson
Btw. If you want to format something more complex than a single item, there is the Boost Format library boost.org/doc/libs/1_46_1/libs/formatMinette
@Let_Me_Be I will try to find it at home this eveningMichaelamichaele
@Let_Me_Be: While I personally like Boost.Format, I'm afraid it's even slower than stringstream in most cases, so it's not really a speedier alternative to iostreams. In fact, I think it uses iostreams internally in some cases.Optometer
@Let_Me_Be: lexical_cast is implemented using a stringstream ;-)Magdalen
@Let_Me_Be, sorry for delay - github.com/sergey-miryanov/test_asioMichaelamichaele
boost lexical_cast is much faster than std::stringstream. Here's the performance comparison: boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/…Bibliotaph
@JasperBekkers only in the fall-back case. It's specialized for many primitive typesStandoffish
