If the conversion is the bottleneck (which is quite possible),
you should start by trying the different possibilities in the
standard library. Logically, one would expect them to be very close,
but in practice, they aren't always:
You've already determined that std::ifstream
is too slow.
Converting your memory-mapped data into an std::istringstream
is almost certainly not a good solution; you'll first have to
create a string, which will copy all of the data.
Writing your own streambuf to read directly from the memory,
without copying (or using the deprecated std::istrstream)
might be a solution, although if the problem really is the
conversion... this still uses the same conversion routines.
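A minimal sketch of what such a streambuf could look like (the class name
and usage are my own illustration, not something from the original post);
it simply makes the mapped region the get area, so nothing is copied:
#include <streambuf>
#include <istream>

class membuf : public std::streambuf
{
public:
    membuf( char* begin, char* end )
    {
        setg( begin, begin, end );  //  get area is the mapped data itself
    }
};

//  Usage (illustrative):
//      membuf buf( mappedData, mappedData + mappedSize );
//      std::istream in( &buf );
//      double d;
//      while ( in >> d ) { /*...*/ }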
You can always try fscanf, or sscanf on your memory-mapped
data. Depending on the implementation, they might be faster
than the various istream implementations.
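For what it's worth, here is a hedged sketch of the sscanf variant (the name
data is illustrative, and it assumes the mapped data is '\0'-terminated, as
discussed below). Be aware that some implementations effectively scan for the
end of the whole buffer on each call, which can hurt badly on a large mapping:
#include <cstdio>

char const* p = data;   //  '\0'-terminated mapped data (assumption)
double value;
int consumed;
while ( std::sscanf( p, " %lf%n", &value, &consumed ) == 1 ) {
    //  do whatever with value...
    p += consumed;
}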
Probably faster than any of these is to use strtod
. No need
to tokenize for this: strtod
skips leading white space
(including '\n'
), and has an out parameter where it puts the
address of the first character not read. The end condition is
a bit tricky; your loop should probably look something like this:
char* begin;    //  Set to point to the mmap'ed data...
                //  You'll also have to arrange for a '\0'
                //  to follow the data.  This is probably
                //  the most difficult issue.
char* end;
errno = 0;
double tmp = strtod( begin, &end );
while ( errno == 0 && end != begin ) {
    //  do whatever with tmp...
    begin = end;
    tmp = strtod( begin, &end );
}
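One possible way of arranging for that terminating '\0' (a Linux/POSIX-flavoured
sketch of my own, with an illustrative function name; nothing below depends on
it) is to reserve a zero-filled anonymous mapping one byte larger than the file,
then map the file over it, so the byte just past the data is guaranteed to be 0:
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

char const*
mapWithTerminator( char const* path, std::size_t& size )
{
    int fd = open( path, O_RDONLY );
    if ( fd == -1 ) return nullptr;
    struct stat st;
    if ( fstat( fd, &st ) != 0 ) { close( fd ); return nullptr; }
    size = static_cast<std::size_t>( st.st_size );
    //  Anonymous memory is zero-filled; the extra byte rounds up to a page.
    void* base = mmap( nullptr, size + 1, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 );
    if ( base == MAP_FAILED ) { close( fd ); return nullptr; }
    //  Overlay the file on the first size bytes; the byte at [size] stays 0.
    void* data = mmap( base, size, PROT_READ, MAP_PRIVATE | MAP_FIXED,
                       fd, 0 );
    close( fd );
    return data == MAP_FAILED ? nullptr : static_cast<char const*>( data );
}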
If none of these are fast enough, you'll have to consider the
actual data. It probably has some sort of additional
constraints, which means that you can potentially write
a conversion routine which is faster than the more general ones;
e.g. strtod
has to handle both fixed and scientific notation, and it
has to be 100% accurate even if there are 17 significant digits.
It also has to be locale-specific. All of this is added
complexity, which means added code to execute. But beware:
writing an efficient and correct conversion routine, even for
a restricted set of input, is non-trivial; you really do have to
know what you are doing.
EDIT:
Just out of curiosity, I've run some tests. In addition to the
aforementioned solutions, I wrote a simple custom converter,
which only handles fixed point (no scientific), with at most
five digits after the decimal, and the value before the decimal
must fit in an int
:
double
convert( char const* source, char const** endPtr )
{
    char* end;
    int left = strtol( source, &end, 10 );
    double results = left;
    if ( *end == '.' ) {
        char* start = end + 1;
        int right = strtol( start, &end, 10 );
        //  The number of fractional digits selects the scale factor.
        static double const fracMult[]
            = { 0.0, 0.1, 0.01, 0.001, 0.0001, 0.00001 };
        results += right * fracMult[ end - start ];
    }
    if ( endPtr != nullptr ) {
        *endPtr = end;
    }
    return results;
}
(If you actually use this, you should definitely add some error
handling. This was just knocked up quickly for experimental
purposes, to read the test file I'd generated, and nothing
else.)
The interface is exactly that of strtod
, to simplify coding.
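So, given the earlier loop, only the call changes (a trivial sketch; the
errno test drops out, since convert doesn't set errno):
char const* begin;      //  as before, points to the '\0'-terminated data
char const* end;
double tmp = convert( begin, &end );
while ( end != begin ) {
    //  do whatever with tmp...
    begin = end;
    tmp = convert( begin, &end );
}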
I ran the benchmarks in two environments (on different machines,
so the absolute values of any times aren't relevant). I got the
following results:
Under Windows 7, compiled with VC 11 (/O2):
Testing Using fstream directly (5 iterations)...
6.3528e+006 microseconds per iteration
Testing Using fscan directly (5 iterations)...
685800 microseconds per iteration
Testing Using strtod (5 iterations)...
597000 microseconds per iteration
Testing Using manual (5 iterations)...
269600 microseconds per iteration
Under Linux 2.6.18, compiled with g++ 4.4.2 (-O2, IIRC):
Testing Using fstream directly (5 iterations)...
784000 microseconds per iteration
Testing Using fscanf directly (5 iterations)...
526000 microseconds per iteration
Testing Using strtod (5 iterations)...
382000 microseconds per iteration
Testing Using strtof (5 iterations)...
360000 microseconds per iteration
Testing Using manual (5 iterations)...
186000 microseconds per iteration
In all cases, I'm reading 554000 lines, each with 3 randomly
generated floating point values in the range [0...10000).
The most striking thing is the enormous difference between
fstream
and fscanf
under Windows (and the relatively small
difference between fscanf
and strtod
). The second thing is
just how much the simple custom conversion function gains, on
both platforms. The necessary error handling would slow it down
a little, but the difference is still significant. I expected
some improvement, since it doesn't handle a lot of things that
the standard conversion routines do (like scientific format,
very, very small numbers, Inf and NaN, i18n, etc.), but not this
much.