double to string without scientific notation or trailing zeros, efficiently

This routine is called a zillion times to create large CSV files full of numbers. Is there a more efficient way to do this?

#include <iomanip>
#include <sstream>
#include <string>

static std::string dbl2str(double d)
{
    std::stringstream ss;

    //convert double to string w fixed notation, hi precision
    ss << std::fixed << std::setprecision(10) << d;
    
    //output to std::string
    std::string s = ss.str();
                                
    //remove trailing 000s (123.1200 => 123.12,  123.000 => 123.)
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);

    //remove dangling decimal (123. => 123)
    return (s[s.size()-1] == '.') ? s.substr(0, s.size()-1) : s; 
}
Myrmecophagous answered 1/3, 2013 at 19:37 Comment(3)
The title seems wrong; shouldn't it be double to string? (Markley)
Oops, the title is backwards. Of course it's double to string. (Myrmecophagous)
Possible duplicate of Formatting n significant digits in C++ without scientific notation (Mesothelium)

Before you start, check whether significant time is spent in this function. Measure it, either with a profiler or otherwise. Knowing that you call it a zillion times is all very well, but if your program turns out to spend only 1% of its time in this function, then nothing you do here can improve your program's performance by more than 1%. In that case the answer to your question would be "for your purposes no, this function cannot be made significantly more efficient, and you are wasting your time if you try".
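A rough first check is to time the function in isolation and compare the per-call cost with the program's total running time. This is only a sketch: `ns_per_call` is a hypothetical helper built on `std::chrono::steady_clock`, the inputs (`i * 0.25`) are arbitrary, and a profiler run on the real workload will give more trustworthy numbers.

```cpp
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Copy of the question's function, so this sketch compiles on its own.
static std::string dbl2str(double d)
{
    std::stringstream ss;
    ss << std::fixed << std::setprecision(10) << d;
    std::string s = ss.str();
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    return (s[s.size()-1] == '.') ? s.substr(0, s.size()-1) : s;
}

// Hypothetical helper: average nanoseconds per dbl2str call over n calls.
// The result lengths are summed into `sink` so the work cannot be optimized away.
static double ns_per_call(int n, std::size_t &sink)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        sink += dbl2str(i * 0.25).size();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / n;
}
```

Multiplying the per-call figure by the number of calls and comparing it with the program's total runtime gives an upper bound on what optimizing this function can save.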

First, avoid s.substr(0, s.size()-1). It copies most of the string, and because the return statement is a conditional expression rather than the name of a single local variable, the function is ineligible for NRVO, so I think you'll generally get a copy on return. So the first change I'd make is to replace the last line with:

if(s[s.size()-1] == '.') {
    s.erase(s.end()-1);
}
return s;
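With that change applied, the question's function might look like the sketch below. Apart from the last few lines it is the original stringstream version, plus the headers it needs; every path now returns the same named local `s`, which is what makes NRVO possible.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

static std::string dbl2str(double d)
{
    std::stringstream ss;
    // fixed notation, 10 digits after the decimal point
    ss << std::fixed << std::setprecision(10) << d;
    std::string s = ss.str();
    // remove trailing zeros (123.1200000000 => 123.12, 123.0000000000 => 123.)
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    // remove a dangling decimal point in place instead of copying via substr
    if (s[s.size()-1] == '.') {
        s.erase(s.end()-1);
    }
    return s; // single named local on every path, so NRVO can apply
}
```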

But if performance is a serious concern, here's how I'd do it. I'm not promising that this is the fastest possible approach, but it avoids some unnecessary allocations and copying. Any approach involving stringstream requires a copy from the stringstream to the result, so we want a lower-level operation: snprintf.

#include <cstdio>
#include <string>

static std::string dbl2str(double d)
{
    size_t len = std::snprintf(0, 0, "%.10f", d);
    std::string s(len+1, 0);
    // technically non-portable, see below
    std::snprintf(&s[0], len+1, "%.10f", d);
    // remove nul terminator
    s.pop_back();
    // remove trailing zeros
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    // remove trailing point
    if(s.back() == '.') {
        s.pop_back();
    }
    return s;
}

The second call to snprintf assumes that std::string uses contiguous storage. This is guaranteed in C++11. It is not guaranteed in C++03, but is true for all actively-maintained implementations of std::string known to the C++ committee. If performance really is important then I think it's reasonable to make that non-portable assumption, since writing directly into a string saves copying into a string later.

s.pop_back() is the C++11 way of saying s.erase(s.end()-1), and s.back() is the C++11 way of saying s[s.size()-1].
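To illustrate the equivalence, the two hypothetical helpers below remove a trailing '.' in the C++03 and C++11 spellings respectively. The `!s.empty()` guard is an addition not present in the answer's code; calling `back()` or `pop_back()` on an empty string is undefined behavior.

```cpp
#include <string>

// Hypothetical helper: C++03 spelling (operator[] and erase).
static std::string drop_point_cxx03(std::string s)
{
    if (!s.empty() && s[s.size()-1] == '.')
        s.erase(s.end()-1);
    return s;
}

// Hypothetical helper: the same operation with the C++11 members.
static std::string drop_point_cxx11(std::string s)
{
    if (!s.empty() && s.back() == '.')
        s.pop_back();
    return s;
}
```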

For another possible improvement, you could get rid of the first call to snprintf and instead size your s to some value like std::numeric_limits<double>::max_exponent10 + 14 (basically, the length that -DBL_MAX needs). The trouble is that this allocates and zeros far more memory than is typically needed (322 bytes for an IEEE double). My intuition is that this will be slower than the first call to snprintf, not to mention wasteful of memory in the case where the string return value is kept hanging around for a while by the caller. But you can always test it.
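A sketch of that single-call variant, for comparing against the two-call version. The name `dbl2str_maxbuf` is hypothetical. The buffer is sized with `max_exponent10 + 14` as described; note that `resize()` only shortens the string, so the full worst-case capacity typically stays allocated for as long as the caller keeps the result.

```cpp
#include <cstdio>
#include <limits>
#include <string>

static std::string dbl2str_maxbuf(double d)
{
    // Worst case is -DBL_MAX with "%.10f": 321 characters plus the nul (322 for IEEE double).
    const std::size_t maxlen = std::numeric_limits<double>::max_exponent10 + 14;
    std::string s(maxlen, '\0');
    int len = std::snprintf(&s[0], s.size(), "%.10f", d);
    if (len < 0)
        return std::string();          // encoding error
    s.resize(len);                     // drop the nul and unused bytes (capacity is not released)
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (!s.empty() && s.back() == '.')
        s.pop_back();                  // trailing point
    return s;
}
```

Whether the single snprintf call outweighs zero-filling 322 bytes per value is exactly the kind of question the answer suggests settling by measurement.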

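Alternatively, for finite non-zero d, std::max((int)std::log10(std::fabs(d)), 0) + 15 computes a reasonably tight upper bound on the size needed, and might be quicker than snprintf can compute it exactly. The std::fabs keeps negative values from producing NaN, zero and non-finite values have to be special-cased because their logarithm cannot be converted to int (doing so is undefined behavior), and the extra byte (15 rather than 14) covers negative values such as -9.99999999999 that round up to an extra integer digit.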
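A sketch of this estimate-then-format approach, with the hypothetical name `dbl2str_log10`. It computes the bound from `std::fabs(d)`, skips the logarithm for zero and non-finite values, uses 15 as the constant so that negative values rounding up to an extra integer digit still fit, and falls back to a second snprintf call if the estimate is ever too small (for example because of rounding inside `std::log10`).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>

static std::string dbl2str_log10(double d)
{
    // 15 = sign + '.' + 10 decimals + nul + 1 leading digit + 1 spare digit for round-up.
    std::size_t bufsize = 15;
    if (std::isfinite(d) && d != 0.0)
        bufsize += std::max((int)std::log10(std::fabs(d)), 0); // extra integer digits
    std::string s(bufsize, '\0');
    int len = std::snprintf(&s[0], s.size(), "%.10f", d);
    if (len < 0)
        return std::string();                     // encoding error
    if ((std::size_t)len >= s.size()) {           // estimate too small: format again exactly
        s.assign(len + 1, '\0');
        std::snprintf(&s[0], s.size(), "%.10f", d);
    }
    s.resize(len);                                // drop the nul terminator
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (!s.empty() && s.back() == '.')
        s.pop_back();                             // trailing point
    return s;
}
```

The fallback keeps the function correct even if the bound is wrong, so the estimate only affects speed, never the output.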

Finally, it may be that you can improve performance by changing the function interface. For example, instead of returning a new string you could perhaps append to a string passed in by the caller:

#include <cstdio>
#include <string>

void append_dbl2str(std::string &s, double d) {
    size_t len = std::snprintf(0, 0, "%.10f", d);
    size_t oldsize = s.size();
    s.resize(oldsize + len + 1);
    // technically non-portable
    std::snprintf(&s[oldsize], len+1, "%.10f", d);
    // remove nul terminator
    s.pop_back();
    // remove trailing zeros
    s.erase(s.find_last_not_of('0') + 1, std::string::npos);
    // remove trailing point
    if(s.back() == '.') {
        s.pop_back();
    }
}

Then the caller can reserve() plenty of space, call your function several times (presumably with other string appends in between), and write the resulting block of data to the file all at once, without any memory allocation other than the reserve. "Plenty" doesn't have to be the whole file, it could be one line or "paragraph" at a time, but anything that avoids a zillion memory allocations is a potential performance boost.
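For example, a caller could reuse one buffer for many CSV rows and flush it to the file in blocks. In this sketch `append_csv_row` and `write_csv` are hypothetical names, the 64 KiB reserve and 32 KiB flush threshold are arbitrary, and `append_dbl2str` is repeated so the block compiles on its own.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Copy of append_dbl2str from this answer so the sketch is self-contained.
void append_dbl2str(std::string &s, double d)
{
    std::size_t len = std::snprintf(nullptr, 0, "%.10f", d);
    std::size_t oldsize = s.size();
    s.resize(oldsize + len + 1);
    std::snprintf(&s[oldsize], len + 1, "%.10f", d);
    s.pop_back();                                             // nul terminator
    s.erase(s.find_last_not_of('0') + 1, std::string::npos); // trailing zeros
    if (s.back() == '.')
        s.pop_back();                                         // trailing point
}

// Hypothetical helper: append one comma-separated row plus a newline.
void append_csv_row(std::string &out, const std::vector<double> &row)
{
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (i != 0)
            out += ',';
        append_dbl2str(out, row[i]);
    }
    out += '\n';
}

// Hypothetical driver: one reserve, all rows appended into the same buffer,
// and the buffer written to f in large blocks.
void write_csv(std::FILE *f, const std::vector<std::vector<double>> &rows)
{
    std::string buf;
    buf.reserve(1 << 16);                 // 64 KiB, an arbitrary choice
    for (const auto &row : rows) {
        append_csv_row(buf, row);
        if (buf.size() >= (1 << 15)) {    // flush at 32 KiB
            std::fwrite(buf.data(), 1, buf.size(), f);
            buf.clear();                  // length 0; capacity is normally kept
        }
    }
    std::fwrite(buf.data(), 1, buf.size(), f);
}
```

Because every "%.10f" result contains a decimal point, the trailing-zero erase in append_dbl2str never reaches back into text appended earlier.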

Closegrained answered 1/3, 2013 at 21:31 Comment(0)

Efficient in terms of speed or brevity?

#include <cstdio>
#include <iostream>

char buf[64];
std::sprintf(buf, "%-.*G", 16, 1.0);
std::cout << buf << std::endl;

Displays "1". %G prints up to 16 significant digits with no trailing zeros, but it switches to scientific notation when the decimal exponent is less than -4 or greater than or equal to the precision (16 here).
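Since the question specifically wants to avoid scientific notation, it is worth seeing where %G switches. The hypothetical `g16` wrapper below uses the same conversion (with the `-` flag dropped, since it has no effect without a field width) and returns the result as a `std::string`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical wrapper: up to 16 significant digits, trailing zeros removed by %G.
static std::string g16(double d)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.16G", d);
    return buf;
}

// g16(1.0)    -> "1"
// g16(123.12) -> "123.12"
// g16(1e20)   -> "1E+20"   (exponent >= 16: scientific)
// g16(1e-5)   -> "1E-05"   (exponent < -4: scientific)
```

So this is a compact option when the values are known to stay within that range, but not a drop-in replacement for the fixed-notation requirement.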

Jumna answered 22/12, 2013 at 12:4 Comment(1)
The - isn't strictly necessary (it left-justifies). (Milepost)
  • use snprintf and an array of char instead of stringstream and string
  • pass dbl2str a pointer to a char buffer to print into, which avoids the std::string copy when returning; assemble the output in that character buffer, and convert it to a string or append it to an existing string only when needed
  • declare the function inline in a header file

    #include <cstdio>

    inline void dbl2str(char *buffer, int bufsize, double d)
    {
      /* the caller must make sure that buffer has room for the result;
         "%.10f" matches the precision used in the question */
      int len = std::snprintf(buffer, bufsize, "%.10f", d);
    
      /* on error or truncation, buffer[len] is not the terminator: give up */
      if (len < 0 || len >= bufsize) {
        if (bufsize > 0)
          buffer[0] = 0;
        return;
      }
    
      /* len is the number of characters written, excluding the trailing \0,
         so buffer[len] is the \0 and buffer[len-1] is the last visible character */
      while (len >= 1 && buffer[len-1] == '0')
        --len;
    
      /* terminate the string after the last non-'0' character
         (this rewrites the existing \0 if there were no trailing zeros) */
      buffer[len] = 0;
    
      /* drop a trailing decimal point */
      if (len >= 1 && buffer[len-1] == '.')
        buffer[len-1] = 0;
    }
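A usage sketch for the char-buffer approach: the hypothetical `append_value` helper formats into a fixed-size stack buffer and appends the result to a `std::string` the caller owns. The `dbl2str` copy here uses the question's 10-digit precision and leaves an empty string if the output would not fit; the 32-byte buffer (an assumption, not from the answer) covers any |d| below 1e19.

```cpp
#include <cstdio>
#include <string>

// Copy of the char-buffer dbl2str, with a truncation check and "%.10f".
inline void dbl2str(char *buffer, int bufsize, double d)
{
    int len = std::snprintf(buffer, bufsize, "%.10f", d);
    if (len < 0 || len >= bufsize) {   // error or truncated: leave an empty string
        if (bufsize > 0)
            buffer[0] = '\0';
        return;
    }
    while (len >= 1 && buffer[len-1] == '0')
        --len;                         // skip trailing zeros
    buffer[len] = '\0';
    if (len >= 1 && buffer[len-1] == '.')
        buffer[len-1] = '\0';          // drop a trailing point
}

// Hypothetical helper: no per-value heap allocation beyond growth of s itself.
inline void append_value(std::string &s, double d)
{
    char buf[32];  // sign + 19 digits + '.' + 10 decimals + nul
    dbl2str(buf, (int)sizeof buf, d);
    s += buf;
}
```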
    
Pussy answered 1/3, 2013 at 19:48 Comment(1)
The inline keyword does not directly cause the compiler to inline the function; it tells the linker that the symbol may be defined in several translation units without that being an error. The function in the question is already static. (Markley)
