How can I improve/replace sprintf, which I've measured to be a performance hotspot?
Asked Answered
D

12

9

Through profiling I've discovered that the sprintf here takes a long time. Is there a better performing alternative that still handles the leading zeros in the y/m/d h/m/s fields?

SYSTEMTIME sysTime;
GetLocalTime( &sysTime );
char buf[80];
for (int i = 0; i < 100000; i++)
{

    sprintf(buf, "%4d-%02d-%02d %02d:%02d:%02d",
        sysTime.wYear, sysTime.wMonth, sysTime.wDay, 
        sysTime.wHour, sysTime.wMinute, sysTime.wSecond);

}

Note: The OP explains in the comments that this is a stripped-down example. The "real" loop contains additional code that uses varying time values from a database. Profiling has pinpointed sprintf() as the offender.

Department answered 7/11, 2008 at 12:51 Comment(2)
How long is a "long time"? I'd expect it to be in microseconds rather than milliseconds (depending on CPU)Xyloid
You'd be better off finding a way to call sprintf less frequently instead.Ormiston
N
21

If you were writing your own function to do the job, a lookup table of the string values of 0 .. 61 would avoid having to do any arithmetic for everything apart from the year.

edit: Note that to cope with leap seconds (and to match strftime()) you should be able to print seconds values of 60 and 61.

char LeadingZeroIntegerValues[62][] = { "00", "01", "02", ... "59", "60", "61" };

Alternatively, how about strftime()? I've no idea how the performance compares (it could well just be calling sprintf()), but it's worth looking at (and it could be doing the above lookup itself).

Niacin answered 7/11, 2008 at 13:9 Comment(5)
+1 from an old embedded engineer. When time is more critical than size, it's hard to beat a lookup table!Diagnostician
Nice. But be careful how you use this. Calling strcat() to append the strings is also likely to be ugly, performance-wise.Xyloid
Yeah, since you know they're all 2 characters you can just memcpy. Also, something that just occurred to me while looking at the strtime page, the array of strings should probably be up to "61", since strtime does that, presumably to account for leap seconds.Niacin
Actually, forget memcpy, you can just assign the characters.Niacin
I'd be inclined to call memcpy anyway - a decent compiler will replace the call with opcodes for the most efficient 2 byte copy. And if your compiler isn't decent, then trying to optimise code like this for speed is hopeless, you just have to write in assembler.Hultin
X
6

You could try filling each char in the output in turn.

buf[0] = (sysTime.wYear / 1000) % 10 + '0' ;
buf[1] = (sysTime.wYear / 100) % 10 + '0';
buf[2] = (sysTime.wYear / 10) % 10 + '0';
buf[3] = sysTime.wYear % 10 + '0';
buf[4] = '-';

... etc...

Not pretty, but you get the picture. If nothing else, it may help explain why sprintf isn't going to be that fast.

OTOH, maybe you could cache the last result. That way you'd only need to generate one every second.

Xyloid answered 7/11, 2008 at 13:1 Comment(1)
In real life, the time value is different every iteration of the loop (from a db). I just simplified the code for the example,Department
L
6

Printf needs to deal with a lot of different formats. You certainly could grab the source for printf and use it as a basis to roll your own version that deals specifically with the sysTime structure. That way you pass in one argument, and it does just exactly the work that needs to be done and nothing more.

Leshalesher answered 7/11, 2008 at 13:1 Comment(0)
G
3

Looks like Jaywalker is suggesting a very similar method (beat me by less than an hour).

In addition to the already suggested lookup table method (n2s[] array below), how about generating your format buffer so that the usual sprintf is less intensive? The code below will only have to fill in the minute and second every time through the loop unless the year/month/day/hour have changed. Obviously, if any of those have changed you do take another sprintf hit but overall it may not be more than what you are currently witnessing (when combined with the array lookup).


static char fbuf[80];
static SYSTEMTIME lastSysTime = {0, ..., 0};  // initialize to all zeros.

for (int i = 0; i < 100000; i++)
{
    if ((lastSysTime.wHour != sysTime.wHour)
    ||  (lastSysTime.wDay != sysTime.wDay)
    ||  (lastSysTime.wMonth != sysTime.wMonth)
    ||  (lastSysTime.wYear != sysTime.wYear))
    {
        sprintf(fbuf, "%4d-%02s-%02s %02s:%%02s:%%02s",
                sysTime.wYear, n2s[sysTime.wMonth],
                n2s[sysTime.wDay], n2s[sysTime.wHour]);

        lastSysTime.wHour = sysTime.wHour;
        lastSysTime.wDay = sysTime.wDay;
        lastSysTime.wMonth = sysTime.wMonth;
        lastSysTime.wYear = sysTime.wYear;
    }

    sprintf(buf, fbuf, n2s[sysTime.wMinute], n2s[sysTime.wSecond]);

}
Gaskin answered 7/11, 2008 at 14:0 Comment(2)
New to this... does my entire answer show up for anyone? To me it looks like only about 4 lines are visible.Gaskin
you need to use < for < and > for >Retaretable
D
2

What do you mean by a "long" time -- since the sprintf() is the only statement in your loop and the "plumbing" of the loop (increment, comparison) is negligible, the sprintf() has to consume the most time.

Remember the old joke about the man who lost his wedding ring on 3rd Street one night, but looked for it on 5th because the light was brighter there? You've built an example that's designed to "prove" your assumption that sprintf() is ineffecient.

Your results will be more accurate if you profile "actual" code that contains sprintf() in addition to all the other functions and algorithms you use. Alternatively, try writing your own version that addresses the specific zero-padded numeric conversion that you require.

You may be surprised at the results.

Diagnostician answered 7/11, 2008 at 13:7 Comment(2)
The code snippet is a simplified example. In the real example, there are many things in the loop. (char*) _bstr_t, itoa.... It's the sprint here that's bad.Department
Was it Einstein who said everything should be made as simple as possible, but no simpler? :-) Edited your question to reflect this.Diagnostician
R
2

How about caching the results? Isn't that a possibility? Considering that this particular sprintf() call is made too often in your code, I'm assuming that between most of these consecutive calls, the year, month and day do not change.

Thus, we can implement something like the following. Declare an old and a current SYSTEMTIME structure:

SYSTEMTIME sysTime, oldSysTime;

Also, declare separate parts to hold the date and the time:

char datePart[80];
char timePart[80];

For, the first time, you'll have to fill in both sysTime, oldSysTime as well as datePart and timePart. But subsequent sprintf()'s can be made quite faster as given below:

sprintf (timePart, "%02d:%02d:%02d", sysTime.wHour, sysTime.wMinute, sysTime.wSecond);
if (oldSysTime.wYear == sysTime.wYear && 
  oldSysTime.wMonth == sysTime.wMonth &&
  oldSysTime.wDay == sysTime.wDay) 
  {
     // we can reuse the date part
     strcpy (buff, datePart);
     strcat (buff, timePart);
  }
else {
     // we need to regenerate the date part as well
     sprintf (datePart, "%4d-%02d-%02d", sysTime.wYear, sysTime.wMonth, sysTime.wDay);
     strcpy (buff, datePart);
     strcat (buff, timePart);
}

memcpy (&oldSysTime, &sysTime, sizeof (SYSTEMTIME));

Above code has some redundancy to make the code easier to understand. You can factor out easily. You can further speed up if you know that even hour and minutes won't change faster than your call to the routine.

Rudolfrudolfo answered 7/11, 2008 at 13:7 Comment(0)
P
2

I would do a few things...

  • cache the current time so you don't have to regenerate the timestamp every time
  • do the time conversion manually. The slowest part of the printf-family functions is the format-string parsing, and it's silly to be devoting cycles to that parsing on every loop execution.
  • try using 2-byte lookup tables for all conversions ({ "00", "01", "02", ..., "99" }). This is because you want to avoid moduluar arithmetic, and a 2-byte table means you only have to use one modulo, for the year.
Portsalut answered 7/11, 2008 at 13:48 Comment(0)
A
2

The two fast formatters I've tested are FastFormat and Karma::generate (part of Boost Spirit).

You might also find it useful to benchmark it or at least look for existing benchmarks.

For example this one (though it's missing FastFormat):

Fast integer to string conversion in C++

Asphodel answered 9/5, 2018 at 19:3 Comment(0)
R
1

You would probably get w perf increase by hand rolling a routine that lays out the digits in the return buf, since you could avoid repeatedly parsing a format string and would not have to deal with a lot of the more complex cases sprintf handles. I am loathe to actually recommend doing that though.

I would recommend trying to figure out if you can somehow reduce the amount you need to generate these strings, are they optional somegtimes, can they be cached, etc.

Rianna answered 7/11, 2008 at 13:3 Comment(0)
S
1

I'm working on a similar problem at the moment.

I need to log debug statements with timestamp, filename, line number etc on an embedded system. We already have a logger in place but when I turn the knob to 'full logging', it eats all our proc cycles and puts our system in dire states, states that no computing device should ever have to experience.

Someone did say "You cannot measure/observe something without changing that which you are measuring/observing."

So I'm changing things to improve performance. The current state of things is that Im 2x faster than the original function call (the bottleneck in that logging system is not in the function call but in the log reader which is a separate executable, which I can discard if I write my own logging stack).

The interface I need to provide is something like- void log(int channel, char *filename, int lineno, format, ...). I need to append the channel name (which currently does a linear search within a list! For every single debug statement!) and timestamp including millisecond counter. Here are some of the things Im doing to make this faster-

  • Stringify channel name so I can strcpy rather than search the list. define macro LOG(channel, ...etc) as log(#channel, ...etc). You can use memcpy if you fix the length of the string by defining LOG(channel, ...) log("...."#channel - sizeof("...."#channel) + *11*) to get fixed 10 byte channel lengths
  • Generate timestamp string a couple of times a second. You can use asctime or something. Then memcpy the fixed length string to every debug statement.
  • If you want to generate the timestamp string in real time then a look up table with assignment (not memcpy!) is perfect. But that works only for 2 digit numbers and maybe for the year.
  • What about three digits (milliseconds) and five digits (lineno)? I dont like itoa and I dont like the custom itoa (digit = ((value /= value) % 10)) either because divs and mods are slow. I wrote the functions below and later discovered that something similar is in the AMD optimization manual (in assembly) which gives me confidence that these are about the fastest C implementations.

    void itoa03(char *string, unsigned int value)
    {
       *string++ = '0' + ((value = value * 2684355) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = ' ';/* null terminate here if thats what you need */
    }
    

    Similarly, for the line numbers,

    void itoa05(char *string, unsigned int value)
    {
       *string++ = ' ';
       *string++ = '0' + ((value = value * 26844 + 12) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = ' ';/* null terminate here if thats what you need */
    }
    

Overall, my code is pretty fast now. The vsnprintf() I need to use takes about 91% of the time and the rest of my code takes only 9% (whereas the rest of the code i.e. except vsprintf() used to take 54% earlier)

Sikko answered 2/5, 2009 at 5:40 Comment(0)
G
0

StringStream is the suggestion that I got from Google.

http://bytes.com/forum/thread132583.html

Glabrous answered 7/11, 2008 at 12:59 Comment(1)
Why do you think that will be faster?Xyloid
N
0

It's hard to imagine that you're going to beat sprintf at formatting integers. Are you sure sprintf is your problem?

Nikkinikkie answered 7/11, 2008 at 13:5 Comment(2)
It's not formatting integers that sprintf takes its time on, it's parsing the format string. Since it doesn't change in the loop, parsing it each time around is unnecessary.Hultin
I have an implementation for printing POD types by some tricks as meta programming. could be 30 times faster than snprintfMlawsky

© 2022 - 2024 — McMap. All rights reserved.