Why are standard string functions faster than my custom string functions?

Asked 17/7, 2012 at 12:31 Answered 18/7, 2012 at 12:23

I decided to find the speeds of 2 functions :

strcmp - The standard comparison function defined in string.h
xstrcmp- A function that has same parameters and does the same, just that I created it.

Here is my xstrcmp function :

int xstrlen(char *str)
{
    int i;
    for(i=0;;i++)
    {
        if(str[i]=='\0')
            break;
    }
    return i;
}

int xstrcmp(char *str1, char *str2)
{
    int i, k;
    if(xstrlen(str1)!=xstrlen(str2))
        return -1;
    k=xstrlen(str1)-1;
    for(i=0;i<=k;i++)
    {
        if(str1[i]!=str2[i])
            return -1;
    }
    return 0;
}

I didn't want to depend on strlen, since I want everything user-defined.

So, I found the results. strcmp did 364 comparisons per millisecond and my xstrcmp did just 20 comparisons per millisecond (atleast on my computer!)

Can anyone tell why this is so ? What does the xstrcmp function do to make itself so fast ?

Overstep answered 17/7, 2012 at 12:31 Comment(13)

Which libc implementation are you using for your base comparison? – Antisemite 17/7, 2012 at 12:33

"does the same" is not accurate... it returns -1 if the strings have different lengths? Also, yours has 3 strlen() calls which are O(n). strcmp() shouldn't need any strlen() calls. – Lotetgaronne 17/7, 2012 at 12:33

I am using Dev-C++ (which uses gcc I think), if that is what you are asking – Overstep 17/7, 2012 at 12:35

"since I want everything user-defined." For learning it's good, but don't reimplement already existing methods just for the sake of it. – Aruba 17/7, 2012 at 12:35

With many compilers strcmp and strcpy are hand-optimized and produce very tight inline code. – Footing 17/7, 2012 at 12:36

What would the actual strlen function look like ? I did a little digging inside string.h but I couldn't find the actual strcmp function - just the prototype definition – Overstep 17/7, 2012 at 12:37

Someone copied the GNU version into this forum thread: compsci.ca/v3/viewtopic.php?t=24383 (3rd post) – Tartu 17/7, 2012 at 12:42

Wow... Thanks BoBTFish, but is do{...}while(exp) faster than for ? – Overstep 17/7, 2012 at 12:44

@c.adhityaa - here is the highly-optimized gnu-strlen function. From the comment: Instead of the traditional loop which tests each character, we will test a longword at a time. The tricky part is testing if any of the four bytes in the longword in question are zero. – Sundry 17/7, 2012 at 12:47

Agner Fog has written hand optimized assembly versions of the standard string and memory functions, which he claims are faster than the GNU implementations as well: agner.org/optimize/#asmlib – Tartu 17/7, 2012 at 12:51

@c.adhityaa: You can download the glibc and check out as it's done in background. It's free. – Obsequent 17/7, 2012 at 12:57

Thanks, I will check out glibc – Overstep 17/7, 2012 at 12:59

standard library may use an sisd or simd insturction – Allysonalma 22/7, 2012 at 22:57

if(xstrlen(str1)!=xstrlen(str2))    //computing length of str1
    return -1;                      
k=xstrlen(str1)-1;                  //computing length of str1 AGAIN!

You're computing the length of str1 TWICE. That is one reason why your function loses the game.

Also, your implemetation of xstrcmp is very naive compared to the ones defined in (most) Standard libraries. For example, your xstrcmp compares one byte at a time, when in fact it could compare multiple bytes in one go, taking advantage of proper alignment as well, or can do little preprocessing so as to align memory blocks, before actual comparison.

Steak answered 17/7, 2012 at 12:34 Comment(4)

Yeah. Besides that any other way to optimize ? – Overstep 17/7, 2012 at 12:38

@c.adhityaa The most important optimization in data-processing tasks is to find ways how to access the data from memory as fast as possible. Every other trick doesn't help much, usually. – Sundry 17/7, 2012 at 12:41

Even worse than the THREE calls to strlen, is that none are required. – Vinegarette 17/7, 2012 at 20:20

@Nawaz so I was checking strcmp() function of glibc(2.21). Memory block alignment techniques have not been used there. Only loop unrolling is used. Wouldn't it give much better performance with memory alignment? If so, why that was not used? – Traprock 6/5, 2015 at 10:22

strcmp and other library routines are written in assembly, or specialized C code, by experienced engineers and use a variety of techniques.

For example, the assembly implementation might load four bytes at a time into a register, and compare that register (as a 32-bit integer) to four bytes from the other string. On some machines, the assembly implementation might load eight bytes or even more. If the comparison shows the bytes are equal, the implementation moves on to the next four bytes. If the comparison shows the bytes are unequal, the implementation stops.

Even with this simple optimization, there are a number of issues to be dealt with. If the string addresses are not multiples of four bytes, the processor might not have an instruction that will load four bytes (many processors require four-byte loads to use addresses that are aligned to multiples of four bytes). Depending on the processor, the implementation might have to use slower unaligned loads or to write special code for each alignment case that does aligned loads and shifts bytes in registers to align the bytes to be compared.

When the implementation loads four bytes at once, it must ensure it does not load bytes beyond the terminating null character if those bytes might cause a segment fault (error because you tried to load an address that is not readable).

If the four bytes do contain the terminating null character, the implementation must detect it and not continue comparing further bytes, even if the current four are equal in the two strings.

Many of these issues require detailed assembly instructions, and the required control over the exact instructions used is not available in C. The exact techniques used vary from processor model to processor model and vary greatly from architecture to architecture.

Anguish answered 17/7, 2012 at 12:47 Comment(2)

+1, The segmentation fault case could be handled easily by performing first the slow test for unaligned and then working word by word, which ensures that all 4 bytes come from the same cache-line (and memory page). On the other hand, detection of the terminating character and determining equality when the null character is not the last character in an aligned word are much more complex. – Polygynous 17/7, 2012 at 14:4

@DavidRodríguez-dribeas: the trouble with handling alignment comes when you have multiple string arguments that are not necessarily similarly aligned; if a word in one string is aligned, the corresponding word from the other string may not be. There are a number of ways to deal with this, but they can get fussy (more of a pain than handling termination conditions). – Enthalpy 17/7, 2012 at 14:39

Faster implementation of strlen:

//Return difference in addresses - 1 as we don't count null terminator in strlen.
int xstrlen(char *str)
{
    char* ptr = str;
    while (*str++);
    return str - ptr - 1;
}

//Pretty nifty strcmp from here:
//http://vijayinterviewquestions.blogspot.com/2007/07/implement-strcmpstr1-str2-function.html
int mystrcmp(const char *s1, const char *s2)
{
    while (*s1==*s2)
    {
        if(*s1=='\0')
            return(0);
        ++s1;
        ++s2;
    }
    return(*s1-*s2);
}

I'll do the other one later if I have time. You should also note that most of these are done in assembly language or using other optimized means which will be faster than the best stright C implementation you can write.

Hydrosphere answered 17/7, 2012 at 12:48 Comment(0)

Aside from the problems in your code (which have been pointed out already), -- at least in the gcc-C-libs, the str- and mem-functions are faster by a margin in most cases because their memory access patterns are higly optimized.

There were some discussions on the topic on SO already.

Sundry answered 17/7, 2012 at 12:34 Comment(0)

Try this:

int xstrlen(const char* s){
  const char* s0 = s;
  while(*s) s++;
  return(s - s0);
}

int xstrcmp(const char* a, const char* b){
  while(*a && *a==*b){a++; b++;}
  return *a - *b;
}

This could probably be sped up with some loop unrolling.

Fictionalize answered 17/7, 2012 at 12:50 Comment(2)

@Mike, loop unrolling will not speed things up if the code for the optimized loop(s) become(s) large enough to fill the instruction cache. – Gladiator 17/7, 2012 at 16:23

"365 comparisons for strcmp and 62 for your xstrcmp - 3x better. " -- That's because your version is pessimized. Why not throw in ten more strlen calls just for yucks? – Impel 29/8, 2012 at 21:0

1. Algorithm

Your implementation of strcmp could have a better algorithm. There should be no need to call strlen at all, each call to strlen will iterate over the whole length of the string again. You can find simple but effective implementations online, probably the place to start is something like:

// Adapted from http://vijayinterviewquestions.blogspot.co.uk
int xstrcmp(const char *s1, const char *s2)
{
  for (;*s1==*s2;++s1,++s2)
  {
    if(*s1=='\0') return(0);
  }
  return(*s1-*s2);
}

That doesn't do everything, but should be simple and work in most cases.

2. Compiler optimisation

It's a stupid question, but make sure you turned on all the optimisation switches when you compile.

3. More sophisticated optimisations

People writing libraries will often use more advanced techniques, such as loading a 4-byte or 8-byte int at once, and comparing it, and only comparing individual bytes if the whole matches. You'd need to be an expert to know what's appropriate for this case, but you can find people discussing the most efficient implementation on stack overflow (link?)

Some standard library functions for some platforms may be hand-written in assembly if the coder can knows there's a more efficient implementation than the compiler can find. That's increasingly rare now, but may be common on some embedded systems.

4. Linker "cheating" with standard library

With some standard library functions, the linker may be able to make your program call them with less overhead than calling functions in your code because it was designed to know more about the specific internals of the functions (link?) I don't know if that applies in this case, it probably doesn't, but it's the sort of thing you have to think about.

5. OK, ok, I get that, but when SHOULD I implement my own strcmp?

Off the top of my head, the only reasons to do this are:

You want to learn how. This is a good reason.
You are writing for a platform which doesn't have a good enough standard library. This is very unlikely.
The string comparison has been measured to be a significant bottleneck in your code, and you know something specific about your strings that mean you can compare them more efficiently than a naive algorithm. (Eg. all strings are allocated 8-byte aligned, or all strings have an N-byte prefix.) This is very, very unlikely.

6. But...

OK, WHY do you want to avoid relying on strlen? Are you worried about code size? About portability of code or of executables?

If there's a good reason, open another question and there may be a more specific answer. So I'm sorry if I'm missing something obvious, but relying on the standard library is usually much better, unless there's something specific you want to improve on.

Fluttery answered 18/7, 2012 at 12:23 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags