Performance std::strstr vs. std::string::find [duplicate]
Asked Answered
C

1

7

Possible Duplicate:
C++ string::find complexity

Recently I noticed that the function std::string::find is an order of magnitude slower than the function std::strstr - in my environment with GCC 4.7 on Linux. The performance difference depends on the lengths of the strings and on the hardware architecture.

There seems to be a simple reason for the difference: std::string::find basically calls std::memcmp in a loop - with time complexity O(m * n). In contrast, std::strstr is highly optimized for the hardware architecture (e.g. with SSE instructions) and uses a more sophisticated string matching algorithm (apparently Knuth-Morris-Pratt).

I was also surprised not to find the time complexities of these two functions in the language documents (i.e. drafts N3290 and N1570). I only found time complexities for char_traits. But that doesn't help, because there is no function for substring search in char_traits.

I would expect, that std::strstr and memmem contain similar optimizations with almost identical performance. And until recently, I assumed that std::string::find uses memmem internally.

The questions are: Is there any good reason, why std::string::find does not use std::memmem? And does it differ in other implementations?

The question is not: What is the best implementation of this function? It is really difficult to argue for C++, if it is slower than C. I wouldn't matter if both implementations would be slow. It is the performance difference that really hurts.

Commendation answered 11/4, 2012 at 6:33 Comment(2)
@FrerichRaabe: Your are right, there is some overlap in the two questions. But my questions are more specific, and the other article answers none of them.Commendation
@nosid: yes it does. Look in particular to the extra explanation in comments by dietmar kuhl about average-case vs worst-case and space-complexity why this is most likely not used. Those arguments don't change if you reuse std::memmem iso implementing the algorithm from scratch.Upspring
F
2

First, what's memmem? I can't find this in the C++ standard, nor the Posix standard (which contains all of the standard C functions).

Second, any measurement values will depend on the actual data. Using KMP, for example, will be a pessimisation in a lot of cases; probably most of the cases where the member functions of std::string are used; the time to set up the necessary tables will often be more than the total time of the straightforeward algorithm. Things like O(m*n) don't mean much when the typical length of the string is short.

Finbur answered 11/4, 2012 at 8:9 Comment(3)
I asssumed, that memmem is part of C, but apparently it isn't. memmem is to strstr what memcmp is to strcmp. However, I am sure you know that. Nevertheless, as I have already mentioned a few times. The question is not if KMP is a good choice. The question is, why they are using completely different algorithms for strstr and std::string::find.Commendation
@Commendation Perhaps because the expected use pattern is different? Or because different authors have privileged different use patterns? In most applications I've seen, most strings are fairly short, with the longest strings corresponding to perhaps a line. For such strings, using something like KMP would probably be a pessimization. If the authors of memmem thought that the typical use case would involve blocks of several KB of memory or more, on the other hand, it's definitely worthwhile.Finbur
According to my benchmarks, as of 25.06.2013: for GCC, string::find is slightly faster (~10%) (x86_64, -march=native, ran on AWS) - for MSVC 2, times slower (x86, SSE2, on AMD desktop). (full optimizations)Attaway

© 2022 - 2024 — McMap. All rights reserved.