Why the c++'s implemented string::substr() doesn't use the KMP algorithm (and doesn't run in O(N + M)) and runs in O(N * M)?
I assume you mean find()
, rather than substr()
which doesn't need to search and should run in linear time (and only because it has to copy the result into a new string).
The C++ standard doesn't specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string
operations are that size()
, max_size()
, operator[]
, swap()
, c_str()
and data()
are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you're using.
The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that would likely be much more than the cost of the extra complexity.
Is that corrected in c++0x?
No, C++11 doesn't add any complexity requirements to std::string
, and certainly doesn't add any mandatory implementation details.
If the complexity of current substr is not O(N * M), what is that?
That's the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N)
. So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.
string::find
, see my post: #19507071 – Discounter(std::string(n-1,'a')+"b"+std::string(n,'a')).find(std::string(n,'a'))
is O(n^2) or worse. E.g. for n = 1 million, it takes 21 seconds on my computer, but should be instant (a few milliseconds, perhaps). – Incarnate