Which string classes to use in C++?
C

7

17

We have a multi-threaded desktop application in C++ (MFC). Currently developers use either CString or std::string, probably depending on their mood. So we'd like to choose a single implementation (possibly something other than those two).

MFC's CString is based on the copy-on-write (COW) idiom, and some people would claim this is unacceptable in a multithreaded environment (and would probably refer to this article). I am not convinced by such claims, as atomic counters seem to be quite fast, and this overhead is at least partly compensated by a reduction in memory re-allocations.

I learned that the std::string implementation depends on the compiler: it is not COW in MSVC, but it is (or was) in gcc. As far as I understand, the new C++0x standard is going to fix this by requiring a non-COW implementation, and resolve some other issues, such as contiguous buffer requirements. So std::string does not actually look well defined at this point...

A quick example of what I don't like about std::string: there is no way to return a string from a function without excessive re-allocations (copy constructor if returning by value, and no access to the internal buffer to optimize that, so "return by reference", e.g. std::string& Result, doesn't help). I can do this with CString by either returning by value (no copy due to COW) or passing by reference and accessing the buffer directly. Again, C++0x to the rescue with its rvalue references, but we are not going to have C++0x in the near future.

Which string class should we use? Can COW really become an issue? Are there other commonly used efficient implementations of strings? Thanks.

EDIT: We don't use Unicode at the moment, and it is unlikely that we will need it. However, if there is something that easily supports Unicode (not at the cost of ICU...), that would be a plus.

Chock answered 17/1, 2011 at 14:55 Comment(11)
Most people will assume it is non UNICODE. Is it right?Wuhsien
@baris_a: thanks, I'll make it more clearChock
Are you aware of RVO? That might allay some of your concerns about strings as return values. You might also want to read Want Speed? Pass By Value.Nautical
@Fred Larson: RVO won't always work, as indicated at the link that you provided (conditional return value). Compiler-optimized passing by value sounds interesting, but does not yet seem to be generally accepted, so most people won't do that. Finally, returning strings from functions is only a single specific example.Chock
If you want future proof, choose std::string. In addition it's fast, bullet-proof, and extremely well supported.Gam
C++0x does not disallow COW implementations of std::basic_string. It disallows non-contiguous buffer implementations, such as SGI's "ropes".Kerchief
@Billy: Are you sure? I've seen people claiming it does disallow COW, like here: #4496650Chock
@7vies: As far as I am aware. Of course I would love to be proven wrong with a standard reference...Kerchief
@Billy: there is a reference to N2668 at the link I've cited, but I did not go deep enough to see if it is actually relevant or notChock
@Billy: Also, if the standard does not clearly say if it should be COW or not, it's quite confusing, as I would expect that to be specified (it would be even better if I could choose between a COW and a "normal" implementation myself)Chock
@7vies: in C++03 the standards committee made sure to go out of their way to ensure COW was a valid implementation of std::string. However, it is not required.Kerchief
D
16

I would use std::string.

  • Promote decoupling from MFC
  • Better interaction with existing C++ libraries

The "return by value" issue is mostly a non-issue. Compilers are very good at performing Return Value Optimization (RVO) which actually eliminates the copy in most cases when returning by value. If it doesn't, you can usually tweak the function.
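To make the two cases concrete, here is a minimal sketch; the function names `make_greeting` and `join_words` are made up for illustration. The first returns an unnamed temporary (plain RVO), the second returns a named local (NRVO), which is the case a compiler is more likely to miss.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain RVO: the returned object is an unnamed temporary, so the compiler
// can construct it directly in the caller's storage.
std::string make_greeting(const std::string& name) {
    return std::string("Hello, ") + name;
}

// NRVO: a named local is built up and returned. A compiler that implements
// NRVO constructs `result` directly in the caller's storage; otherwise the
// return makes a copy (or, from C++11 on, a move).
std::string join_words(const std::vector<std::string>& words, char sep) {
    std::string result;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) result += sep;
        result += words[i];
    }
    return result;  // one named object, one return statement: NRVO-friendly
}
```

"Tweaking the function" usually means getting it into the second shape: a single named result and a single return, rather than returning different locals from different branches.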

COW has been rejected for a reason: it doesn't scale (well), and the hoped-for increase in speed has not really been measured (see Herb Sutter's article). Atomic operations are not as cheap as they appear. With a single processor and a single core it was easy, but now multi-core machines are a commodity and multi-processor machines are widely available (for servers). In such distributed architectures there are multiple caches that need to be synchronized, and the more distributed the architecture, the more costly the atomic operations.

Does CString implement the Small String Optimization (SSO)? It's a simple trick that allows a string not to allocate any heap memory for small strings (usually a few characters). It is very useful because it turns out that most strings are in fact small: how many strings in your application are less than 8 characters long?
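If you want to check whether a given std::string implementation does this, a rough probe is to test whether the character data lives inside the string object itself. This is only a sketch: `stored_inline` is a name I made up, and it assumes an SSO implementation keeps its small buffer inside the object.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the character data of `s` is stored inside the
// std::string object itself, i.e. no separate heap buffer is in use.
// Assumption: an SSO implementation keeps its small buffer in the object.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end   = obj_begin + sizeof(std::string);
    const char* data      = s.data();
    assert(data != 0);  // data() never returns a null pointer

    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, obj_begin) && before(data, obj_end);
}
```

Calling it on a short string such as `std::string("abc")` should return true on an implementation with SSO and false on a COW one; a string of a thousand characters cannot fit inside the object, so it returns false either way.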

So, unless you present me a real benchmark which clearly shows a net gain in using CString, I'd prefer sticking with the standard: it's standard, and likely better optimized.

Dentil answered 17/1, 2011 at 15:25 Comment(12)
You re-cited the same article, nice :) It looks quite outdated, though. I'm talking about MFC and desktop systems, where atomic counters are actually quite fast (this might change in future but who knows). SSO is a good thing, but it is not against COW, right? I'm not arguing that CString is better than std::string, I'm just looking for an answer to my question.Chock
"Better interaction with existing C++ libraries" - if the existing C++ library is MFC, CString provides better interaction.Petrology
@7vies: SSO is perfectly compatible with COW, but I have never seen the intrinsic of CString so I don't know whether or not it uses it. I have tried to answer your question: I've addressed your performance concerns and I've tried to make it clear that given the choice between a standard and some custom version I'd go for the standard unless there's a performance issue.Dentil
@Mark Ransom: but would one only use MFC ? I know I use perhaps 7 or 8 free libraries, on top of Boost, and none of them know what MFC is. And I am in a large company, with a lot of low-level things being re-crafted here, so I'd guess that if the company was smaller I'd use even more libraries.Dentil
@Matthieu M.: Well, std::string is a standard, but I don't see why it is supposed to be the standard (thus my question). The point about STL is also to be compatible with limited devices (embedded etc), while I'm talking about Windows desktop apps where something else could be more suitable.Chock
@7vies: well it is the standard in the sense that it is part of the standard library, and thus available to all developers. Now I admit that your specific environment is unknown to me, but I would still be wary of restricting myself to MFC-specific libraries. An example among others: TinyCpp (cpp adaptation for TinyXml) uses std::string.Dentil
@Matthieu M.: I wouldn't like to be restricted to MFC either, but there could be some other commonly used string library which I am not aware of, more suitable for my situation than std::string. Apparently, this is not the case.Chock
@Matthieu M.: One additional remark, RVO won't be enough to eliminate all copying, only very simple cases like return std::string("constant string"). If you create a named variable and modify it in your function, you'll need NRVO support for the compiler to optimize this (it seems MSVC has NRVO only starting from 2005)Chock
@7vies: like all optimizations, it only works if the compiler supports it :) It's pretty standard though, but I would be hard pressed to guess when it was introduced in MSVC (I have only used it from 2008 onward). I do recall though that in 2008 NRVO was not performed in debug builds.Dentil
@Matthieu M.: Lucky you, I had to use MSVC 6 and now it is 2003 (well it's not my choice...). So no NRVO for me :(Chock
@7vies: then COW might present a real advantage. I'm glad I'm not working with such an antiquated compiler :)Dentil
@Matthieu M.: Yeah.. But we could probably still use std::string, by applying this std::swap trick that @DeadMG mentioned and implement "fast copy" that just passes ownership (like auto_ptr). This would be equivalent to NRVO then, and not COW. [Yes, I'm tired of doing compiler's work, but I have no choice for the moment] Another point is as @Billy ONeal pointed out it is not that forbidden to write to std::string's internal buffer, and then passing by reference helps.Chock
W
5

Actually, the answer may be "It depends". But if you are using MFC then, IMHO, CString would be the better choice. You can also use CString with STL containers, although that leads to another question: should you use STL containers or MFC containers with CString? Using CString also gives your application some flexibility, for example with Unicode conversions.

EDIT: Moreover, if you use Win32 API calls, CString conversions will be easier.

EDIT: CString has GetBuffer() and related methods that allow you to modify the buffer directly.

EDIT: I have used CString in our SQLite wrapper, and formatting CString is easier.

    bool RS::getString(int idx, CString& a_value) {
        // ...
        if (getDB()->getEncoding() == IDatabase::UTF8) {
            a_value.Format(_T("%s"), sqlite3_column_text(getCommand()->getStatement(), idx));
        } else {
            a_value.Format(_T("%s"), sqlite3_column_text16(getCommand()->getStatement(), idx));
        }
        return true;
    }
Wuhsien answered 17/1, 2011 at 15:17 Comment(9)
No way we are going to use MFC containers :) we already use STL for thatChock
CString::GetBuffer == &MyStdString[0] . Not much of an argument there. As for formatting, that's what std::stringstream is for.Kerchief
@Billy: How can you mention SGI's ropes and &MyStdString[0] at the same time? (it's not C++0x yet for everyone, right?) I also disagree about std::stringstream, as it is an awful-looking way to format strings, especially when it comes to modifiers. boost::format looks way betterChock
@7vies: Because nobody implements std::string in terms of ropes. And I agree boost::format can be better, but not all of us can use boost.Kerchief
@Billy: Neat reference, thanks. In the absence of boost::format many people won't stick to stringstream but will continue to use C-style printf etc, which I find reasonable having seen how inconvenient stringstream isChock
@7vies: Personally I don't find stringstream inconvenient. But the whole thing is really a matter of opinion -- it really doesn't matter.Kerchief
@Billy: Well, if it didn't matter boost::format wouldn't exist, and nobody would even use C++, as there is C already :) I think that usability does matter.Chock
@7vies: What I mean is that our difference of opinion doesn't matter. I'm perfectly happy with stringstream. You aren't. But neither of our opinions matter here because the question is about strings, not formatting.Kerchief
@Billy: I disagree again :) Formatting is an essential part of working with strings, probably even the most important one, and considering the string question without thinking about formatting would be strangeChock
C
1

I don't know of any other common string implementations: they all suffer from the same language limitations in C++03. Either they offer something specific (like the ICU components, which are great for Unicode), they're really old (like CString), or std::string trumps them.

However, you can use the same technique that the MSVC9 SP1 STL uses, that is, "swaptimization", which is the most hilariously named optimization ever.

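#include <string>

// Builds the result in a local and hands it to the caller via swap.
void func(std::string& ref) {
    std::string retval;
    // ... fill retval ...
    std::swap(ref, retval); // No copying done here.
}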

If you rolled a custom string class that didn't allocate anything in its default constructor (or checked that your STL implementation doesn't), then swaptimizing it would guarantee no redundant allocations. For example, my MSVC STL uses SSO and doesn't allocate any heap memory by default, so by swaptimizing the above, I get no redundant allocations.

You could improve performance substantially too by just not using expensive heap allocation. There are allocators designed for temporary allocations, and you can replace the allocator used in your favourite STL implementation with a custom one. You can get things like object pools from Boost or roll a memory arena. You can get tenfold better performance compared to a normal new allocation.
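As a rough sketch of the arena idea, here is what it can look like with the C++17 polymorphic allocators rather than Boost pools or a hand-rolled arena. That is a swapped-in technique, not something available on a C++03 compiler, and the function names are invented for the example.

```cpp
#include <cassert>
#include <memory_resource>
#include <string>

// Builds "0,1,2,..." in a scratch string whose growing buffer is allocated
// from `arena`. Only the final std::string result uses the default heap.
std::string build_csv(int count, std::pmr::memory_resource* arena) {
    assert(arena != nullptr);
    std::pmr::string scratch(arena);
    for (int i = 0; i < count; ++i) {
        if (i != 0) scratch += ',';
        scratch += std::to_string(i).c_str();
    }
    return std::string(scratch.data(), scratch.size());
}

// Uses a stack buffer as the arena. A monotonic resource never frees
// individual blocks; everything is released at once when it is destroyed.
// If the buffer runs out, it falls back to the default upstream resource.
std::string build_csv_with_stack_arena(int count) {
    char buffer[4096];
    std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer));
    return build_csv(count, &arena);
}
```

Whether this buys anything close to tenfold depends entirely on the allocation pattern, so measure before committing to it.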

Congress answered 17/1, 2011 at 15:19 Comment(0)
A
1

I would suggest making a "per DLL" decision. If you have DLLs depending heavily on MFC (for example, your GUI layer), where you need a lot of MFC calls with CString parameters, use CString. If you have DLLs where the only thing from MFC you are going to use would be the CString class, use std::string instead. Of course, you will need conversion function between both classes, but I suspect you have already solved that issue.

Anatolio answered 17/1, 2011 at 17:3 Comment(0)
A
1

I say always go for std::string. As mentioned, RVO and NRVO will make returning by value cheap, and when you do eventually switch to C++0x, you get a nice performance boost from move semantics without doing anything. If you want to take any code and use it in a non-ATL/MFC project, you can't use CString, but std::string will be there, so you'll have a much easier time. Finally, you mentioned in a comment that you use STL containers instead of MFC containers (good move). Why not stay consistent and use the STL string too?

Argument answered 17/1, 2011 at 18:12 Comment(1)
MFC containers are just unusable, so that was straightforward :) CString, on the other hand, is very comfortable to work with, as was mentioned in other answers - for example, CString::Format is very useful (I know there are ways to do a similar thing with std::string, but they are nowhere near as convenient).Chock
A
0

I would advise using std::basic_string as your general string template base unless there is a good reason to do otherwise. I say basic_string because if you are handling 16-bit characters you would use wstring.

If you are going to use TCHAR you should probably define tstring as basic_string<TCHAR>, and you may wish to implement a traits class for it too, so that it uses functions like _tcslen etc.
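A minimal sketch of that typedef follows. The helper `to_tstring` is hypothetical, and the non-Windows branch is just a stand-in that assumes a narrow-character build so the snippet compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <tchar.h>   // TCHAR, _T(), _tcslen
#else
// Stand-ins so the sketch also compiles off Windows (assumed narrow build).
#include <cstring>
typedef char TCHAR;
#define _T(x) x
#define _tcslen std::strlen
#endif

// One string type that follows the project's character-set setting.
typedef std::basic_string<TCHAR> tstring;

// Hypothetical helper: wraps a C-style TCHAR string, tolerating null.
tstring to_tstring(const TCHAR* text) {
    if (text == 0) return tstring();
    return tstring(text, _tcslen(text));
}
```

Since TCHAR is either char or wchar_t, the default std::char_traits already covers it, so a custom traits class is probably only needed if you want non-default behaviour.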

Alejoa answered 17/1, 2011 at 15:17 Comment(0)
P
-2

std::string is usually reference counted, so pass-by-value is still a cheap operation (and even more so with the rvalue reference stuff in C++0x). The COW is triggered only for strings that have multiple references pointing to them, i.e.:

std::string foo("foo");
std::string bar(foo);
foo[0] = 'm';

will go through the COW path. As the COW happens inside operator[], you can force a string to use a private buffer by using its (non-const) operator[]() or begin() methods.
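For readers who have not seen the mechanism, here is a toy sketch of a COW string. `CowString` is invented for illustration and is not how any particular standard library implements std::string; it leans on std::shared_ptr for the reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string: copies share one buffer until somebody writes.
class CowString {
public:
    explicit CowString(const char* text)
        : buf_(std::make_shared<std::string>(text)) {}

    // Read access never copies.
    char at(std::size_t i) const { return (*buf_)[i]; }

    // Write access unshares first: if other CowStrings still reference the
    // buffer, make a private copy before modifying it.
    void set(std::size_t i, char c) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
        (*buf_)[i] = c;
    }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

    const std::string& str() const { return *buf_; }

private:
    std::shared_ptr<std::string> buf_;  // copying this bumps the ref count
};
```

With `CowString foo("foo"); CowString bar(foo);` the two objects share one buffer; after `foo.set(0, 'm')` they no longer do, and bar still reads "foo". The reference count inside shared_ptr is what the atomic-counter discussion is about, and this sketch is not safe for concurrent writers.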

Proponent answered 17/1, 2011 at 15:23 Comment(7)
As far as I know, it is not reference counted in MSVC, please correct me if I'm wrong. We are also stuck with an old MSVC version.Chock
Actually the C++0x standard forbids the use of COW.Dentil
@Matthieu: Do you have a standard reference for that? (Not saying you're wrong, just can't find it)Kerchief
@Billy: I've looked hard but could not find anything in n3225 either. The only thing I could find was for valarray in [valarray.cons], remark 289: "Implementations in which arrays share storage are permitted, but they shall implement a copy-on-reference mechanism to ensure that arrays are conceptually distinct." I was pretty certain COW was forbidden for strings but it may stem from other discussions...Dentil
@MatthieuM.: The C++0x standard forbids non-contiguous buffers, and makes COW extremely difficult to do in a performant manner, but I don't think it's expressly forbidden.Penury
https://mcmap.net/q/17472/-legality-of-cow-std-string-implementation-in-c-11/845092 clarifies that "invalidation of iterators/references is only allowed for [specific functions]", which makes it illegal to do COW since non-const [] is not on the list. However, GCC has deliberately ignored this to preserve their ABI. https://mcmap.net/q/420639/-is-std-string-ref-counted-in-gcc-4-x-c-11/845092Penury
@MooingDuck: Thanks for the reference; actually the invalidation of iterators could be worked around by using fat iterators (pointer to string class + index), at the cost of efficiency. As for gcc I think they are still waiting for a "breaking change" version of libstdc++ to introduce various clean ups such as std::string.Dentil
