Why are VC++ strings not reference counted?
E

5

14

The C++ standard does not require std::string to be reference counted. But in fact most C++ implementations provide reference-counted, copy-on-write strings, which lets you pass strings by value as if they were a primitive type. These implementations (at least g++) also use atomic operations, which makes the strings lock-free and thread safe.

A simple test shows the copy-on-write semantics:

#include <iostream>
#include <string>

using namespace std;

void foo(string s)
{
    cout<<(void*)s.c_str()<<endl;
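    string ss=s;                    // copy: shares the buffer under COW
    cout<<(void*)ss.c_str()<<endl;
    char p=ss[0];                   // non-const operator[] can expose a writable reference, so COW must unshare here
    (void)p;                        // silence the unused-variable warning
    cout<<(void*)ss.c_str()<<endl;  // a different address under COW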
}

int main()
{
    string s="coocko";
    cout<<(void*)s.c_str()<<endl;
    foo(s);
    cout<<(void*)s.c_str()<<endl;
}

Only two distinct addresses are printed, and the address changes exactly after a non-const member function is used.

I tested this code with the HP, GCC and Intel compilers and got similar results: strings work as copy-on-write containers.

On the other hand, VC++ 2005 shows clearly that each string is fully copied.

Why?

I know that VC++ 6.0 had a bug: its reference counting implementation was not thread safe, which caused random program crashes. Is this the reason? Are they just afraid to use reference counting any more, even though it is common practice? Do they prefer dropping reference counting altogether over fixing the issue?

Thanks

Entrap answered 1/4, 2009 at 19:33 Comment(0)
S
22

I think that more and more std::string implementations will move away from refcounting/copy-on-write as it is often a counter-optimization in multi-threaded code.

See Herb Sutter's article Optimizations That Aren't (In a Multithreaded World).
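
To make the trade-off concrete, here is a minimal sketch of what a copy-on-write string has to do. It is not how any real std::string is implemented: CowString, read and write are hypothetical names, and std::shared_ptr (C++11) stands in for a hand-rolled atomic reference count. Every copy bumps an atomic counter, and every mutating access has to check whether the buffer is shared before writing:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write wrapper (hypothetical, for illustration only).
class CowString {
    std::shared_ptr<std::string> data_;   // shared buffer with an atomic refcount

    // Make sure this object owns its buffer exclusively before a write.
    void detach()
    {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);  // the deep copy happens here
    }

public:
    explicit CowString(const char* s) : data_(std::make_shared<std::string>(s)) {}

    // The implicit copy constructor only copies the shared_ptr:
    // no character data is copied, but the refcount is incremented atomically.

    const char* c_str() const { return data_->c_str(); }
    char read(std::size_t i) const { return (*data_)[i]; }

    // Any mutating access pays for the sharing check, shared or not.
    void write(std::size_t i, char c)
    {
        detach();
        (*data_)[i] = c;
    }
};
```

Copying a CowString is cheap, but the atomic increment and the check in detach() are paid even by code that never shares strings across threads. Note that the use_count() check in this sketch is only reliable when no other thread is copying the same object at the same time.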

Spilt answered 1/4, 2009 at 19:40 Comment(8)
I believe that Scott Meyers mentions something similar in one of his books - can't cite (and not even sure) because I don't have them with me at the moment.Spilt
+1, spent 10 minutes looking for that reference. There are so many C++ questions on SO which simply require a pointer to Herb's writing.Methanol
Actually, this article clearly shows that COW strings using atomic operations have almost the same performance as plain COW strings (a difference of 25%). So the "overhead" is not that large, and a real copy definitely costs much more.Entrap
Since when did 25% difference become "almost the same"? :)Placard
@Andrew The point is that real copying is much more expensive than COW, so they compared thread-safe COW with non-thread-safe COW, and there is no big difference. Besides, for a micro-benchmark 25% is not that much.Entrap
@Michael, Re:Scott Meyers, see page 65 (Item 13) of Effective STL. He also refers to the same Herb Sutter article.Renfrew
Hi all, thanks for the info. I would like to ask a question: the GCC C++ library's implementation currently uses reference counting, but I am not sure whether it uses atomic operations to protect the reference count. What about MinGW? Thanks.Agentival
I wonder whether the modest gains from eliminating an atomic copy-on-write implementation are worth it. The increase in memory usage from the extra copies raises the likelihood of things getting swapped out, which would dwarf any of the gains talked about. All the discussion seems to ignore everything but raw CPU performance.Horn
A
11

The standard actually requires that, if you use reference counting, the semantics are the same as for a non-reference-counted version. This is not trivial in the general case (which is why you should not write your own string class).

Because of the following situation:

std::string   x("This is a string");
char&         x5 = x[5];
std::string   y(x);

x5 = '*';

See: http://www.sgi.com/tech/stl/string_discussion.html for more details
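
For reference, the behavior an implementation has to deliver for the snippet above can be written as a small check. The function name is made up for illustration; the return expression just restates the requirement that writing through x5 changes x and leaves y alone:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if the copy was left untouched.
bool copy_is_independent()
{
    std::string x("This is a string");
    char&       x5 = x[5];   // reference into x's buffer, taken before the copy
    std::string y(x);        // y must not end up sharing the buffer x5 points into

    x5 = '*';                // write through the old reference

    // x sees the write, y keeps the original text.
    return x[5] == '*' && y[5] == 'i' && y == "This is a string";
}
```

A reference-counted implementation can only pass this if it stops sharing the buffer once a reference like x5 has been handed out, which is the non-trivial part.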

Abundance answered 1/4, 2009 at 19:59 Comment(7)
Interesting... the standard actually has a note that specifically says refcounted strings are permitted, but that the semantics must be the same as for non-refcounted strings. (comment continued in next comment due to size limitations)Spilt
The standard then gives an example similar to yours (but modifying via an iterator rather than a reference) and says the 2nd string must not be modified, but it doesn't give an indication of how the implementation should do that (hey it's an implementation detail, right?).Spilt
-1 You can actually check this code in g++ and see that it works perfectly. In fact g++ has thread-safe strings using ref-counting. So it is definitely doable.Entrap
Yes it is doable, but basically any function which returns an lvalue must assume that the string may be modified and do a CoW. (For example x[5] returns a char&). This fact makes reference counted std::string's not a real gain.Outgroup
Artyom: -1 for what? He's talking about what the language allows, not what individual compilers do. Hopefully you're not so naive you trust C++ compilers to be compliant? +1 from here to counter ;)Erastatus
OK, removed the -1. And yes, x[5] makes the string "modified". The point is that "hard" does not mean not doable. Just remember that a real copy requires a memory allocation, which is expensive.Entrap
Modifying the wording so it's not so controversial.Abundance
R
7

As stated by Martin & Michael, Copy On Write (COW) is often more trouble than it's worth. For further reading, see this excellent article by Kevlin Henney about Mad COW Disease. I believe it was Andrei Alexandrescu who stated that Small String Optimization performs better in many applications (but I can't find the article).

Small String Optimization is where you make the string object bigger and avoid heap allocations for small strings. A toy implementation will look something like this:

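#include <stdlib.h>  // free
#include <string.h>  // strlen, strcpy, strdup (strdup is POSIX, not standard C++)

class string {
    char *begin_, *end_, *capacity_;
    char buff_[64]; // pick optimal size (or template argument)
public:
    string(const char* str)
    {
        size_t len = strlen(str);
        if (len < sizeof(buff_))
        {
            // Short string: store it inline, no heap allocation.
            strcpy(buff_, str);
            begin_ = buff_;
            capacity_ = buff_ + sizeof(buff_);
        }
        else
        {
            // Long string: fall back to the heap.
            begin_ = strdup(str);
            capacity_ = begin_ + len;
        }
        end_ = begin_+len;
    }

    ~string()
    {
        if (begin_ != buff_)
            free(begin_); // strdup requires free
    }
    // ... (a copy constructor and assignment operator are also needed,
    // because begin_ may point into this object's own buff_)
};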
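
If you want to see whether your own standard library does something like this, one rough check is to test whether the character data lives inside the std::string object itself. stored_inline is a hypothetical helper, not a library function, and the result for short strings depends entirely on the implementation:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true if s keeps its characters inside its own object.
bool stored_inline(const std::string& s)
{
    const char* obj  = reinterpret_cast<const char*>(&s);
    const char* data = s.data();
    std::less<const char*> before;   // gives a total order even for unrelated pointers
    // data lies in [obj, obj + sizeof(s)) exactly when the buffer is inline.
    return !before(data, obj) && before(data, obj + sizeof(s));
}
```

A 1000-character string can never report true, since it cannot fit inside the object. A short one such as "coocko" reports true only on implementations that use a small buffer.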
Ritualist answered 2/4, 2009 at 19:1 Comment(0)
E
5

Maybe Microsoft determined that string copying was not a big issue, as almost all C++ code uses pass by reference wherever possible. Maintaining a reference count has an overhead in space and time (ignoring locking) that perhaps they decided was not worth paying.

Or maybe not. If this is a concern for you, you should profile your application to determine whether string copying is a major overhead and, if it is, switch to a different string implementation.
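
As a small illustration of why pass-by-reference sidesteps the question: a const reference parameter is the caller's own object, so no buffer is copied or shared, however the library implements std::string. sees_callers_buffer is a made-up name:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: checks that the function sees the very buffer the caller has.
bool sees_callers_buffer(const std::string& s, const void* callers_buffer)
{
    // s refers to the caller's string, so c_str() yields the same address.
    return static_cast<const void*>(s.c_str()) == callers_buffer;
}
```

Calling sees_callers_buffer(s, s.c_str()) returns true for any string s, with or without reference counting in the library.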

Eyeglass answered 1/4, 2009 at 19:43 Comment(1)
Yeah, I agree. Just pass by ref. What's the big deal? Wanna save on typing?Cussed
S
1

It is not the main reason, but I have seen a lot of incorrect code on the win32 platform that does something like const_cast< char* >( str.c_str() ).

Maybe Microsoft knows this and is looking out for developers :)
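
For anyone tempted by that cast: writing through a pointer obtained from c_str() is undefined behavior, and with a copy-on-write string it can silently change every other string that shares the buffer. When an API really needs a writable char*, a common alternative is to copy into a buffer you own. A minimal sketch, with writable_copy as a hypothetical helper:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns a NUL-terminated, writable copy of s.
std::vector<char> writable_copy(const std::string& s)
{
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');   // C APIs expect a terminator
    return buf;
}
```

Pass &buf[0] to the function that wants a char*, then build a new std::string from the buffer if you need the result back.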

Schweitzer answered 1/4, 2009 at 19:46 Comment(2)
I hadn't heard this before, but I'll bet you're right that this might be a factor.Spilt
You are supposed to let them shoot if they point right at their foot and specifically pull the trigger, though.Registrar
