Does StringBuilder become immutable after a call to ToString?

I distinctly remember from the early days of .NET that calling ToString on a StringBuilder used to hand the StringBuilder's internal char buffer to the new string object being returned. This way, if you constructed a huge string using StringBuilder, calling ToString didn't have to copy it.

In doing that, the StringBuilder had to prevent any further changes to the buffer, because it was now used by an immutable string. As a result, the StringBuilder would switch to a "copy-on-change" mode, where any attempted change would first create a new buffer, copy the contents of the old buffer into it, and only then apply the change.
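
To make the mechanism concrete, here is a minimal sketch of a copy-on-change builder. It is a toy, not the old StringBuilder code: a .NET string cannot safely share a char[] that is still writable, so a ReadOnlyMemory<char> view over the internal array stands in for the returned string, and the names buffer, bufferShared, Append and Snapshot are hypothetical.

```csharp
using System;

// Hypothetical toy builder state (not the real StringBuilder fields).
char[] buffer = new char[16];
int length = 0;
bool bufferShared = false; // set once a snapshot of the live buffer has been handed out

void Append(string text)
{
    int needed = length + text.Length;
    if (bufferShared || needed > buffer.Length)
    {
        // Copy-on-change: never write into a buffer a snapshot may still be reading.
        var fresh = new char[Math.Max(needed, buffer.Length * 2)];
        Array.Copy(buffer, fresh, length);
        buffer = fresh;
        bufferShared = false;
    }
    text.CopyTo(0, buffer, length, text.Length);
    length = needed;
}

ReadOnlyMemory<char> Snapshot()
{
    // Hand out the live buffer without copying, then mark it as shared.
    bufferShared = true;
    return buffer.AsMemory(0, length);
}

Append("Hello");
ReadOnlyMemory<char> first = Snapshot();
Append(", world");
ReadOnlyMemory<char> second = Snapshot();

Console.WriteLine(first.ToString());  // Hello
Console.WriteLine(second.ToString()); // Hello, world
```

The first snapshot still reads "Hello" because the second Append copied the buffer before writing, instead of modifying the array the snapshot points at.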

I think the assumption was that StringBuilder would be used to construct a string, then converted to a regular string and discarded. Seems like a reasonable assumption to me.

Now here is the thing: I can't find any mention of this in the documentation, but I'm not sure it was ever documented.

So I looked at the implementation of ToString in .NET 4.0 using Reflector, and it seems to me that it actually copies the string rather than just sharing the buffer:

[SecuritySafeCritical]
public override unsafe string ToString()
{
    string str = string.FastAllocateString(this.Length);
    StringBuilder chunkPrevious = this;
    fixed (char* str2 = ((char*) str))
    {
        char* chPtr = str2;
        do
        {
            if (chunkPrevious.m_ChunkLength > 0)
            {
                char[] chunkChars = chunkPrevious.m_ChunkChars;
                int chunkOffset = chunkPrevious.m_ChunkOffset;
                int chunkLength = chunkPrevious.m_ChunkLength;
                if ((((ulong) (chunkLength + chunkOffset)) > str.Length) || (chunkLength > chunkChars.Length))
                {
                    throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
                }
                fixed (char* chRef = chunkChars)
                {
                    string.wstrcpy(chPtr + chunkOffset, chRef, chunkLength);
                }
            }
            chunkPrevious = chunkPrevious.m_ChunkPrevious;
        }
        while (chunkPrevious != null);
    }
    return str;
}
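
The loop above can be mirrored outside the framework with a hypothetical Chunk class standing in for the m_ChunkChars, m_ChunkOffset, m_ChunkLength and m_ChunkPrevious fields. This sketch assumes .NET Core 2.1 or later for string.Create, which, like the internal FastAllocateString, lets the code write straight into the new string's memory. It also shows why ToString has to copy in this design: the characters live in several separate arrays, so there is no single buffer the returned string could adopt.

```csharp
using System;

// Build a two-chunk chain by hand: "Hello, " (offset 0) then "world" (offset 7).
var head = new Chunk(new char[8], offset: 0, previous: null);
"Hello, ".CopyTo(0, head.Chars, 0, 7);
head.Length = 7;

var tail = new Chunk(new char[8], offset: 7, previous: head);
"world".CopyTo(0, tail.Chars, 0, 5);
tail.Length = 5;

string text = Chunk.Flatten(tail, totalLength: 12);
Console.WriteLine(text); // Hello, world

// Hypothetical stand-in for one StringBuilder chunk in the .NET 4.0 design.
sealed class Chunk
{
    public char[] Chars;   // like m_ChunkChars
    public int Offset;     // like m_ChunkOffset: where this chunk starts in the full text
    public int Length;     // like m_ChunkLength: chars actually used in Chars
    public Chunk Previous; // like m_ChunkPrevious: earlier chunk, null for the first

    public Chunk(char[] chars, int offset, Chunk previous)
    {
        Chars = chars;
        Offset = offset;
        Previous = previous;
    }

    // Walks from the last chunk back to the first, copying each chunk to its
    // offset inside a freshly allocated string (same shape as the do/while above).
    public static string Flatten(Chunk last, int totalLength) =>
        string.Create(totalLength, last, (span, lastChunk) =>
        {
            for (Chunk c = lastChunk; c != null; c = c.Previous)
            {
                if (c.Offset + c.Length > span.Length || c.Length > c.Chars.Length)
                    throw new ArgumentOutOfRangeException(nameof(totalLength));
                c.Chars.AsSpan(0, c.Length).CopyTo(span.Slice(c.Offset));
            }
        });
}
```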

Now, as I mentioned before, I distinctly remember reading that this was the case in the early days of .NET. I even found a mention of it in this book.

My question is: was this behavior dropped? If so, does anyone know why? It made perfect sense to me...

Tema answered 12/11, 2010 at 15:30 Comment(1)
Interesting. The string is stored as a series of char[]s. But doesn't the line "chunkPrevious = chunkPrevious.m_ChunkPrevious;" imply that those arrays are stored in separate StringBuilder instances, linked together as a list, inside the StringBuilder instance we hold a reference to? – Dibranchiate

Yup, this has been completely redesigned for .NET 4.0. It now uses a rope, a linked list of string builders, to store the growing internal buffer. This works around a problem that occurs when you can't guess the initial Capacity well and the amount of text is large: each time the single buffer grows, its contents are copied into a bigger one and the old one is discarded, so you end up with many disused copies of the internal buffer clogging up the Large Object Heap. This comment from the source code, as available from the Reference Source, is relevant:

    // We want to keep chunk arrays out of large object heap (< 85K bytes ~ 40K chars) to be sure.
    // Making the maximum chunk size big means less allocation code called, but also more waste 
    // in unused characters and slower inserts / replaces (since you do need to slide characters over
    // within a buffer).
    internal const int MaxChunkSize = 8000;
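
A back-of-the-envelope model shows why the old single-buffer growth hurt. This is a sketch under stated assumptions, not the real allocator: one contiguous buffer that starts at 16 chars and doubles whenever it is full, 2 bytes per char with object headers ignored, and 85,000 bytes as the Large Object Heap threshold. SimulateDoubling is a hypothetical helper.

```csharp
using System;

// Assumed model, not the real allocator: one contiguous buffer that starts at
// 16 chars and doubles when full, 2 bytes per char, object headers ignored,
// and 85,000 bytes as the Large Object Heap threshold.
const int LohThresholdBytes = 85_000;

(long CharsCopied, int LohBuffersDiscarded) SimulateDoubling(int initialCapacity, int finalLength)
{
    long totalCopied = 0;
    int discardedOnLoh = 0;
    int capacity = initialCapacity;
    while (capacity < finalLength)
    {
        // The full old buffer is copied into one twice as large, then dropped.
        totalCopied += capacity;
        if (capacity * 2L >= LohThresholdBytes) discardedOnLoh++;
        capacity *= 2;
    }
    return (totalCopied, discardedOnLoh);
}

var (copied, loh) = SimulateDoubling(16, 1_000_000);
Console.WriteLine($"chars copied: {copied}, discarded LOH buffers: {loh}");
// chars copied: 1048560, discarded LOH buffers: 4

// An 8,000-char chunk is 16,000 bytes, far below the threshold.
Console.WriteLine(8_000 * 2 < LohThresholdBytes); // True
```

Under these assumptions, building a million-character string copies about a million characters and discards four buffers big enough to land on the LOH (65,536 chars and up), while chunks of at most 8,000 chars stay on the regular heap and are never copied when the builder grows.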
Sibbie answered 12/11, 2010 at 16:33 Comment(0)

Yes, you remember correctly. The StringBuilder.ToString method used to return the internal buffer as the string, and flag it as used so that additional changes to the StringBuilder had to allocate a new buffer.

As this is an implementation detail, it's not mentioned in the documentation. That is also why they could change the underlying implementation without breaking any of the documented behaviour of the class.

As you see from the code posted, there is not a single internal buffer any more, instead the characters are stored in chunks, and the ToString method puts the chunks together into a string.

The reason for this change in implementation is likely that they have gathered information about how the StringBuilder class is actually used, and come to the conclusion that this approach gives a better performance weighed between average and worst case situations.
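
On current runtimes the title question can be checked directly. This sketch assumes .NET Core 3.0 or later, because it uses StringBuilder.GetChunks to peek at the chunk list; the number of chunks is an implementation detail, so only their total length is checked.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
for (int i = 0; i < 100_000; i++) sb.Append('x');

string first = sb.ToString();
sb.Append("y");                // the builder is still mutable after ToString
string second = sb.ToString();

// Each ToString call produces a new string; the earlier one is unaffected.
Console.WriteLine(first.Length);                   // 100000
Console.WriteLine(second.Length);                  // 100001
Console.WriteLine(ReferenceEquals(first, second)); // False

// GetChunks exposes the internal chunks (count not guaranteed).
int chunkCount = 0, charCount = 0;
foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
{
    chunkCount++;
    charCount += chunk.Length;
}
Console.WriteLine(charCount == sb.Length); // True
Console.WriteLine($"chunks: {chunkCount}");
```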

Agamemnon answered 12/11, 2010 at 15:42 Comment(3)
StringBuilder switched to returning a new string in its ToString() method long before it started using ropes, when Microsoft realized that any object that has ever been exposed to the outside world for non-thread-guarded write access while it was mutable must forevermore be presumed to be mutable (since there's no way to know whether some thread might be in the process of writing the object but have gotten momentarily delayed by virtue of being swapped to disk, suspended, preempted by higher priority threads, or whatever). – Tilly
@supercat: How long? IIRC the 2.0 implementation returned the internal buffer. As 3.0 and 3.5 were still using the 2.0 code, 4.0 is the next version. – Agamemnon
Really? I remember reading about the change aeons ago, before 4.0 was on the horizon. I thought the change happened with 2.0; the philosophy was that it's perfectly acceptable for non-thread-safe use of StringBuilder to make it return a string full of arbitrary garbage characters; it's not fine for it to return a string that might mutate after it's been examined, since that behavior could break lots of code that expects string to be immutable (think about the effects of calling String.Intern on a string that later mutates!). – Tilly
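
The difference between a read-only view and an immutable value can be shown without touching string internals. In this sketch (an illustration, not how StringBuilder ever exposed its buffer), a ReadOnlyMemory<char> prevents writes through the view, but whoever still owns the underlying char[] can change what the view reports later. A string sharing a still-writable buffer would have the same problem, which is what would break interning, hashing, or any check done on the text earlier.

```csharp
using System;

char[] buffer = "apple".ToCharArray();

// A read-only view over the array: callers cannot write through it,
// but it is not a snapshot, and whoever owns buffer still can.
ReadOnlyMemory<char> view = buffer;

string seenEarlier = view.ToString(); // copies the current contents
buffer[0] = 'A';                      // a later (possibly concurrent) write by the owner
string seenLater = view.ToString();

Console.WriteLine(seenEarlier); // apple
Console.WriteLine(seenLater);   // Apple
```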

Here is the .NET 1.1 implementation of StringBuilder.ToString from Reflector:

public override string ToString()
{
    string stringValue = this.m_StringValue;
    int currentThread = this.m_currentThread;
    if ((currentThread != 0) && (currentThread != InternalGetCurrentThread()))
    {
        return string.InternalCopy(stringValue);
    }
    if ((2 * stringValue.Length) < stringValue.ArrayLength)
    {
        return string.InternalCopy(stringValue);
    }
    stringValue.ClearPostNullChar();
    this.m_currentThread = 0;
    return stringValue;
}

As far as I can see, it will in some cases return the string without copying it. However, I don't think the StringBuilder becomes immutable. Instead, I think it will use copy-on-write if you continue to write to the StringBuilder.
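
The decision in that decompiled method can be restated as a small predicate. The names here (MustCopy, ownerThread, callingThread) are hypothetical, and the comments on why each branch exists are educated guesses rather than documented intent.

```csharp
using System;

// Hypothetical restatement of the decompiled .NET 1.1 ToString decision;
// not the real internal API.
bool MustCopy(int length, int arrayLength, int ownerThread, int callingThread)
{
    // Another thread currently owns the buffer: return a private copy.
    if (ownerThread != 0 && ownerThread != callingThread) return true;
    // Less than half the buffer is used: copy, presumably so the returned
    // string doesn't keep a mostly empty buffer alive.
    if (2 * length < arrayLength) return true;
    // Otherwise share the buffer. The builder resets m_currentThread to 0,
    // which presumably forces the next write to copy first (copy-on-write).
    return false;
}

Console.WriteLine(MustCopy(length: 10, arrayLength: 16, ownerThread: 0, callingThread: 1)); // False
Console.WriteLine(MustCopy(length: 5, arrayLength: 16, ownerThread: 0, callingThread: 1));  // True
Console.WriteLine(MustCopy(length: 10, arrayLength: 16, ownerThread: 2, callingThread: 1)); // True
```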

Gratitude answered 12/11, 2010 at 15:46 Comment(0)
That was most likely just an implementation detail, rather than a documented constraint on the interface provided by StringBuilder.ToString. The fact that you feel unsure if it was ever documented might suggest this is the case.

Books will often detail implementations to show some insight into how to use something, but most carry a warning that the implementation is subject to change.

This is a good example of why one should never rely on implementation details.

I suspect that it wasn't a feature to have the builder become immutable, but merely a side-effect of the implementation of ToString.

Deathly answered 12/11, 2010 at 15:39 Comment(1)
Thanks, Jeff. I understand that it was an implementation detail and I'm not relying on it in any way. What I'm curious about is why the implementation changed, since it still seems to make perfect sense. – Tema

I hadn't seen this before, so here's my guess: the internal storage of a StringBuilder appears to no longer be a simple string, but a set of 'chunks'. ToString can't return a reference to this internal string because it no longer exists.

(Are version 4.0 StringBuilders now ropes?)

Adroit answered 12/11, 2010 at 15:40 Comment(1)
It looks more like a chain of chunks than a tree of chunks. – Agamemnon