Why doesn’t strings.Builder Reset() preserve the underlying buffer?
M

3

7
// Reset resets the Builder to be empty.
func (b *Builder) Reset() {
    b.addr = nil
    b.buf = nil
}

The snippet above is from the Go source for strings.Builder. Reset sets the buffer to nil rather than b.buf[:0]. Why set it to nil instead of preserving the capacity?

EDIT: I can see that Reset() can be used to GC the underlying buffer and allow the Builder struct to be re-used, but it seems like a marginal cost to initialize the struct since it is just two pointers, whereas the underlying array might have been much bigger, and could've been re-used. I feel like there should have been a Clear() function that kept the underlying buffer's capacity but reduced its length to 0, and it would have been trivial to implement. This leads me to believe there is a reason as to why that was not done, and I am curious as to what that reason is.

Methodius answered 30/4, 2020 at 2:48 Comment(5)
What's your line of thinking about why it should preserve the cap? It seems to me that wouldn't really be "resetting to be empty". - Liaoning
I guess I’m confused about the point or benefit of using Reset versus just declaring another Builder. - Methodius
Ah right; I don't think there is a difference; it's just a more convenient way to reset it. - Liaoning
@Methodius The same Builder instance can be shared among multiple actors, so resetting it lets them keep using the same instance while simply clearing it. - Gynaeco
@Gynaeco I see that it can be used that way, but a strings.Builder is just a pointer and a buffer, which is essentially another pointer. Resetting it seems to save only a marginal cost versus creating another. If the underlying buffer were preserved, each actor could use the Builder without worrying about additional allocations. - Methodius
D
6

One of the optimisations in strings.Builder is that it doesn't copy bytes when converting the []byte to a string. Take a look at its String() method:

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}

This means the returned string shares memory with the buffer, so reusing the buffer would overwrite the contents of previously created strings.

And here is the proof on the playground: https://play.golang.org/p/gkSXRwi0-Ff
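For readers who cannot open the playground link, here is a minimal sketch of the same hazard. It is not the playground code: `reusingBuilder` and its `Clear` method are hypothetical names invented for illustration, and the type only mimics the zero-copy `String()` shown above. Mutating memory that a string points to is outside what Go guarantees, so treat this purely as a demonstration.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// reusingBuilder is a hypothetical type (not in the standard library).
// It copies strings.Builder's zero-copy String but adds a Clear that
// keeps the capacity, which is what the question asks for.
type reusingBuilder struct {
	buf []byte
}

func (b *reusingBuilder) WriteString(s string) { b.buf = append(b.buf, s...) }

// String aliases the buffer without copying, as in the snippet above.
func (b *reusingBuilder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}

// Clear truncates the length to zero but keeps the backing array.
func (b *reusingBuilder) Clear() { b.buf = b.buf[:0] }

// corrupted returns a safe copy of the first string taken before reuse,
// and the original (aliased) string as it reads after the buffer is reused.
func corrupted() (before, after string) {
	var b reusingBuilder
	b.WriteString("hello")
	s := b.String()
	before = strings.Clone(s) // independent copy, taken before reuse
	b.Clear()
	b.WriteString("HELLO") // overwrites the bytes that s still points to
	return before, s
}

func main() {
	before, after := corrupted()
	fmt.Println(before, after)
}
```

The string `s` was never reassigned, yet its contents change once the buffer is reused. Setting the buffer to nil in Reset avoids this, because the old array is left untouched for any strings that still reference it.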

Dambro answered 1/5, 2020 at 14:42 Comment(2)
That is right. Rob Pike gave the same explanation here: github.com/golang/go/issues/24716#issuecomment-379098966 - Pursuit
But bytes.Buffer's Bytes() also returns an alias to the underlying storage, and it simply documents that the resulting slice is valid only until the next buffer modification. Why isn't the same done for strings.Builder? - Afterbody
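To make the contrast in the last comment concrete, here is a small sketch of how bytes.Buffer behaves. `aliasDemo` is a hypothetical helper name. The slice from `Bytes()` shares storage with the buffer, while `String()` returns a copy. A []byte is mutable by definition, so an "invalid after modification" contract is workable there, whereas a Go string is expected to stay immutable.

```go
package main

import (
	"bytes"
	"fmt"
)

// aliasDemo returns the aliased view (read after the buffer was reused)
// and the copied string taken before the reuse.
func aliasDemo() (view, copied string) {
	var buf bytes.Buffer
	buf.WriteString("hello")

	alias := buf.Bytes()   // shares storage with buf
	copied = buf.String()  // independent copy of the contents

	buf.Reset()            // bytes.Buffer keeps its storage on Reset
	buf.WriteString("HELLO")

	// Per the documentation, alias is no longer valid here; reading it
	// only illustrates that it was never a private copy.
	return string(alias), copied
}

func main() {
	view, copied := aliasDemo()
	fmt.Printf("alias now reads %q, copied string is still %q\n", view, copied)
}
```

The copied string is unaffected by the reuse; what the stale slice shows is not something the documentation promises.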
F
0

The point of Reset() is to return the Builder to its initial empty state (as when a new one is created).

The benefit of doing so instead of creating a new Builder shows up when other components of your program hold a reference to the existing Builder and you want to "reset" it to its initial state without handing all those components a new reference.
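A minimal sketch of that situation, assuming a hypothetical `logger` component that holds a pointer to a shared Builder (the type and function names are made up for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// logger is a hypothetical component that keeps a reference to a shared Builder.
type logger struct {
	out *strings.Builder
}

func (l logger) log(msg string) {
	l.out.WriteString(msg)
	l.out.WriteByte('\n')
}

// sharedReset writes through two loggers, resets the shared Builder once,
// and writes again without re-wiring either logger.
func sharedReset() (first, second string) {
	var sb strings.Builder
	a := logger{out: &sb}
	b := logger{out: &sb}

	a.log("one")
	b.log("two")
	first = sb.String()

	sb.Reset() // both loggers still point at sb; no references to update

	a.log("three")
	second = sb.String()
	return first, second
}

func main() {
	first, second := sharedReset()
	fmt.Printf("%q %q\n", first, second)
}
```

Because Reset drops the old buffer instead of truncating it, the string captured in `first` keeps its contents after the Builder is written to again.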

Forethought answered 30/4, 2020 at 4:25 Comment(5)
I think my main holdup is the lack of a function to "clear" the buffer length instead of re-initializing it. This is my opinion, but the reset functionality seems of marginal use (so far) compared to something that would allow the underlying buffer to be re-used. I feel like there is a reason why that functionality was not exposed. Was it safety or something else? The shared-instance explanation seems like a minor convenience compared to the cost of re-allocating the buffer, especially if it was used for a large string. - Methodius
That changes the question a bit into why there's no Clear() (or Truncate()). strings.Builder was added in 1.10 to make the internal slice immutable (and prevent escape), and that's why it can only Grow() or Reset(). Quoting the release notes: "API is a restricted subset of bytes.Buffer's that allows it to safely avoid making a duplicate copy of the data during the String method." - Forethought
Ah, interesting. I guess you can re-implement a basic version of Builder with bytes.Buffer. Preventing escape isn't something I considered, but I'm wondering why that matters. Once the Builder goes out of scope, the underlying buffer could be garbage collected as well, couldn't it? What would still hold a reference to the underlying buffer to allow it to escape? - Methodius
E.g. when you call (*Buffer).Bytes() you leak the underlying slice. I'll update my answer with a forged example. - Forethought
I just realized that strings.Builder has a function that hides a pointer from escape analysis. Not sure how it works, though: func noescape(p unsafe.Pointer) unsafe.Pointer - Methodius
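As a rough illustration of the bytes.Buffer route mentioned in the comments above: bytes.Buffer's Reset retains the underlying storage, and its String() copies, so one buffer can be reused across many strings at the cost of a copy per string. `wrapAll` is a hypothetical helper, not something from the thread.

```go
package main

import (
	"bytes"
	"fmt"
)

// wrapAll builds one string per word while reusing a single bytes.Buffer.
// It also reports the buffer capacity before and after the final Reset.
func wrapAll(words []string) (out []string, capBefore, capAfter int) {
	var buf bytes.Buffer
	for _, w := range words {
		buf.Reset() // length goes to zero, storage is retained
		buf.WriteString("<")
		buf.WriteString(w)
		buf.WriteString(">")
		out = append(out, buf.String()) // String copies, so reuse is safe
	}
	capBefore = buf.Cap()
	buf.Reset()
	capAfter = buf.Cap()
	return out, capBefore, capAfter
}

func main() {
	out, before, after := wrapAll([]string{"a", "bc"})
	fmt.Println(out, before == after)
}
```

This is the trade-off the release-note quote describes: bytes.Buffer can offer truncation because String() pays for a copy, while strings.Builder skips the copy and so cannot safely hand the same bytes out twice.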
N
0

If Reset preserved the underlying buffer, a long-lived Builder would hold on to memory sized for the longest string it ever built. The array allocated for that string would stay alive even though most of it goes unused. Setting the buffer to nil allows the garbage collector to reclaim such potentially large buffers.
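A quick way to observe this, sketched with the Builder's own Grow and Cap methods (`capAroundReset` is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"strings"
)

// capAroundReset reports the Builder's capacity before and after Reset.
func capAroundReset() (before, after int) {
	var sb strings.Builder
	sb.Grow(1 << 20) // reserve at least 1 MiB
	sb.WriteString("x")
	before = sb.Cap()

	sb.Reset() // drops the reference to the 1 MiB array
	after = sb.Cap()
	return before, after
}

func main() {
	before, after := capAroundReset()
	fmt.Println(before >= 1<<20, after)
}
```

After Reset the Builder no longer references the large array, so the array becomes collectable once no string returned by String() points into it.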

Nichollenicholls answered 30/4, 2020 at 4:26 Comment(5)
The same holds for keeping the Builder as is without calling Reset(), though: it retains the buffer until the Builder goes out of scope, and you can't really re-use that space even after it has been allocated. I can see the utility of Reset() in letting the allocated space be GC-ed, but the lack of a Clear()-type function rules out re-using the buffer. It seems to force the Builder to generate only a single string when it could be used for several. - Methodius
If it truncated the slice when you called Reset on a long-lived buffer, then it'd be occupying much more space than needed just because you happened to build a large string once. - Nichollenicholls
Yes, that is true, but it seems strange that the option to truncate isn't there to begin with. The caller should be able to decide whether that space is useful or not. - Methodius
Years ago I was involved in a similar discussion about Java: whether reusing the same buffer or creating a new buffer every time is better for building strings. Measured under realistic loads, creating a new buffer performed better at the time. - Nichollenicholls
How is that the case? Wouldn't you be killed by the re-allocation costs, or is that something the runtime is able to optimize, e.g. by not actually GC-ing it? - Methodius
