Java StringBuilder(StringBuffer)'s ensureCapacity(): Why is it doubled and incremented by 2?
Asked Answered
B

5

27

I have searched about this, but I couldn't find why StringBuilder's ensureCapacity() method won't lengthen the old capacity by just doubling but also adding two.

So, when default capacity of 16 is full, next lengthened value will be 34 unless whole string length doesn't exceed 34. Why shouldn't it be 32?

My best guess is considering for a null character, '\u0000', but I'm not sure. Can anyone tell me why?

Bangweulu answered 14/7, 2017 at 4:17 Comment(9)
@soorapadman int newCapacity = (value.length + 1) * 2; Won't it be sufficient with just (value.length) * 2; ? That's what I'm curious about.Bangweulu
i understood . I am expecting someone explain better rather than i am giving it answer . This is the reason i kept quit . :)Alar
Thank you @Alar for helping me out! :D I appreciate it.Bangweulu
where did you find (value.length + 1) * 2. I mean is it used as it is in somewhere ?Histrionism
@Histrionism You can check here, though it is the open-jdk version.Bangweulu
@PhilipPaek it's bit different now I think. searchcode.com/codesearch/view/17990403 here it is int newCapacity = value.length * 2 + 2;Histrionism
what does the source code say? that will tell you what you need to know.Krumm
Though the documentation states; Twice the old capacity, plus 2, the code int newCapacity = (value.length + 1) * 2 actually meant; Add an additional space, and double it down. So that makes sense, since the capacity is filled up, just add ONE additional space, and then double the new length.Appendage
For sure the null character '\u0000` has nothing to do with it, it has no special meaning in Java. Java does handle the null character exactly the same way as any other character. Java handles the length of a String not by the means of the null character.Irita
T
8

I believe it has to do with a simple, if somewhat dumb, way to ensure the corner case of very small strings.

For example, if I have the string

""

and I double it only, I will not have a sufficient size to store anything else in it. If I double it and add a small constant number of spaces, I can assure that my new value is larger than my old one.

Why increment it by two then? Probably a small performance improvement. By adding two instead of 1, I can avoid an intermediate expansion for small expansions (0 to 10 chars detailed below)

"" => expand => "1" => expand => "123" expand => "1234567" expand => "123456789012345"

which is 4 expands compared to

"" => expand => "12" => expand => "123456" => expand => "123456789012"

which is 3 expands. This also works nicely for one char strings (expanding to 10 chars)

"1" => expand => "1234" => expand => "1234567890"

while the 1 char expansion routine looks like

"1" => expand => "123" => expand => "1234567" => expand => "123456789012345"

Finally, an added increment of two tends to word align about 50% of the time, while added increments of one or three would do so about 25% of the time. While this might not seem like a big deal, some architectures cannot accommodate non-aligned reads without expensive interrupt calls to rewrite the read in the CPU, leading to all sorts of performance issues.

Tureen answered 17/7, 2017 at 20:18 Comment(7)
Why not add space only when length = 0?Tedford
They include an if-else that handles the 0 case even if they didn't add 2.Queenie
@Queenie Well, that's bizarre, but it wouldn't be the first time I've seen multiple ways of handling the same "worrisome" case.Tureen
That's the same as adding one to the length. I'm not talking about how much to add, just when to add.Tedford
@Tedford The comment in the code block indicates that reading the correct amount to add is problematic. "This implements the expansion semantics of ensureCapacity with no size check or synchronization" I imagine that reading the size of something not yet stored might be problematic, or that locking the item to be read might create an issue. In other words, they expand the capacity, without attempting to know the right capacity due to "design reasons". All expansions will be "a guess" and if the guess is too small, they'll try again.Tureen
@Tedford The really bizarre thing about this guessing technique, is that they are given the minimumCapacity to expand to. I'm guessing that they added the minimumCapacity variable after they wrote the expandCapacity(...) method as an attempt to not stack trash as much, but the original design was "expand to what's needed, when needed" and the odd (double + 2) semantics made more sense when it was the only line in the function.Tureen
In mono they use a similar technique for StringBuilder, but they only doubles the capacity on the first try.Queenie
L
0

I think the reason is a combination of

  • some ancient ;-) heuristic strategy how to extend the capacity, especially for short buffers,

  • documenting this strategy in earliest java API docs,

  • Sun/Oracle being very careful to stick to once-documented behaviour.

StringBuilder shares this method with its predecessor StringBuffer, which reads (probably from the earliest beginnings, at least in j2sdk1.4_02, which happened to still exist in some archive folder on my machine):

/**
 * Ensures that the capacity of the buffer is at least equal to the
 * specified minimum.
 * If the current capacity of this string buffer is less than the 
 * argument, then a new internal buffer is allocated with greater 
 * capacity. The new capacity is the larger of: 
 * <ul>
 * <li>The <code>minimumCapacity</code> argument. 
 * <li>Twice the old capacity, plus <code>2</code>. 
 * </ul>
 * If the <code>minimumCapacity</code> argument is nonpositive, this
 * method takes no action and simply returns.
 *
 * @param   minimumCapacity   the minimum desired capacity.
 */
public synchronized void ensureCapacity(int minimumCapacity) {
    if (minimumCapacity > value.length) {
        expandCapacity(minimumCapacity);
    }
}

And it documents exactly the times-two plus-two behaviour, so even if in the meantime some JRE developer had found a better strategy, there's no chance to implement it here because it wouldn't conform to the documentation.

Lasting answered 23/7, 2017 at 15:32 Comment(2)
Sun/Oracle being very careful to stick to once-documented behaviour. If it can't be observed, I wouldn't call it documented behavior. It's just an implementation note.Tedford
It can be observed through the capacity() method.Gothicize
D
0

I suppose that object alignment is a key, because length * 2 + 2-strategy is memory-effective (see explanation below).

Let's consider HotSpot JVM.

First of all, java objects are 8-bytes aligned and char array is not an exception.

Secondly, sizeof(object header) equals 8 bytes on 32-bit JVM and 16 bytes on 64-bit JVM with -XX:-UseCompressedOops.

Thus, object body should be aligned by 8 bytes:
objectBodySize(charArray) == sizeOf(arrayLength) + sizeOf(arrayValues) == (4 bytes) + (arrayLength * 2 bytes).

If old array length is even, then new array length will always give zero-size alignment.

Examples:

  1. oldCharArrayLength == 6 then newCharArrayLength == 14 and objectBodySize(newCharArray) == 4 + 14 * 2 == 32

  2. oldCharArrayLength == 4 then newCharArrayLength == 10 and objectBodySize(newCharArray) == 4 + 10 * 2 == 24

It's important to note that -XX:+UseCompressedOops flag is available since 1.6 whereas StringBuilder and AbstractStringBuilder are available since 1.5. It means that the strategy above with two additional chars has zero-cost of memory on 64-bit JVM before 1.6, whereas sizeof(object header) == 12 bytes when run on 64-bit JVM with -XX:+UseCompressedOops.

Dextrality answered 24/7, 2017 at 19:17 Comment(0)
I
0

I downloaded the original source code of Java 1.5 from the Oracle web and it contains the following lines:

/**
 * This implements the expansion semantics of ensureCapacity with no
 * size check or synchronization.
 */
void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }   
    char newValue[] = new char[newCapacity];
    System.arraycopy(value, 0, newValue, 0, count);
    value = newValue;
} 

So at least two things are clear:

  • the theory that other corrections were additionally added is false (e.g. "the odd (double + 2) semantics made more sense when it was the only line in the function" is not true)
  • most probably it was originally meant as "let's make room for at least one more character and let's multiply it by two"
Irita answered 25/7, 2017 at 9:26 Comment(0)
O
0
 public void ensureCapacity(int minimumCapacity) {
     if (minimumCapacity > value.length) {
         expandCapacity(minimumCapacity);
     }
 }



 void expandCapacity(int minimumCapacity) {
     int newCapacity = (value.length + 1) * 2;
     if (newCapacity < 0) {
         newCapacity = Integer.MAX_VALUE;
     } else if (minimumCapacity > newCapacity) {
         newCapacity = minimumCapacity;
     }
    value = Arrays.copyOf(value, newCapacity);
}

NOTE: value.length is the capacity of the StringBuffer, not the length.

It has nothing to do with a null string because minimum capacity is 16.

What I think is, the memory allocations cost so much time, and if we are calling ensureCapacity() frequently with increasing minimumCapacity , (capacity +1)*2 will allocate a bit more memory and may reduce further allocations and save some time.

lets consider initial capacity as 16,

only with doubling 16 , 32 , 64 , 128 , 256 , 512 , 1024 , 2048 , so on...

with double +2 16 , 34 , 70 , 142 , 286 , 574 , 1150 , 2302 , so on...

Thus the memory will gradually keeping increasing every time and may decrease the no of allocations of memory.

Outlawry answered 13/5, 2018 at 17:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.