Is it better to reuse a StringBuilder in a loop?

I have a performance-related question about the use of StringBuilder. In a very long loop I'm building up a StringBuilder and passing it to another method, like this:

for (loop condition) {
    StringBuilder sb = new StringBuilder();
    sb.append("some string");
    . . .
    sb.append(anotherString);
    . . .
    passToMethod(sb.toString());
}

Is instantiating a StringBuilder on every loop iteration a good solution? Or is it better to call delete instead, like the following?

StringBuilder sb = new StringBuilder();
for (loop condition) {
    sb.delete(0, sb.length());
    sb.append("some string");
    . . .
    sb.append(anotherString);
    . . .
    passToMethod(sb.toString());
}
Croat answered 28/10, 2008 at 7:9 Comment(0)
71

The second one is about 25% faster in my mini-benchmark.

public class ScratchPad {

    static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        for( int i = 0; i < 10000000; i++ ) {
            StringBuilder sb = new StringBuilder();
            sb.append( "someString" );
            sb.append( "someString2"+i );
            sb.append( "someStrin4g"+i );
            sb.append( "someStr5ing"+i );
            sb.append( "someSt7ring"+i );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < 10000000; i++ ) {
            sb.delete( 0, sb.length() );
            sb.append( "someString" );
            sb.append( "someString2"+i );
            sb.append( "someStrin4g"+i );
            sb.append( "someStr5ing"+i );
            sb.append( "someSt7ring"+i );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
    }
}

Results:

25265
17969

Note that this is with JRE 1.6.0_07.


Based on Jon Skeet's ideas in the edit, here's version 2. The conclusion is the same, though.

public class ScratchPad {

    static String a;

    public static void main( String[] args ) throws Exception {
        long time = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for( int i = 0; i < 10000000; i++ ) {
            sb.delete( 0, sb.length() );
            sb.append( "someString" );
            sb.append( "someString2" );
            sb.append( "someStrin4g" );
            sb.append( "someStr5ing" );
            sb.append( "someSt7ring" );
            a = sb.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
        time = System.currentTimeMillis();
        for( int i = 0; i < 10000000; i++ ) {
            StringBuilder sb2 = new StringBuilder();
            sb2.append( "someString" );
            sb2.append( "someString2" );
            sb2.append( "someStrin4g" );
            sb2.append( "someStr5ing" );
            sb2.append( "someSt7ring" );
            a = sb2.toString();
        }
        System.out.println( System.currentTimeMillis()-time );
    }
}

Results:

5016
7516
Utgardloki answered 28/10, 2008 at 7:17 Comment(10)
I've added an edit in my answer to explain why this might be happening. I'll look more carefully in a while (45 mins). Note that doing concatenation in the append calls reduces the point of using StringBuilder in the first place somewhat :)Stork
Also it would be interesting to see what happens if you reverse the two blocks - the JIT is still "warming up" StringBuilder during the first test. It may well be irrelevant, but interesting to try.Stork
I'd still go with the first version because it's cleaner. But it's good that you've actually done the benchmark :) Next suggested change: try #1 with an appropriate capacity passed into the constructor.Stork
Verified, running this benchmark with the tests reversed and several times back to back results in a substantial performance gain (3129ms with reallocation vs. 5903ms for instantiation) after removing concatenation.Littles
Also, with 1024 passed to the constructor and the append operations increased to produce roughly 1024 characters (just under 1024, so no additional allocation is required), it's 5264ms for reallocation vs. 13985ms for instantiation.Littles
I'd want to see it run with the OP's production data. Does he/she have 200 appends? Are the strings really big? How does this impact the benchmark?Dustin
It is 10 million iterations and instantiations after all; at what point does allocation lose to instantiation?Littles
Use sb.setLength(0); instead; it's the fastest way to empty the contents of a StringBuilder, compared to recreating the object or using .delete(). Note that this doesn't apply to StringBuffer, whose concurrency checks nullify the speed advantage.Lutanist
Inefficient answer. P Arrayah and Dave Jarvis are correct. setLength(0) is far and away the most efficient answer. StringBuilder is backed by a char array and is mutable. At the point .toString() is called, the char array is copied and is used to back an immutable string. At this point, the mutable buffer of StringBuilder can be re-used, simply by moving the insertion pointer back to zero (via .setLength(0)). sb.toString creates yet another copy (the immutable char array), so each iteration requires two buffers as opposed to the .setLength(0) method which only requires one new buffer per loop.Bodyguard
In my test case with half a million iterations and testing with both delete and setLength, the delete loop is beating setLength every time. @Chris, your comment makes complete sense, but the results say otherwise on my end.Sauerbraten
28

Faster still:

public class ScratchPad {

    private static String a;

    public static void main( String[] args ) throws Exception {
        final long time = System.currentTimeMillis();

        // Pre-allocate enough space to store all appended strings.
        // StringBuilder, ultimately, uses an array of characters.
        final StringBuilder sb = new StringBuilder( 128 );

        for( int i = 0; i < 10000000; i++ ) {
            // Resetting the string is faster than creating a new object.
            // Since this is a critical loop, every instruction counts.
            sb.setLength( 0 );
            sb.append( "someString" );
            sb.append( "someString2" );
            sb.append( "someStrin4g" );
            sb.append( "someStr5ing" );
            sb.append( "someSt7ring" );
            setA( sb.toString() );
        }

        System.out.println( System.currentTimeMillis() - time );
    }

    private static void setA( final String aString ) {
        a = aString;
    }
}

In the philosophy of writing solid code, the inner workings of the method are hidden from the client objects. Thus it makes no difference from the system's perspective whether you re-declare the StringBuilder within the loop or outside of the loop. Since declaring it outside of the loop is faster, and it does not make the code significantly more complicated, reuse the object.

Even if it was much more complicated, and you knew for certain that object instantiation was the bottleneck, comment it.

Three runs with this answer:

$ java ScratchPad
1567
$ java ScratchPad
1569
$ java ScratchPad
1570

Three runs with the other answer:

$ java ScratchPad2
1663
2231
$ java ScratchPad2
1656
2233
$ java ScratchPad2
1658
2242

Although not dramatic, setting the StringBuilder's initial buffer size to prevent memory re-allocations gives a small additional performance gain.

Ilmenite answered 22/6, 2009 at 6:20 Comment(1)
This is by far the best answer. StringBuilder is backed by a char array and is mutable. At the point .toString() is called, the char array is copied and is used to back an immutable string. At this point, the mutable buffer of StringBuilder can be re-used, simply by moving the insertion pointer back to zero (via .setLength(0)). Those answers suggesting allocating a brand new StringBuilder per loop do not seem to realise that .toString creates yet another copy, so each iteration requires two buffers as opposed to the .setLength(0) method which only requires one new buffer per loop.Bodyguard
25

In the philosophy of writing solid code, it's always better to put your StringBuilder inside your loop. That way it doesn't leak outside the code it's intended for.

Secondly, the biggest improvement with StringBuilder comes from giving it an initial size, so it doesn't have to grow while the loop runs:

for (loop condition) {
  StringBuilder sb = new StringBuilder(4096);
}
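
Filling in the rest of the question's loop, the pattern might look roughly like this (a sketch only; passToMethod and anotherString are the placeholders from the question, and 4096 is still an arbitrary guess at the capacity):

for (loop condition) {
    StringBuilder sb = new StringBuilder(4096); // sized up front so append() rarely reallocates
    sb.append("some string");
    sb.append(anotherString);
    passToMethod(sb.toString());
}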
Chrysalid answered 28/10, 2008 at 7:14 Comment(4)
You could always scope the whole thing with curly brackets, that way you don't have the Stringbuilder outside.Utgardloki
@Epaga: It's still outside the loop itself. Yes, it doesn't pollute the outer scope, but it's an unnatural way to write the code for a performance improvement which hasn't been verified in context.Stork
Or even better, put the whole thing in its own method. ;-) But I hear ya re: context.Utgardloki
Better yet, initialize with the expected size instead of some arbitrary number (4096). Your code may return a String that references a char[] of size 4096 (depends on the JDK; as far as I remember that was the case for 1.4).Antonetta
12

Okay, I now understand what's going on, and it does make sense.

I was under the impression that toString just passed the underlying char[] into a String constructor which didn't take a copy. A copy would then be made on the next "write" operation (e.g. delete). I believe this was the case with StringBuffer in some previous version. (It isn't now.) But no - toString just passes the array (and index and length) to the public String constructor which takes a copy.

So in the "reuse the StringBuilder" case we genuinely create one copy of the data per string, using the same char array in the buffer the whole time. Obviously creating a new StringBuilder each time creates a new underlying buffer - and then that buffer is copied (somewhat pointlessly, in our particular case, but done for safety reasons) when creating a new string.

All this leads to the second version definitely being more efficient - but at the same time I'd still say it's uglier code.
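
To make the copy semantics concrete, here's a minimal sketch (the class name is just for illustration) showing that the String returned by toString() is detached from the builder's buffer:

public class ToStringCopyDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello");
        String snapshot = sb.toString(); // copies the builder's current chars into a new String
        sb.setLength(0);
        sb.append("goodbye");            // mutates only the builder's internal buffer
        System.out.println(snapshot);    // prints "hello": the String kept its own copy
    }
}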

Stork answered 28/10, 2008 at 8:18 Comment(2)
Just some fun info about .NET, where the situation is different. The .NET StringBuilder internally modifies a regular "string" object, and its ToString method simply returns it (marking it as non-modifiable, so subsequent StringBuilder manipulations will re-create it). So the typical "new StringBuilder -> modify it -> ToString" sequence will not make any extra copy (except for expanding the storage, or shrinking it if the resulting string length is much shorter than its capacity). In Java this cycle always makes at least one copy (in StringBuilder.toString()).Rivarivage
The Sun JDK pre-1.5 had the optimization you were assuming: bugs.sun.com/bugdatabase/view_bug.do?bug_id=6219959Cynthy
10

Since I don't think it's been pointed out yet: because of optimizations built into the Sun Java compiler, which automatically creates StringBuilders (StringBuffers pre-J2SE 5.0) when it sees String concatenation, the first example in the question is equivalent to:

for (loop condition) {
  String s = "some string";
  . . .
  s += anotherString;
  . . .
  passToMethod(s);
}

Which is more readable and, IMO, the better approach. Your attempts to optimize may result in gains on some platforms, but potentially losses on others.

But if you really are running into issues with performance, then sure, optimize away. I'd start with explicitly specifying the buffer size of the StringBuilder though, per Jon Skeet.
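
For reference, a rough sketch of what the compiler generates for the s += anotherString line above (the exact shape varies by compiler version; newer javac versions use invokedynamic with StringConcatFactory instead):

// roughly what "s += anotherString;" compiles to on older javac versions
s = new StringBuilder().append(s).append(anotherString).toString();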

Tillett answered 30/10, 2008 at 13:7 Comment(0)
6

The modern JVM is really smart about stuff like this. I would not second-guess it and do something hacky that is less maintainable/readable... unless you do proper benchmarks with production data that validate a non-trivial performance improvement (and document it ;)

Dustin answered 28/10, 2008 at 7:12 Comment(7)
Where "non-trivial" is key - benchmarks can show one form being proportionally faster, but with no hint about how much time that's taking in the real app :)Stork
See the benchmark in my answer below. The second way is faster.Utgardloki
@Epaga: Your benchmark says little about the performance improvement in the real app, where the time taken to do the StringBuilder allocation may be trivial compared with the rest of the loop. That's why context is important in benchmarking.Stork
@Jon I understand, but I'm assuming that if his whole question is geared towards which one has higher performance, then a 25-50% difference IS important and that part of his code will be called many times.Utgardloki
@Epaga: Until he's measured it with his real code, we'll have no clue how significant it really is. If there's a lot of code for each iteration of the loop, I strongly suspect it'll still be irrelevant. We don't know what's in the "..."Stork
(Don't get me wrong, btw - your benchmark results are still very interesting in themselves. I'm fascinated by microbenchmarks. I just don't like bending my code out of shape before performing real-life tests as well.)Stork
wise words, i think we both fully agree. :-)Utgardloki
4

Based on my experience with developing software on Windows I would say clearing the StringBuilder out during your loop has better performance than instantiating a StringBuilder with each iteration. Clearing it frees that memory to be overwritten immediately with no additional allocation required. I'm not familiar enough with the Java garbage collector, but I would think that freeing and no reallocation (unless your next string grows the StringBuilder) is more beneficial than instantiation.

(My opinion is contrary to what everyone else is suggesting. Hmm. Time to benchmark it.)

Littles answered 28/10, 2008 at 7:17 Comment(3)
The thing is that more memory has to be reallocated anyway, as the existing data is being used by the newly created String at the end of the previous loop iteration.Stork
Oh, that makes sense. I had thought that toString was allocating and returning a new string instance and that the builder's char buffer was being cleared rather than re-allocated.Littles
Epaga's benchmark shows that clearing and re-using is a gain over instantiation at every pass.Littles
1

The reason why doing a 'setLength' or 'delete' improves the performance is mostly that the code 'learns' the right size of the buffer, and less to do with memory allocation. Generally, I recommend letting the compiler do the string optimizations. However, if performance is critical, I'll often pre-calculate the expected size of the buffer. The default StringBuilder size is 16 characters. If you grow beyond that, it has to resize. Resizing is where the performance is lost. Here's another mini-benchmark which illustrates this:

private void clear() throws Exception {
    long time = System.currentTimeMillis();
    int maxLength = 0;
    StringBuilder sb = new StringBuilder();

    for( int i = 0; i < 10000000; i++ ) {
        // Resetting the string is faster than creating a new object.
        // Since this is a critical loop, every instruction counts.
        //
        sb.setLength( 0 );
        sb.append( "someString" );
        sb.append( "someString2" ).append( i );
        sb.append( "someStrin4g" ).append( i );
        sb.append( "someStr5ing" ).append( i );
        sb.append( "someSt7ring" ).append( i );
        maxLength = Math.max(maxLength, sb.toString().length());
    }

    System.out.println(maxLength);
    System.out.println("Clear buffer: " + (System.currentTimeMillis()-time) );
}

private void preAllocate() throws Exception {
    long time = System.currentTimeMillis();
    int maxLength = 0;

    for( int i = 0; i < 10000000; i++ ) {
        StringBuilder sb = new StringBuilder(82);
        sb.append( "someString" );
        sb.append( "someString2" ).append( i );
        sb.append( "someStrin4g" ).append( i );
        sb.append( "someStr5ing" ).append( i );
        sb.append( "someSt7ring" ).append( i );
        maxLength = Math.max(maxLength, sb.toString().length());
    }

    System.out.println(maxLength);
    System.out.println("Pre allocate: " + (System.currentTimeMillis()-time) );
}

public void testBoth() throws Exception {
    for(int i = 0; i < 5; i++) {
        clear();
        preAllocate();
    }
}

The results show reusing the object is about 10% faster than creating a buffer of the expected size.
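
If you want to run these methods outside a test runner, a minimal wrapper might look like this (the class name is mine; the three methods above go inside it unchanged):

public class PreAllocateBenchmark {

    // paste clear(), preAllocate() and testBoth() from above into this class

    public static void main(String[] args) throws Exception {
        new PreAllocateBenchmark().testBoth();
    }
}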

Polenta answered 16/7, 2009 at 1:22 Comment(0)
1

LOL, this is the first time I've ever seen people compare performance by combining strings in a StringBuilder. For that purpose, if you use "+", it could be even faster ;D. The purpose of using StringBuilder is to speed up retrieval of the whole string, along the lines of the concept of "locality".

In the scenario where you retrieve a String value frequently and it does not need frequent changes, StringBuilder allows higher-performance string retrieval. That is the purpose of using StringBuilder... please do not mis-test its core purpose.

Some people say a plane flies faster. So I test it with my bike, and find that the plane moves slower. Do you see how I set up the experiment? ;D

Glaikit answered 17/12, 2010 at 5:5 Comment(0)
1

Not significantly faster, but in my tests it is a couple of milliseconds faster on average (using 1.6.0_45, 64-bit) to use StringBuilder.setLength(0) instead of StringBuilder.delete():

time = System.currentTimeMillis();
StringBuilder sb2 = new StringBuilder();
for (int i = 0; i < 10000000; i++) {
    sb2.append( "someString" );
    sb2.append( "someString2"+i );
    sb2.append( "someStrin4g"+i );
    sb2.append( "someStr5ing"+i );
    sb2.append( "someSt7ring"+i );
    a = sb2.toString();
    sb2.setLength(0);
}
System.out.println( System.currentTimeMillis()-time );
Grandfather answered 5/6, 2013 at 13:53 Comment(0)
1

The fastest way is to use setLength. It doesn't involve a copy operation. Creating a new StringBuilder each time should be ruled out completely. StringBuilder.delete(int start, int end) is slow because it copies the array again as part of the resizing:

 System.arraycopy(value, start+len, value, start, count-end);

After that, StringBuilder.delete() updates StringBuilder.count to the new size, whereas StringBuilder.setLength() simply updates StringBuilder.count to the new size directly.
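
Here's a minimal sketch (not from the original answer) for comparing the two reset approaches yourself; absolute timings will vary by JVM and hardware:

public class ResetCompare {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        long t = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            sb.delete(0, sb.length());   // goes through the System.arraycopy path shown above
            sb.append("someString").append(i);
        }
        System.out.println("delete:    " + (System.nanoTime() - t) / 1000000 + " ms");

        sb = new StringBuilder();
        t = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            sb.setLength(0);             // just resets the internal count field
            sb.append("someString").append(i);
        }
        System.out.println("setLength: " + (System.nanoTime() - t) / 1000000 + " ms");
    }
}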

Nun answered 29/9, 2013 at 7:3 Comment(0)
0

The first is better for humans. If the second is a bit faster on some versions of some JVMs, so what?

If performance is that critical, bypass StringBuilder and write your own. If you're a good programmer, and take into account how your app is using this function, you should be able to make it even faster. Worthwhile? Probably not.
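
If you really did want to go down that road, a hand-rolled appender might look roughly like this (entirely a sketch; the class name, growth policy and API here are illustrative, not something this answer prescribes):

final class FastAppender {
    private char[] buf = new char[128];
    private int len;

    void append(String s) {
        int n = s.length();
        if (len + n > buf.length) {
            // grow by doubling, or jump straight to the required size if that's larger
            buf = java.util.Arrays.copyOf(buf, Math.max(buf.length * 2, len + n));
        }
        s.getChars(0, n, buf, len); // copy the string's chars directly into our buffer
        len += n;
    }

    void reset() {
        len = 0; // reuse the buffer, much like StringBuilder.setLength(0)
    }

    @Override
    public String toString() {
        return new String(buf, 0, len); // one copy, same as StringBuilder.toString()
    }
}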

Why is this question starred as a "favorite question"? Because performance optimization is so much fun, no matter whether it is practical or not.

Hijacker answered 30/10, 2008 at 0:32 Comment(2)
It isn't only an academic question. While most of the time (read: 95%) I prefer readability and maintainability, there really are cases where little improvements make big differences...Croat
OK, I'll change my answer. If an object provides a method that allows it to be cleared and reused, then do so. Examine the code first if you want to make sure the clear is efficient; maybe it releases a private array! If efficient, then allocate the object outside the loop and reuse it inside.Hijacker
0

I don't think it makes sense to try to optimize performance like that. Today (2019) both statements run in about 11 seconds for 100,000,000 loops on my i5 laptop:

    String a;
    StringBuilder sb = new StringBuilder();
    long time = 0;

    System.gc();
    time = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        StringBuilder sb3 = new StringBuilder();
        sb3.append("someString");
        sb3.append("someString2");
        sb3.append("someStrin4g");
        sb3.append("someStr5ing");
        sb3.append("someSt7ring");
        a = sb3.toString();
    }
    System.out.println(System.currentTimeMillis() - time);

    System.gc();
    time = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        sb.setLength(0);
        sb.delete(0, sb.length());
        sb.append("someString");
        sb.append("someString2");
        sb.append("someStrin4g");
        sb.append("someStr5ing");
        sb.append("someSt7ring");
        a = sb.toString();
    }
    System.out.println(System.currentTimeMillis() - time);

==> 11000 msec (declaration inside loop) and 8236 msec (declaration outside loop)

Even if I'm running programs for address deduplication with some billions of loops, a difference of 2 seconds per 100 million loops does not matter, because those programs run for hours. Also be aware that things are different if you only have one append statement:

    System.gc();
    time = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        StringBuilder sb3 = new StringBuilder();
        sb3.append("someString");
        a = sb3.toString();
    }
    System.out.println(System.currentTimeMillis() - time);

    System.gc();
    time = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        sb.setLength(0);
        sb.delete(0, sb.length());
        sb.append("someString");
        a = sb.toString();
    }
    System.out.println(System.currentTimeMillis() - time);

==> 3416 msec (inside loop), 3555 msec (outside loop). The first variant, which creates the StringBuilder within the loop, is faster in that case. And if you change the order of execution, it is much faster:

    System.gc();
    time = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        sb.setLength(0);
        sb.delete(0, sb.length());
        sb.append("someString");
        a = sb.toString();
    }
    System.out.println(System.currentTimeMillis() - time);

    System.gc();
    time = System.currentTimeMillis();
    for (int i = 0; i < 100000000; i++) {
        StringBuilder sb3 = new StringBuilder();
        sb3.append("someString");
        a = sb3.toString();
    }
    System.out.println(System.currentTimeMillis() - time);

==> 3638 msec (outside loop), 2908 msec (inside loop)

Regards, Ulrich

Hofuf answered 22/10, 2019 at 11:41 Comment(0)
0

The practice of not recreating so many new objects in a tight loop, where easily avoidable, definitely has a clear and obvious benefit as shown by the performance benchmarks.

However it also has a more subtle benefit that no one has mentioned.

This secondary benefit is related to an application freeze I saw in a large app processing the persistent objects produced after parsing CSV files with millions of lines/records and each record having about 140 fields.

Creating a new object here and there doesn't normally affect the garbage collector's workload.

Creating two new objects in a tight loop that iterates through each of the 140 fields in each of the millions of records in the aforementioned app incurs more than just mere wasted CPU cycles. It places a massive burden on the GC.

For the objects created by parsing a CSV file with 10 million lines the GC was being asked to allocate then clean up 2 x 140 x 10,000,000 = 2.8 billion objects!!!

If at any stage the amount of free memory gets scarce, e.g. the app has been asked to process multiple large files simultaneously, then you run the risk that the app ends up doing far more GC'ing than actual work. When the GC effort takes up more than 98% of the CPU time, then BANG! You get one of these dreaded exceptions:

GC Overhead Limit Exceeded

https://www.baeldung.com/java-gc-overhead-limit-exceeded

In that case rewriting the code to reuse objects like the StringBuilder instead of instantiating a new one at each iteration can really avoid a lot of GC activity (by not instantiating an extra 2.8 billion objects unnecessarily), reduce the chance of it throwing a "GC Overhead Limit Exceeded" exception and drastically improve the app's general performance even when it is not yet tight on memory.
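
As a purely hypothetical illustration of the reuse pattern described above (the records/fields shape, the names, and the process() consumer are all made up, not taken from the app in question):

StringBuilder sb = new StringBuilder(64);      // one builder reused for the whole run
for (String[] record : records) {              // millions of CSV records
    for (String field : record) {              // ~140 fields per record
        sb.setLength(0);                       // reset instead of new StringBuilder()
        sb.append(field.trim());
        process(sb.toString());                // hypothetical downstream consumer
    }
}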

Clearly, "leaving to the JVM to optimize" can not be a "rule of thumb" applicable to all scenarios.

With the sort of metrics associated with known large input files, nobody who writes code to avoid the unnecessary creation of 2.8 billion objects should ever be accused by the "puritanicals" of "pre-optimizing" ;)

Any dev with half a brain and the slightest amount of foresight could see that this type of optimization for the expected input file size was warranted from day one.

Indecent answered 12/5, 2022 at 5:4 Comment(0)
-2

Declare once, and assign each time. It is a more pragmatic and reusable concept than an optimization.

Franci answered 28/10, 2008 at 13:52 Comment(0)
