How does Java do string concatenation using "+"?
S

6

12

I read that Java implements the += operator on strings using StringBuilder.
Is it the same for an ("a" + "b") operation?

Singletary answered 27/4, 2010 at 14:23 Comment(1)
I recommend you to have a look at this excellent article.Pulsar
E
14

No. Using StringBuilder is not the same as doing "a" + "b".

In Java, String instances are immutable.

So, if you do:

String c = "a" + "b";

You are creating new Strings every time you concatenate.

On the other hand, StringBuilder is like a buffer that can grow as it needs when appending new Strings.

StringBuilder c = new StringBuilder();
c.append("a");
c.append("b"); // c is created only once; "a" and "b" are appended to it.

The rule of thumb is (changed thanks to the comments I got):

If you are going to concatenate a lot (e.g., concatenating inside a loop, or generating a big XML document from many string variables), do use StringBuilder. Otherwise, simple concatenation (using the + operator) will be just fine.

Compiler optimizations also play a huge role when compiling this kind of code.

Here's further explanation on the topic.

And more Stack Overflow questions on the issue:

Is it better to reuse a StringBuilder in a loop?

What's the best way to build a string of delimited items in Java?

StringBuilder vs String concatenation in toString() in Java

Eoin answered 27/4, 2010 at 14:25 Comment(9)
It's not guaranteed that you're creating new Strings every time you concatenate; it's actually up to the compiler. A poor-quality compiler would behave as you describe...Descend
Not necessarily. The compiler will optimize it anyway. Change your rule of thumb to: "if you're going to use stringA += stringB, then use StringBuilder instead, because it would indeed eat heap." Your answer implies that you need a StringBuilder for every stringA + stringB; this is not true.Kassie
In fact it is required by the JLS that concatenation of compile-time constants results in an interned String. Rule of thumb should be that if you are going to create a String in a loop and there is any chance at all that it'll loop a number of times, then use a StringBuilder. No point in complicating linear concatenation code with StringBuilder.Borchert
Not only do String literals concatenated with + get changed to a single string at compile time (in your example, the compiler would simplify it to String c = "ab";), the compiler can also optimize concatenation with + to use a StringBuilder anyway. You should only really need to use StringBuilder for more complex appending, such as when appending in a loop.Subroutine
Agree. With all of you. Just tried to keep it simple. I think the question asks for a general answer, an answer including cases where "a" + "b" can also be strA + strB.Eoin
@ColinD: It can only "optimize" it within very narrow constraints. For instance, a concatenation loop will result in a new string on every loop; the StringBuilder is not used optimally automatically. Of course, it only really matters for very large loops or loops involving very large strings.Salver
@T.J. Crowder Yes, I know it doesn't do good things with loops, which is why I mentioned them as a case for using StringBuilder directly.Subroutine
-1: The more I look at your answer, the more confused I get. The cases for + are literal,literal and something with non-literals in. The compiler can (and should) optimize the literals-only case (but doesn't in some cases, which is a compiler-quality issue) and in the other case issues a StringBuilder (or StringBuffer in early versions of Java) internally. In complex cases (e.g., concat in a loop) it's better to write everything out by hand.Descend
(The compiler quality issue is really a failure to take advantage of the associativity of + BTW)Descend
S
39

If you combine literal strings (literally "foo" + "bar"), the compiler does it at compile-time, not at runtime.
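
One way to see the compile-time case for yourself is the sketch below. It relies on the Java Language Specification's rule that compile-time constant strings are interned, so a reference comparison with == shows whether the concatenation was already done by the compiler. The class and method names are made up for illustration.

```java
// Hypothetical demo class; all names here are illustrative only.
class LiteralConcatDemo {

    // Both operands are literals, so the expression is a compile-time constant.
    static boolean literalsShareInstance() {
        String folded = "foo" + "bar";
        return folded == "foobar"; // compares references, not contents
    }

    // 'prefix' is not declared final, so it is not a constant variable and
    // the concatenation is performed at runtime, yielding a new String object.
    static boolean runtimeConcatSharesInstance() {
        String prefix = "foo";
        String joined = prefix + "bar";
        return joined == "foobar"; // compares references, not contents
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());
        System.out.println(runtimeConcatSharesInstance());
    }
}
```

The first method returns true and the second returns false. This is only a way to observe the behavior; real code should compare strings with equals().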

If you have two non-literal strings and join them with +, the compiler (Sun's, anyway) will use a StringBuilder under the covers, but not necessarily in the most efficient way. So for instance, if you have this:

String repeat(String a, int count) {
    String rv;

    if (count <= 0) {
        return "";
    }

    rv = a;
    while (--count > 0) {
        rv += a;
    }
    return rv;
}

...what the Sun compiler will actually produce as bytecode looks something like this:

String repeat(String a, int count) {
    String rv;

    if (count <= 0) {
        return "";
    }

    rv = a;
    while (--count > 0) {
        rv = new StringBuilder().append(rv).append(a).toString();
    }
    return rv;
}

(Yes, really — see the disassembly at the end of this answer.) Note that it created a new StringBuilder on every iteration, and then converted the result to String. This is inefficient (but it doesn't matter unless you're doing it a lot) because of all of the temporary memory allocations: It allocates a StringBuilder and its buffer, quite possibly reallocates the buffer on the first append [if rv is more than 16 characters long, which is the default buffer size] and if not on the first then almost certainly on the second append, then allocates a String at the end — and then does it all again on the next iteration.
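
The buffer sizes mentioned above can be checked with StringBuilder.capacity(). This is a rough sketch with made-up class and method names; it only looks at whether the buffer grew, since the exact growth policy is an implementation detail.

```java
// Hypothetical demo class; all names here are illustrative only.
class CapacityDemo {

    // Capacity of a builder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity of a builder created with an explicit initial capacity.
    static int presizedCapacity(int n) {
        return new StringBuilder(n).capacity();
    }

    // True if appending 'text' to a fresh default builder made the buffer grow.
    static boolean appendForcesGrowth(String text) {
        StringBuilder sb = new StringBuilder();
        int before = sb.capacity();
        sb.append(text);
        return sb.capacity() > before;
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());
        System.out.println(presizedCapacity(100));
        System.out.println(appendForcesGrowth("12345678901234567")); // 17 characters
        System.out.println(appendForcesGrowth("short"));
    }
}
```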

You could gain efficiency, if necessary, by rewriting it to explicitly use a StringBuilder:

String repeat(String a, int count) {
    StringBuilder rv;

    if (count <= 0) {
        return "";
    }

    rv = new StringBuilder(a.length() * count);
    while (count-- > 0) {
        rv.append(a);
    }
    return rv.toString();
}

There we've used an explicit StringBuilder and also set its initial buffer capacity to be large enough to hold the result. That's more memory-efficient, but of course, marginally less clear to inexperienced code maintainers and marginally more of a pain to write. So if you find a performance issue with a tight string concat loop, this might be a way to address it.
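
As a sanity check that the rewrite does not change behavior, the sketch below puts both versions side by side as static methods so their results can be compared. The class name and the two method names are made up; the method bodies follow the code above.

```java
// Hypothetical demo class; all names here are illustrative only.
class RepeatCompare {

    // The += version: a new String is produced on every iteration.
    static String repeatWithPlus(String a, int count) {
        if (count <= 0) {
            return "";
        }
        String rv = a;
        while (--count > 0) {
            rv += a;
        }
        return rv;
    }

    // The explicit version: one builder, presized to hold the whole result.
    static String repeatWithBuilder(String a, int count) {
        if (count <= 0) {
            return "";
        }
        StringBuilder rv = new StringBuilder(a.length() * count);
        while (count-- > 0) {
            rv.append(a);
        }
        return rv.toString();
    }

    public static void main(String[] args) {
        String plus = repeatWithPlus("testing ", 4);
        String builder = repeatWithBuilder("testing ", 4);
        System.out.println(plus.equals(builder));
    }
}
```

Note the different loop conditions: the += version starts with one copy already in rv, so it uses --count, while the builder version starts empty and uses count--.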

You can see this under-the-covers StringBuilder in action with the following test class:

public class SBTest
{
    public static final void main(String[] params)
    {
        System.out.println(new SBTest().repeat("testing ", 4));
        System.exit(0);
    }

    String repeat(String a, int count) {
        String rv;

        if (count <= 0) {
            return "";
        }

        rv = a;
        while (--count > 0) {
            rv += a;
        }
        return rv;
    }
}

...which disassembles (using javap -c SBTest) like this:

Compiled from "SBTest.java"
public class SBTest extends java.lang.Object{
public SBTest();
Code:
   0: aload_0
   1: invokespecial  #1; //Method java/lang/Object."<init>":()V
   4: return

public static final void main(java.lang.String[]);
Code:
   0: getstatic   #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   3: new   #3; //class SBTest
   6: dup
   7: invokespecial  #4; //Method "<init>":()V
   10: ldc   #5; //String testing
   12: iconst_4
   13: invokevirtual  #6; //Method repeat:(Ljava/lang/String;I)Ljava/lang/String;
   16: invokevirtual  #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   19: iconst_0
   20: invokestatic   #8; //Method java/lang/System.exit:(I)V
   23: return

java.lang.String repeat(java.lang.String, int);
Code:
   0: iload_2
   1: ifgt  7
   4: ldc   #9; //String
   6: areturn
   7: aload_1
   8: astore_3
   9: iinc  2, -1
   12: iload_2
   13: ifle  38
   16: new   #10; //class java/lang/StringBuilder
   19: dup
   20: invokespecial  #11; //Method java/lang/StringBuilder."<init>":()V
   23: aload_3
   24: invokevirtual  #12; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   27: aload_1
   28: invokevirtual  #12; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   31: invokevirtual  #13; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   34: astore_3
   35: goto  9
   38: aload_3
   39: areturn

}

Note how a new StringBuilder is created on each iteration of the loop and created using the default buffer capacity.

All of this temporary allocation stuff sounds ugly, but again, only if you're dealing with substantial loops and/or substantial strings. Also, when the resulting bytecode is run, the JVM may well optimize it further. Sun's HotSpot JVM, for instance, is a very mature JIT optimizing compiler. Once it's identified the loop as a hot spot, it may well find a way to refactor it. Or not, of course. :-)

My rule of thumb is I worry about it when I see a performance problem, or if I know I'm doing a lot of concatenation and it's very likely to be a performance problem and the code won't be significantly impacted from a maintainability standpoint if I use a StringBuilder instead. The rabid anti-premature-optimization league would probably disagree with me on the second of those. :-)

Salver answered 27/4, 2010 at 14:31 Comment(2)
@Tom Brito: Actually, based on the question, I'm not sure whether the one-word answer would be "yes" or "no" so I took that out and just explained what goes on. :-)Salver
@T.J.Crowder you rock!!(I know its not a recommended comment... I could not resist :P)Incorruption
D
6

Yes, it's the same, but the compiler can additionally optimize concatenations of literals before issuing the code, so "a"+"b" can be just issued as "ab" directly.

Descend answered 27/4, 2010 at 14:26 Comment(2)
Nope, they're not the same; how can they be: concatenation with immutable strings and concatenation with a string builder? @Pablo Santa Cruz provides a worthy answer.Epilogue
@phoenix24: Actually, he spouts stupid rubbish precisely because he doesn't differentiate between literal concatenation and non-literal concat.Descend
E
4

For concatenating a fixed number of strings in one expression with +, the compiler will produce code using a single StringBuilder.

E.g. the line

String d = a + b + c;

results in the same bytecode as the line

String d = new StringBuilder().append(a).append(b).append(c).toString();

when compiled using the javac compiler. (The Eclipse compiler produces somewhat more optimized code by invoking new StringBuilder(a), thus saving one method call.)
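
The sketch below compares the results of the two forms, not the bytecode, which depends on the compiler and its version (newer javac releases may not emit StringBuilder calls for + at all). The class and method names are made up. It also shows that a null operand is rendered as the text "null" in both forms.

```java
// Hypothetical demo class; all names here are illustrative only.
class ChainDemo {

    // One expression joining a fixed number of strings with +.
    static String withPlus(String a, String b, String c) {
        return a + b + c;
    }

    // The hand-written equivalent using a single StringBuilder.
    static String withBuilder(String a, String b, String c) {
        return new StringBuilder().append(a).append(b).append(c).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("x", "y", "z").equals(withBuilder("x", "y", "z")));
        System.out.println(withPlus("x", null, "z").equals(withBuilder("x", null, "z")));
    }
}
```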

As mentioned in other answers, the compiler will concatenate string literals like "a" + "b" into one string itself, producing bytecode that contains "ab" instead.

As mentioned everywhere on the net, you should not use + to build up one string within a loop, because you are copying the beginning of the string over and over to new strings. In this situation you should use one StringBuilder which you declare outside the loop.
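
A minimal sketch of that loop pattern, with the StringBuilder declared once outside the loop. The class name, the method name and the separator parameter are made up for illustration.

```java
// Hypothetical demo class; all names here are illustrative only.
class JoinDemo {

    // Joins the items with the given separator using a single builder.
    static String join(String separator, String[] items) {
        StringBuilder sb = new StringBuilder(); // created once, outside the loop
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(separator); // no separator before the first item
            }
            sb.append(items[i]);
        }
        return sb.toString(); // a single String is created at the end
    }

    public static void main(String[] args) {
        System.out.println(join(", ", new String[] {"a", "b", "c"}));
    }
}
```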

Erethism answered 27/4, 2010 at 14:47 Comment(0)
N
0

"a" + "b" operation

Though readable, easy to format and straightforward, concatenating strings with "+" is often considered bad practice in Java.

Each time you append something via '+', a new String is created: the old String content is copied, the new content is appended, and the old String is discarded. The bigger the String gets, the longer it takes, because there is more to copy and more garbage is produced. Note: if you are just concatenating a few (say, 3 or 4) strings and not building a string in a loop, or are just writing a test application, you can still stick with "+".

Using StringBuilder

When performing extensive String manipulation (or appending in a loop), replacing "+" with StringBuilder.append is recommended. The intermediate String objects mentioned for "+" are not created by the append() call.

Also note that the Sun Java compiler automatically creates StringBuilders (StringBuffers before 5.0) when it sees String concatenations. But that is just the Sun Java compiler.

Nympholepsy answered 27/4, 2010 at 14:57 Comment(0)
W
-2

Strings are more commonly concatenated with the + operator, as in "Hello," + " world" + "!"

Source

Wild answered 27/4, 2010 at 14:26 Comment(1)
Yes, but I think he means, how does the compiler do that operation. (And the answer is indeed that it uses a StringBuilder under-the-covers.)Salver
