String concatenation with the + symbol

4

3

Today I was reading Antonio's Blog about toString() performance and there is a paragraph:

What used to be considered evil yesterday (“do not concatenate Strings with + !!!“), has become cool and efficient! Today the JVM compiles the + symbol into a string builder (in most cases). So, do not hesitate, use it.

Now I am confused, because he says that today the JVM compiles the + symbol into a string builder (in most cases), but I have never heard of this or seen code that shows it.

Could someone please give an example where the JVM does this, and under what conditions it happens?

Worry answered 24/5, 2017 at 4:58 Comment(8)
possible duplicate: #48105Gubernatorial
@Gubernatorial I am afraid this is not related to above mentioned question. Because in this question he clearly stated that the concat() method only accepts String values while the + operator will silently convert the argument to a String (using the toString() method for objects). However, I am talking about conversion happening into StringBuilder. Please fill me in If something is missing.Worry
@MehrajMalik did you even look at the accepted answer? That's pretty much a spot-on dupeGarbo
@Garbo Yes, I did. He mentioned that StringBuilder conversion is happening behind + operator. However, my question is that does this happen all the time or it needs some specific condition. As stated in blog(In most cases it happens). So what are the cases in which it does not convert into StringBuilder?Worry
DOWNVOTERS, could you please care to tell what is wrong with this question?Worry
Possible duplicate of String concatenation: concat() vs "+" operatorDeserted
@OleV.V. NO, CRYSTAL CLEAR, it's not duplicate.Worry
You may like to see this: pellegrino.link/2015/08/22/… . Adding strings with "+" will give you O(n^2) complexity, while StringBuilder's append(String) method will give you O(n) complexityMona
14

The rule

“do not concatenate Strings with + !!!“

is wrong, because it is incomplete and therefore misleading.

The rule is

do not concatenate Strings with + in a loop

and that rule still holds. The original rule was never meant to be applied outside of loops!

A simple loop

String s = "";
for (int i = 0; i < 10000; i++) { s += i; }
System.out.println(s);

is still much slower than

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) { sb.append(i); }
System.out.println(sb.toString());

because the Java compiler has to translate the first loop into

String s = "";
for (int i = 0; i < 10000; i++) { s = new StringBuilder(s).append(i).toString(); }
System.out.println(s);
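
The difference can be checked with a rough sketch like the one below. The class and method names (LoopConcatDemo, withPlus, withBuilder) are made up for illustration, and the timing is a naive System.nanoTime() measurement rather than a proper benchmark, so the numbers should only be read as an indication.

```java
public class LoopConcatDemo {
    // Concatenation with += : each iteration builds a new String
    // that contains a copy of everything accumulated so far.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // A single StringBuilder reused across all iterations:
    // each append writes into the same growing buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        // Both variants produce the same text; only the cost differs.
        System.out.println("same result:   " + a.equals(b));
        System.out.println("+= in loop:    " + (t1 - t0) / 1_000 + " us");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000 + " us");
    }
}
```

Both methods return the same string; the first one just copies the accumulated text again on every iteration.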

Also the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is misleading at least, because this translation was already done with Java 1.0 (ok, not with StringBuilder but with StringBuffer, because StringBuilder was only added with Java 5).


One could also argue that the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is simply wrong, because the compilation is not done by the JVM. It is done by the Java Compiler.


For the question: when does the Java compiler use StringBuilder.append() and when does it use some other mechanism?

The source code of the Java compiler (version 1.8) contains two places where String concatenation through the + operator is handled.

The conclusion is that for the Java compiler from the OpenJDK (which means the compiler distributed by Oracle) the phrase in most cases means always. (Though this could change with Java 9, or it could be that another Java compiler like the one that is included within Eclipse uses some other mechanism).

Inflect answered 24/5, 2017 at 5:22 Comment(6)
"because the compilation is not done by the JVM", haha, nice catch. However, JLS 15.18.1 says that "a Java compiler may use the StringBuffer class". However, it fails to say in which cases it doesn't use it.Kutzenco
@ChandlerBing and Thomas(Impressive catch about JVM compilation), Yes, exactly. No one has mentioned what is/are the condition it does not use a StringBuilder for concatenation.Worry
According to docs.oracle.com/javase/8/docs/api/java/lang/String.html "The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method. String conversions are implemented through the method toString, defined by Object and inherited by all classes in Java. For additional information on string concatenation and conversion, see Gosling, Joy, and Steele, The Java Language Specification."Gubernatorial
@MehrajMalik I currently know of only two cases for string concatenation: either both operands are constant strings (and then the compiler replaces it with a string literal) or at least one operand is not a constant string (and then the compiler uses StringBuilder). But I will try and look into the Java compiler for other cases.Beware
Since the concatenation code for non-constant values is up to the specific compiler, it is impossible to say that all compilers are doing this, as that would imply that we claim to know all compilers. We might say that all relevant compilers, i.e. javac and ecj, have always used this optimization strategy, though. By the way, Java 9’s javac will not use StringBuilder; however, that’s because the new strategy is considered to be even better…Duodecimal
@MehrajMalik I've updated my answer with the information that I've found in the source code of the Java compilerBeware
5

Holger is right in his comment that in java-9 + for String concatenation is going to change from a StringBuilder to a strategy chosen by the JRE via invokedynamic. There are 6 strategies that are possible for String concatenation in jdk-9:

  private enum Strategy {
    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder}.
     */
    BC_SB,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but trying to estimate the required storage.
     */
    BC_SB_SIZED,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but computing the required storage exactly.
     */
    BC_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also tries to estimate the required storage.
     */
    MH_SB_SIZED,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also estimate the required storage exactly.
     */
    MH_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that constructs its own byte[] array from
     * the arguments. It computes the required storage exactly.
     */
    MH_INLINE_SIZED_EXACT
}

And the default one is not using a StringBuilder, it is MH_INLINE_SIZED_EXACT. It is actually pretty crazy how the implementation works, and it is trying to be highly optimized.

So, no the advice there as far as I can tell is bad. That, by the way, is the main effort that Aleksey Shipilev put into the JDK. He also added a big change to String internals in jdk-9: strings are now backed by a byte[] instead of a char[]. This was needed because ISO_LATIN_1 strings can be encoded with a single byte per character, which takes a lot less space.

Ruthenious answered 25/5, 2017 at 6:41 Comment(0)
4

The statement, in this exact form, is just wrong, and it fits the pattern of the linked blog writing nonsense elsewhere, such as claiming that you have to wrap references with Objects.toString(…) to handle null, e.g. "att1='" + Objects.toString(att1) + '\'' instead of just "att1='" + att1 + '\''. There is no need to do that, and apparently the author never re-checked these claims.
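
A small sketch of the null handling mentioned above. The class name NullConcatDemo and the two helper methods are hypothetical; the only library call is java.util.Objects.toString(Object).

```java
import java.util.Objects;

public class NullConcatDemo {
    // Plain concatenation: a null reference is converted to the text "null".
    static String plain(Object att1) {
        return "att1='" + att1 + '\'';
    }

    // The wrapped form recommended by the blog; it yields the same text.
    static String wrapped(Object att1) {
        return "att1='" + Objects.toString(att1) + '\'';
    }

    public static void main(String[] args) {
        System.out.println(plain(null));
        System.out.println(wrapped(null));
        System.out.println(plain(null).equals(wrapped(null)));
    }
}
```

Both variants give the same string for a null argument, so the extra call adds nothing here.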

The JVM is not responsible for compiling the + operator, as this operator is merely a source code artifact. It’s the compiler, e.g. javac, which is responsible, and while there is no guarantee about the compiled form, compilers are encouraged to use a builder by the Java Language Specification:

An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

Note that even if a compiler does not perform this optimization, there is still no such thing as a + operator at the bytecode level, so the compiler has to pick an operation that a JVM understands, e.g. String.concat, which might be even faster than a StringBuilder when you are concatenating exactly two strings.
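
As an illustration of the two-string case, here is a minimal sketch comparing String.concat with the + operator. The class and method names are invented for the example, and it makes no claim about which form a particular compiler emits.

```java
public class ConcatMethodDemo {
    // Explicit call to String.concat, which only accepts a String argument.
    static String viaConcat(String a, String b) {
        return a.concat(b);
    }

    // The + operator; how this is compiled is up to the compiler.
    static String viaPlus(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        // For two non-null strings both forms give the same result.
        System.out.println(viaConcat("Java", "Tutorial").equals(viaPlus("Java", "Tutorial")));
        // The + operator converts null to "null"; concat(null) would throw
        // a NullPointerException instead.
        System.out.println(viaPlus("Java", null));
    }
}
```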

Even assuming the worst compilation strategy for string concatenation (still being within the specification), it would be wrong to say to never concatenate strings with +, as when you are defining compile time constants, using + is the only choice, and, of course, a compile-time constant is usually more efficient than using a StringBuilder at runtime.
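
The compile-time constant case can be observed with a sketch like the following. The names are made up, and the reference comparisons with == are used here only to show whether a new String object was created; they are not a recommended way to compare strings.

```java
public class ConstantFoldingDemo {
    static final String PREFIX = "Java";
    // A constant expression: the compiler folds it into the single
    // literal "JavaTutorial", so no concatenation happens at runtime.
    static final String FOLDED = PREFIX + "Tutorial";

    // True, because constant expressions of type String are interned
    // and therefore share the object of the equal literal.
    static boolean foldedIsSameObject() {
        return FOLDED == "JavaTutorial";
    }

    // False, because 'prefix' is not final, so the concatenation is
    // not a constant expression and creates a new String at runtime.
    static boolean runtimeIsSameObject() {
        String prefix = "Java";
        String joined = prefix + "Tutorial";
        return joined == "JavaTutorial";
    }

    public static void main(String[] args) {
        System.out.println("constant expression: " + foldedIsSameObject());
        System.out.println("runtime expression:  " + runtimeIsSameObject());
    }
}
```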

In practice, the + operator applied to non-constant strings was compiled to StringBuffer usage before Java 5 and to StringBuilder usage in Java 5 to Java 8. When the compiled code is identical to the manual usage of StringBuffer or StringBuilder, respectively, there can’t be a performance difference.

The transition to Java 5, more than a decade ago, was the first time, where string concatenation via + had a clear win over manual StringBuffer use, as simply recompiling the concatenation code made it use the potentially faster StringBuilder internally, while the code manually dealing with StringBuffer needed to be rewritten to use StringBuilder, which had been introduced in that version.

Likewise, Java 9 is going to compile the string concatenation using an invokedynamic instruction allowing the JRE to bind it to actual code doing the operation, including optimizations not possible in ordinary Java code. So only recompiling the string concatenation code is needed to get this feature, while there is no equivalent manual usage for it.

That said, while the premise is wrong, i.e. string concatenation never was considered evil, the advice is correct, do not hesitate to use it.

There are only a few cases where you really might improve performance by dealing with a buffer manually, i.e. when you need a large initial capacity or concatenate a lot within loops and that code has been identified as an actual performance bottleneck by a profiling tool.
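
For the manual case, a minimal sketch of a builder with an explicit initial capacity might look like this. The class name PresizedBuilderDemo and the repeat helper are hypothetical, and whether presizing pays off in real code is something only a profiler can tell.

```java
public class PresizedBuilderDemo {
    // Builds 'part' repeated 'times' times with one StringBuilder whose
    // capacity is set up front, so the internal buffer never has to grow.
    static String repeat(String part, int times) {
        StringBuilder sb = new StringBuilder(part.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeat("ab", 3));
    }
}
```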

Duodecimal answered 24/5, 2017 at 11:51 Comment(0)
0

When you concatenate strings using the + operator, the compiler translates the concatenation code to use StringBuffer for better performance. In order to improve performance, StringBuffer is the better choice.

The quickest way to concatenate two strings is to use the + operator:

String str = "Java";
str = str + "Tutorial";

The compiler translates this code as:

String str = "Java";
StringBuffer sb = new StringBuffer(str);
sb.append("Tutorial");
str = sb.toString();

So it is better to use StringBuffer OR String.format for concatenation

Using String.format

String s = String.format("%s %s", "Java", "Tutorial");
Baten answered 24/5, 2017 at 5:39 Comment(1)
This doesn't answer the question. The OP asks when the compiler does not translate + into an appendKutzenco
