How much does Java optimize string concatenation with +?
P

2

12

I know that in more recent Java versions, a string concatenation such as

String test = one + "two" + three;

will get optimized to use a StringBuilder.

However, will a new StringBuilder be created each time execution hits this line, or will a single thread-local StringBuilder be created and then used for all string concatenation?

In other words, can I improve the performance of a frequently called method by creating my own thread-local StringBuilder to reuse, or will there be no significant gain from doing so?

I can just write a test for this, but I wonder whether the answer is compiler/JVM specific or whether it can be answered more generally.

Passionate answered 26/10, 2016 at 16:29 Comment(10)
Beware of reentrancy when concatenating expressions. - Soulier
Last I checked it was quite dumb, forcing StringBuilder to repeatedly reallocate. But that was specific to Oracle's JDK, looking at the resulting bytecode, and so didn't account for any optimization the JVM might do. My rule was: 99.999% of the time you don't care, of course; for the .001% where you care, use an explicit StringBuilder allocated big enough to handle the total result. - Financier
Unless you're doing a lot more string manipulation than just that one line, I agree with T.J.: 99.999% of the time you won't see any difference. The JVM will actually allocate all memory as local to a thread anyway (until it needs to share with another thread), iiuc, so your thread local probably won't do any good. - Carri
If it's on a single statement like "a"+"b"+"c" I believe it pre-allocates the correct amount. In general it will work great. The case to use StringBuffer/Builder manually is when you are appending to a single string in a loop--Java will repeatedly create and destroy builders and garbage collect the intermediate strings which isn't great. - Bump
By the way, the referenced question is NOT a good replacement for this question because when searching for "Optimizing String +" you'd never come up with the other question, but there are duplicates out there--you should have at least found #1532961 - Bump
Which is not a duplicate either. As my question shows I am aware of string concatenation. I'm specifically asking about whether string buffers get reused. - Passionate
@TimB No, but it's generally not a good idea to do that anyway. Just doing the straightforward thing -- which Java does -- is likely to perform better in the vast majority of cases. - Battery
It might be interesting to write several examples, and then use a decompiler on the resulting class files to see what really shakes out. - Sechrist
blog.codinghorror.com/… -- compares various string concatenation techniques, and gives an amusing conclusion. - Illation
If I have to concatenate strings in a loop, or something similar, I usually allocate a StringBuilder manually (strictly as a local variable). But if the code is just a single line, concatenating 2-3-4 values, I wouldn't bother. - Inelegance
S
10

As far as I know, no compiler generates code that reuses StringBuilder instances; most notably, neither javac nor ECJ does.
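As a rough illustration for the line from the question: a pre-Java-9 javac compiles the expression to approximately the hand-written form in asCompiled, creating a fresh StringBuilder on every execution. The class and method names (ConcatDesugarSketch, withPlus, asCompiled) are invented for this sketch, and the exact bytecode may differ between compilers and versions.

```java
final class ConcatDesugarSketch {
    // What the source looks like.
    static String withPlus(String one, String three) {
        return one + "two" + three;
    }

    // Approximately what a pre-Java-9 javac emits for the line above:
    // a new StringBuilder per execution, nothing is reused across calls.
    static String asCompiled(String one, String three) {
        return new StringBuilder()
                .append(one)
                .append("two")
                .append(three)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("1-", "-3"));
        System.out.println(asCompiled("1-", "-3"));
    }
}
```

Both methods produce the same String; only the source form differs.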

It’s important to emphasize that it is reasonable not to do such re-use. It’s not safe to assume that code retrieving an instance from a ThreadLocal variable is faster than a plain allocation from a TLAB. Even if we add the potential cost of a local gc cycle for reclaiming that instance, as far as we can identify its share of the total cost at all, we can’t conclude that.

So code trying to reuse the builder would be more complicated and would waste memory, as it keeps the builder alive without knowing whether it will ever actually be reused, all without a clear performance benefit.

This holds especially when we consider, in addition to the above, that:

  • JVMs like HotSpot have Escape Analysis, which can elide pure local allocations like these altogether and also may elide the copying costs of array resize operations
  • Such sophisticated JVMs usually also have optimizations dedicated specifically to StringBuilder based concatenation, which work best when the compiled code follows the common pattern
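To make the trade-off discussed above concrete, here is a minimal sketch of the two variants from the question. The names (ReuseSketch, BUILDER, plain, reused) and the initial capacity of 64 are assumptions for illustration only. The sketch does not claim that either variant is faster; that would have to be measured, for example with a benchmark harness such as JMH.

```java
final class ReuseSketch {
    // Hypothetical per-thread builder; it stays alive as long as the thread does.
    private static final ThreadLocal<StringBuilder> BUILDER =
            ThreadLocal.withInitial(() -> new StringBuilder(64));

    // Plain concatenation: follows the common pattern the JIT knows how to optimize.
    static String plain(String one, String three) {
        return one + "two" + three;
    }

    // Manual reuse: an extra ThreadLocal lookup and a reset on every call.
    // Not reentrant in general: if producing an argument ended up calling
    // reused() on the same thread, both calls would share one builder.
    static String reused(String one, String three) {
        StringBuilder sb = BUILDER.get();
        sb.setLength(0); // discard previous contents, keep the capacity
        return sb.append(one).append("two").append(three).toString();
    }
}
```

Note that the reused variant still allocates the resulting String (and its backing array) on every call; only the builder itself is saved.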

With Java 9, the picture is going to change again. Then, string concatenation will get compiled to an invokedynamic instruction which will get linked to a JRE-provided factory at runtime (see StringConcatFactory). The JRE will then decide what the code looks like, which allows tailoring it to the specific JVM, including buffer re-use if that has a benefit on that particular JVM. This will also reduce the code size, as it requires only a single instruction rather than the sequence of an allocation and multiple calls into the StringBuilder.

Slifka answered 8/2, 2017 at 17:1 Comment(1)
With jdk-9 the picture changes dramatically again :) - Kef
K
11

You would be amazed how much effort was put into jdk-9 String concatenation. First, javac emits an invokedynamic instead of invocations of StringBuilder#append. The bootstrap method of that invokedynamic returns a CallSite which contains a MethodHandle (that is actually a series of MethodHandles).

Thus the decision of what is actually done for a String concatenation is moved to the runtime. The downside is that the first time you concatenate Strings (for a given combination of argument types), it is going to be slower.

Then there is a series of strategies you can choose from when concatenating a String (you can override the default one via the java.lang.invoke.stringConcat system property):
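The linkage step can be tried out by hand, which may help to see what "moved to the runtime" means. The sketch below calls the public bootstrap method StringConcatFactory.makeConcatWithConstants directly for the expression one + "two" + three (Java 9 or later). The class name IndyConcatSketch and the helper names are assumptions; javac does this through an invokedynamic instruction rather than through ordinary calls like these.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

final class IndyConcatSketch {
    // Linked once, like an invokedynamic call site; later calls reuse the handle.
    private static final MethodHandle CONCAT = link();

    private static MethodHandle link() {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "makeConcatWithConstants",
                    // the concatenation takes two Strings and returns a String
                    MethodType.methodType(String.class, String.class, String.class),
                    // recipe: the control character U+0001 marks an argument slot
                    "\u0001two\u0001");
            return site.getTarget();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String concat(String one, String three) {
        try {
            return (String) CONCAT.invokeExact(one, three);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The one-time cost sits in link(); concat() afterwards only invokes the already linked MethodHandle.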

private enum Strategy {
    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder}.
     */
    BC_SB,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but trying to estimate the required storage.
     */
    BC_SB_SIZED,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but computing the required storage exactly.
     */
    BC_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also tries to estimate the required storage.
     */
    MH_SB_SIZED,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also estimate the required storage exactly.
     */
    MH_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that constructs its own byte[] array from
     * the arguments. It computes the required storage exactly.
     */
    MH_INLINE_SIZED_EXACT
}

The default strategy is: MH_INLINE_SIZED_EXACT which is a beast!

It uses the package-private constructor to build the String (which is the fastest):

/*
 * Package private constructor which shares value array for speed.
 */
String(byte[] value, byte coder) {
    this.value = value;
    this.coder = coder;
}

First, this strategy creates so-called filters; these are basically method handles that transform the incoming parameter to a String value. As one might expect, these MethodHandles are stored in a class called Stringifiers, which in most cases produces a MethodHandle that calls:

String.valueOf(YourInstance)

So if you have 3 Objects that you want to concatenate, there will be 3 MethodHandles that delegate to String.valueOf(YourObject), which effectively means that your objects have been transformed into Strings. There are certain tweaks inside this class that I still can't understand, like the need for the separate classes StringifierMost (which transforms only references, floats and doubles to String) and StringifierAny.
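The filter idea can be sketched with the public MethodHandles API. This is not the JDK's internal Stringifiers code: the class name StringifierSketch is invented, and String#concat merely stands in for the real concatenation logic. It only shows how each argument can be passed through String.valueOf before the target handle sees it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class StringifierSketch {
    private static final MethodHandle JOIN = build();

    private static MethodHandle build() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // (Object)String: the filter applied to each incoming argument.
            MethodHandle valueOf = lookup.findStatic(
                    String.class, "valueOf",
                    MethodType.methodType(String.class, Object.class));
            // (String,String)String: stand-in for the actual concatenation.
            MethodHandle concat = lookup.findVirtual(
                    String.class, "concat",
                    MethodType.methodType(String.class, String.class));
            // Route both arguments through valueOf; the result has type
            // (Object,Object)String.
            return MethodHandles.filterArguments(concat, 0, valueOf, valueOf);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String join(Object a, Object b) {
        try {
            return (String) JOIN.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

For example, join(42, true) yields "42true", and a null argument becomes the text "null", as with String.valueOf(Object).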

Since MH_INLINE_SIZED_EXACT says that the byte array is computed to the exact size, there has to be a way to compute that size.

This is done via the StringConcatHelper#mixLen methods, which take the stringified versions of your input parameters (references/float/double). At this point we know the size of our final String. Well, we don't actually know it; we have a MethodHandle that will compute it.

There's one more change to String in jdk-9 that is worth mentioning here: the addition of a coder field. It is needed to compute the size/equality/charAt of a String. Since it's needed for the size, we need to compute it as well; this is done via StringConcatHelper#mixCoder.

It is safe at this point to delegate to a MethodHandle that will create our array:

    @ForceInline
    private static byte[] newArray(int length, byte coder) {
        return (byte[]) UNSAFE.allocateUninitializedArray(byte.class, length << coder);
    }
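The sizing logic described so far can be modelled in a few lines. This is a simplified sketch, not the JDK's StringConcatHelper: the class SizeAndCoderSketch and its methods are hypothetical, and the coder values (0 for one byte per char, 1 for two bytes per char) are an assumption chosen so that the expression length << coder gives the array size in bytes.

```java
final class SizeAndCoderSketch {
    static final byte LATIN1 = 0; // one byte per char
    static final byte UTF16 = 1;  // two bytes per char

    // Sum of all part lengths (the role the answer assigns to mixLen).
    static int totalLength(String... parts) {
        int length = 0;
        for (String part : parts) {
            length = Math.addExact(length, part.length()); // throws on overflow
        }
        return length;
    }

    // UTF16 as soon as any char does not fit into one byte
    // (the role the answer assigns to mixCoder).
    static byte coder(String... parts) {
        for (String part : parts) {
            for (int i = 0; i < part.length(); i++) {
                if (part.charAt(i) > 0xFF) {
                    return UTF16;
                }
            }
        }
        return LATIN1;
    }

    // Size in bytes of the backing array: length << coder.
    static int byteSize(String... parts) {
        return totalLength(parts) << coder(parts);
    }
}
```

With these definitions, five Latin-1 characters need 5 bytes, while four characters including a euro sign need 8.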

How is each element appended? Via methods in StringConcatHelper#prepend.
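A simplified model of the prepend step: the exactly sized array is filled from the end towards the start, with each part copied in just before the current index. The sketch uses a char[] and the public String(char[]) constructor, which copies the array, so it does not reproduce the zero-copy package-private constructor. PrependSketch and its method names are assumptions.

```java
final class PrependSketch {
    // Copies part into buf so that it ends right before index;
    // returns the new, smaller index.
    static int prepend(int index, char[] buf, String part) {
        index -= part.length();
        part.getChars(0, part.length(), buf, index);
        return index;
    }

    static String concat(String... parts) {
        int length = 0;
        for (String part : parts) {
            length += part.length();
        }
        char[] buf = new char[length]; // exact size, allocated once, never resized
        int index = length;
        for (int i = parts.length - 1; i >= 0; i--) {
            index = prepend(index, buf, parts[i]); // last part first
        }
        return new String(buf); // public constructor: copies the array
    }
}
```

Because the total length is known up front, no intermediate resizing is needed, which is the main difference from appending to a default-sized StringBuilder.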

And only now do we have all the details needed to invoke that constructor of String that takes a byte[].


All these operations (and many others I have skipped for simplicity) are handled by emitting a MethodHandle that will be invoked when the appending actually happens.

Kef answered 9/2, 2017 at 13:48 Comment(4)
I got carried away a bit in this answer for the simple reason that the details of such a simple operation are fascinating IMO. - Kef
It is really interesting, although unfortunately it doesn't really directly answer the question - so I feel like I can't move the tick over :( - Passionate
@TimB totally agree, this is not about the tick :). The accepted answer is the correct one. - Kef
To be precise in the terms, the CallSite is the object returned by the bootstrap method; it encapsulates a MethodHandle to which the invokedynamic instruction gets linked. What the invokedynamic instruction returns is, of course, a String. What is mentioned only implicitly here is that String now uses a byte[] array, encoding iso-latin-1 strings using only one byte per character, which alone already halves the necessary data movement for most string constructions. - Slifka