Is it really different?
Let's start by analyzing javac output. Given the code:
public class Main {

    public String appendInline() {
        final StringBuilder sb = new StringBuilder().append("some").append(' ').append("string");
        return sb.toString();
    }

    public String appendPerLine() {
        final StringBuilder sb = new StringBuilder();
        sb.append("some");
        sb.append(' ');
        sb.append("string");
        return sb.toString();
    }
}
We compile with javac and check the output with javap -c -s.
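Concretely, since the public class is named Main it must live in Main.java, and the invocations would be:

javac Main.java
javap -c -s Main

The relevant output for the two methods follows: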
public java.lang.String appendInline();
  descriptor: ()Ljava/lang/String;
  Code:
     0: new           #2  // class java/lang/StringBuilder
     3: dup
     4: invokespecial #3  // Method java/lang/StringBuilder."<init>":()V
     7: ldc           #4  // String some
     9: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    12: bipush        32
    14: invokevirtual #6  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
    17: ldc           #7  // String string
    19: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    22: astore_1
    23: aload_1
    24: invokevirtual #8  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
    27: areturn
public java.lang.String appendPerLine();
  descriptor: ()Ljava/lang/String;
  Code:
     0: new           #2  // class java/lang/StringBuilder
     3: dup
     4: invokespecial #3  // Method java/lang/StringBuilder."<init>":()V
     7: astore_1
     8: aload_1
     9: ldc           #4  // String some
    11: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    14: pop
    15: aload_1
    16: bipush        32
    18: invokevirtual #6  // Method java/lang/StringBuilder.append:(C)Ljava/lang/StringBuilder;
    21: pop
    22: aload_1
    23: ldc           #7  // String string
    25: invokevirtual #5  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    28: pop
    29: aload_1
    30: invokevirtual #8  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
    33: areturn
As seen, the appendPerLine variant produces noticeably larger bytecode: it emits several extra aload_1 and pop instruction pairs that simply cancel each other out (pushing the builder / buffer reference onto the stack, then popping it to discard the append return value that is never used). In turn, this larger method body means a larger callsite with greater overhead. Conversely, a smaller callsite improves the chances that the JVM will inline the method calls, reducing call overhead and further improving performance.
This alone improves performance from a cold start when chaining method calls.
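You can observe HotSpot's inlining decisions directly with its diagnostic flags. A possible invocation, assuming Main is extended with a main method that exercises both variants in a hot loop:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Main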
Shouldn't the JVM optimize this away?
One could argue that the JIT compiler should be able to optimize these instructions away once the VM has warmed up. However, this claim needs supporting evidence, and it would only apply to long-running processes anyway.
So let's check this claim and validate the performance even after warmup, using JMH to benchmark the behavior:
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class StringBenchmark {

    private String from = "Alex";
    private String to = "Readers";
    private String subject = "Benchmarking with JMH";

    @Param({"16"})
    private int size;

    @Benchmark
    public String testEmailBuilderSimple() {
        StringBuilder builder = new StringBuilder(size);
        builder.append("From");
        builder.append(from);
        builder.append("To");
        builder.append(to);
        builder.append("Subject");
        builder.append(subject);
        return builder.toString();
    }

    @Benchmark
    public String testEmailBufferSimple() {
        StringBuffer buffer = new StringBuffer(size);
        buffer.append("From");
        buffer.append(from);
        buffer.append("To");
        buffer.append(to);
        buffer.append("Subject");
        buffer.append(subject);
        return buffer.toString();
    }

    @Benchmark
    public String testEmailBuilderChain() {
        return new StringBuilder(size).append("From").append(from).append("To").append(to).append("Subject")
                .append(subject).toString();
    }

    @Benchmark
    public String testEmailBufferChain() {
        return new StringBuffer(size).append("From").append(from).append("To").append(to).append("Subject")
                .append(subject).toString();
    }
}
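To execute the benchmark, JMH needs a runner. Here is a minimal sketch using JMH's programmatic API; the BenchmarkRunner class is added purely for illustration, and fork/iteration settings are left at their defaults rather than the exact configuration used for the results below:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        // Select every @Benchmark method in StringBenchmark; JMH measures
        // throughput (ops/s) by default.
        Options options = new OptionsBuilder()
                .include(StringBenchmark.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}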
We compile and run the benchmark, and obtain:
Benchmark                               (size)   Mode  Cnt         Score        Error  Units
StringBenchmark.testEmailBufferChain        16  thrpt  200  22981842.957 ± 238502.907  ops/s
StringBenchmark.testEmailBufferSimple       16  thrpt  200   5789967.103 ±  62743.660  ops/s
StringBenchmark.testEmailBuilderChain       16  thrpt  200  22984472.260 ± 212243.175  ops/s
StringBenchmark.testEmailBuilderSimple      16  thrpt  200   5778824.788 ±  59200.312  ops/s
So, even after warming up, following the rule yields a roughly 4x improvement in throughput: the chained variants outperform the per-line ones for both StringBuilder and StringBuffer. All these runs were done using Oracle JRE 8u121.
Of course, you don't have to believe me: others have done similar analyses, and you can even try it yourself.
Does it even matter?
Well, it depends. This is certainly a micro-optimization. If a system is sorting with Bubble Sort, it has far more pressing performance issues than this one. Not all programs have the same requirements, and therefore not all of them need to follow the same rules.
This PMD rule is probably meaningful only to projects that value performance highly and will do whatever it takes to shave off a few milliseconds. Such projects normally rely on several different profilers, microbenchmarks, and other tools, and having a tool such as PMD keep an eye out for specific patterns will certainly help them.
PMD has many other rules available that will probably apply to many other projects. Just because this particular rule may not apply to your project doesn't mean the tool isn't useful; take your time to review the available rules and choose the ones that really matter to you.
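For reference, here is a minimal ruleset sketch for enabling this kind of check. It assumes PMD 6+ and its ConsecutiveAppendsShouldReuse rule from the java performance category; adjust the rule reference to match your PMD version:

<?xml version="1.0"?>
<ruleset name="string-append-checks"
         xmlns="http://pmd.sourceforge.net/ruleset/2.0.0">
    <description>Flag per-line StringBuilder/StringBuffer appends that could be chained.</description>
    <rule ref="category/java/performance.xml/ConsecutiveAppendsShouldReuse"/>
</ruleset>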
Hope that clears it up for everyone.