I want to correct and complement previous answers.
- Object.clone uses the unchecked System.arraycopy implementation for arrays;
- The main performance advantage of Object.clone is that it initializes RAW memory directly. System.arraycopy also tries to combine array initialization with the copy operation, as we can see in the source code, but unlike Object.clone it performs additional checks to do so. If you disable this feature (see below), the performance becomes much closer (in particular on my hardware).
- One more interesting thing concerns Young vs Old Gen. When the source array is aligned and inside Old Gen, both methods have close performance.
- When we copy primitive arrays, System.arraycopy always uses generate_unchecked_arraycopy.
- It depends on hardware/OS-dependent implementations, so don't trust benchmarks and assumptions; check on your own.
Explanation
First of all, the clone method and System.arraycopy are intrinsics.
Object.clone and System.arraycopy use generate_unchecked_arraycopy.
And if we go deeper, we can see that HotSpot then selects a concrete implementation, dependent on the OS, etc.
Long version.
Let's look at the code from HotSpot.
First of all, we see that Object.clone (LibraryCallKit::inline_native_clone) uses generate_arraycopy, which is also used for System.arraycopy, in the case of -XX:-ReduceInitialCardMarks. Otherwise it calls LibraryCallKit::copy_to_clone, which initializes the new array in RAW memory (if -XX:+ReduceBulkZeroing, which is enabled by default).
In contrast, System.arraycopy uses generate_arraycopy directly. It tries to check ReduceBulkZeroing (and many other cases) and to eliminate array zeroing too, but with the additional checks mentioned above; unlike Object.clone, it also has to make sure that all elements are initialized. Finally, in the best case both of them use generate_unchecked_arraycopy.
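At the Java level, the two paths compared above look roughly like this. It is only a minimal sketch: the class and method names are hypothetical, and the comments restate my reading of the HotSpot code rather than anything the JVM guarantees.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative and not taken from HotSpot.
final class CopyShapes {

    // Allocation and copy are a single operation here, so (per the explanation
    // above) the new array can be filled without being zeroed first.
    static int[] viaClone(int[] source) {
        return source.clone();
    }

    // Allocation and copy are two separate steps here: "new" logically zeroes
    // the array, and the JIT has to check that the copy overwrites all of it
    // before it may skip that zeroing.
    static int[] viaArraycopy(int[] source) {
        int[] dest = new int[source.length];
        System.arraycopy(source, 0, dest, 0, source.length);
        return dest;
    }

    public static void main(String[] args) {
        int[] source = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] a = viaClone(source);
        int[] b = viaArraycopy(source);
        System.out.println(Arrays.equals(a, b));        // same contents
        System.out.println(a != source && b != source); // both are new arrays
    }
}
```

Both methods return an independent copy with the same contents; any difference is only in how much work the JIT can remove.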
Below I show some benchmarks to see this effect in practice:
- The first one is just a simple benchmark; the only difference from the previous answer is that the arrays are not sorted. We see that arraycopy is slower (but not two times slower). Results - https://pastebin.com/ny56Ag1z;
- Second, I add the option -XX:-ReduceBulkZeroing, and now the performance of both methods is much closer. Results - https://pastebin.com/ZDAeQWwx;
- I also assumed that there would be a difference between Old/Young Gen because of array alignment (it is a feature of the Java GC: when we call GC, the alignment of arrays changes, which is easy to observe using JOL). I was surprised that performance became generally the same, and degraded for both methods. Results - https://pastebin.com/bTt5SJ8r. For those who believe in concrete numbers, the throughput of System.arraycopy is better than that of Object.clone.
First benchmark:
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CloneVsArraycopy {

    @Param({"10", "1000", "100000"})
    int size;

    int[] source;

    @Setup(Level.Invocation)
    public void setup() {
        source = create(size);
    }

    @Benchmark
    public int[] clone(CloneVsArraycopy cloneVsArraycopy) {
        return cloneVsArraycopy.source.clone();
    }

    @Benchmark
    public int[] arraycopy(CloneVsArraycopy cloneVsArraycopy) {
        int[] dest = new int[cloneVsArraycopy.size];
        System.arraycopy(cloneVsArraycopy.source, 0, dest, 0, dest.length);
        return dest;
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(CloneVsArraycopy.class.getSimpleName())
                .warmupIterations(20)
                .measurementIterations(20)
                .forks(20)
                .build()).run();
    }

    private static int[] create(int size) {
        int[] a = new int[size];
        for (int i = 0; i < a.length; i++) {
            a[i] = ThreadLocalRandom.current().nextInt();
        }
        return a;
    }
}
Running this test on my PC, I got this - https://pastebin.com/ny56Ag1z.
The difference is not large, but it exists.
For the second benchmark I only added one setting, -XX:-ReduceBulkZeroing, and got these results: https://pastebin.com/ZDAeQWwx. Now we see that for Young Gen the difference is much smaller too.
In the third benchmark I changed only the setup method and enabled the ReduceBulkZeroing option again:
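When toggling a flag like this, it can help to confirm that the running JVM really sees the value you expect. Below is a minimal sketch, assuming a HotSpot-based JDK that exposes com.sun.management.HotSpotDiagnosticMXBean and includes the C2 compiler (ReduceBulkZeroing is a C2 flag); the class and method names are hypothetical, and on other VMs the lookup simply reports "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for inspecting a VM flag from inside the running JVM.
final class VmFlagCheck {

    // Returns the flag's current value as a string, or "unavailable" when the
    // JVM does not know the flag or does not provide the diagnostic bean.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag (IllegalArgumentException) or missing bean.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("ReduceBulkZeroing = " + flagValue("ReduceBulkZeroing"));
    }
}
```

Run it with and without -XX:-ReduceBulkZeroing on the command line to see whether the reported value changes on your JVM.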
@Setup(Level.Invocation)
public void setup() {
    source = create(size);
    // try to move to old gen/align array
    for (int i = 0; i < 10; ++i) {
        System.gc();
    }
}
The difference is much smaller (maybe within the error interval) - https://pastebin.com/bTt5SJ8r.
Disclaimer
This could also be wrong. You should check on your own.
In addition
I think it is interesting to look at the benchmark process:
# Benchmark: org.egorlitvinenko.arrays.CloneVsArraycopy.arraycopy
# Parameters: (size = 50000)
# Run progress: 0,00% complete, ETA 00:07:30
# Fork: 1 of 5
# Warmup Iteration 1: 8,870 ops/ms
# Warmup Iteration 2: 10,912 ops/ms
# Warmup Iteration 3: 16,417 ops/ms <- Hooray!
# Warmup Iteration 4: 17,924 ops/ms <- Hooray!
# Warmup Iteration 5: 17,321 ops/ms <- Hooray!
# Warmup Iteration 6: 16,628 ops/ms <- What!
# Warmup Iteration 7: 14,286 ops/ms <- No, stop, why!
# Warmup Iteration 8: 13,928 ops/ms <- Are you kidding me?
# Warmup Iteration 9: 13,337 ops/ms <- pff
# Warmup Iteration 10: 13,499 ops/ms
Iteration 1: 13,873 ops/ms
Iteration 2: 16,177 ops/ms
Iteration 3: 14,265 ops/ms
Iteration 4: 13,338 ops/ms
Iteration 5: 15,496 ops/ms
For Object.clone:
# Benchmark: org.egorlitvinenko.arrays.CloneVsArraycopy.clone
# Parameters: (size = 50000)
# Run progress: 0,00% complete, ETA 00:03:45
# Fork: 1 of 5
# Warmup Iteration 1: 8,761 ops/ms
# Warmup Iteration 2: 12,673 ops/ms
# Warmup Iteration 3: 20,008 ops/ms
# Warmup Iteration 4: 20,340 ops/ms
# Warmup Iteration 5: 20,112 ops/ms
# Warmup Iteration 6: 20,061 ops/ms
# Warmup Iteration 7: 19,492 ops/ms
# Warmup Iteration 8: 18,862 ops/ms
# Warmup Iteration 9: 19,562 ops/ms
# Warmup Iteration 10: 18,786 ops/ms
We can observe a performance degradation here for System.arraycopy. I saw a similar picture for Streams, and that turned out to be a bug in the compilers.
I suppose this could be a bug in the compilers too. Anyway, it is strange that performance degrades after 3 warmup iterations.
UPDATE
What about type checking
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CloneVsArraycopyObject {

    @Param({"100"})
    int size;

    AtomicLong[] source;

    @Setup(Level.Invocation)
    public void setup() {
        source = create(size);
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public AtomicLong[] clone(CloneVsArraycopyObject cloneVsArraycopy) {
        return cloneVsArraycopy.source.clone();
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public AtomicLong[] arraycopy(CloneVsArraycopyObject cloneVsArraycopy) {
        AtomicLong[] dest = new AtomicLong[cloneVsArraycopy.size];
        System.arraycopy(cloneVsArraycopy.source, 0, dest, 0, dest.length);
        return dest;
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(CloneVsArraycopyObject.class.getSimpleName())
                .jvmArgs("-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintInlining", "-XX:-ReduceBulkZeroing")
                .warmupIterations(10)
                .measurementIterations(5)
                .forks(5)
                .build())
                .run();
    }

    private static AtomicLong[] create(int size) {
        AtomicLong[] a = new AtomicLong[size];
        for (int i = 0; i < a.length; i++) {
            a[i] = new AtomicLong(ThreadLocalRandom.current().nextLong());
        }
        return a;
    }
}
No difference is observed - https://pastebin.com/ufxCZVaC.
I suppose the explanation is simple: since System.arraycopy is a hot intrinsic in that case, the real implementation is simply inlined without any type checking, etc.
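For context, here is the case where System.arraycopy does need a per-element type check: the destination's element type is narrower than what the source may hold. In the benchmark above both arrays are AtomicLong[], so that check is presumably not needed. A minimal sketch with hypothetical class and method names:

```java
// Hypothetical demo of the store check System.arraycopy performs for reference arrays.
final class StoreCheckDemo {

    // Returns true if System.arraycopy rejected the copy with ArrayStoreException.
    static boolean copyRejected(Object[] source, Object[] dest) {
        try {
            System.arraycopy(source, 0, dest, 0, source.length);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] mixed = {1, "two", 3};

        // A String cannot be stored in an Integer[], so the copy fails at runtime.
        System.out.println(copyRejected(mixed, new Integer[3])); // true

        // Object[] accepts any reference, so no element is rejected.
        System.out.println(copyRejected(mixed, new Object[3]));  // false

        // clone() keeps the runtime array type, so there is nothing to check.
        Object[] ints = new Integer[]{1, 2, 3};
        System.out.println(ints.clone().getClass() == Integer[].class); // true
    }
}
```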
Note
I agree with Radiodef that you may find the blog post interesting to read; the author of this blog is the creator (or one of the creators) of JMH.
clone and Arrays.copyOf will be faster than System.arraycopy, if you're creating and filling a new array because the former 2 methods can avoid the implicit zero-initialization when creating an array with new. Although not specifically about this particular problem, this blog post has a lot of related information. I'm pretty sure we have Q&As here on SO which cover this but I'm having trouble finding one. – Bron
Arrays.copyOf vs System.arrayCopy are discussed, while I am asking about array's clone() method – Germanize
clone() method which doesn't reveal why it is faster – Germanize