JDK is introducing an API, Stream.toList(), with JDK-8180352. Here is the benchmarking code I used to compare its performance with the existing Collectors.toList:
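import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.All)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, batchSize = 10000)
@Measurement(iterations = 20, time = 1, batchSize = 10000)
public class CollectorsVsStreamToList {

    // Existing approach: collect into a List through a Collector.
    @Benchmark
    public List<Integer> viaCollectors() {
        return IntStream.range(1, 1000).boxed().collect(Collectors.toList());
    }

    // New approach: the Stream.toList() terminal operation.
    @Benchmark
    public List<Integer> viaStream() {
        return IntStream.range(1, 1000).boxed().toList();
    }
}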
The result summary is as follows:
Benchmark                                                       Mode  Cnt   Score    Error  Units
CollectorsVsStreamToList.viaCollectors                         thrpt   20  17.321  ± 0.583  ops/s
CollectorsVsStreamToList.viaStream                             thrpt   20  23.879  ± 1.682  ops/s
CollectorsVsStreamToList.viaCollectors                          avgt   20   0.057  ± 0.002   s/op
CollectorsVsStreamToList.viaStream                              avgt   20   0.040  ± 0.001   s/op
CollectorsVsStreamToList.viaCollectors                        sample  380   0.054  ± 0.001   s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.00    sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.50    sample        0.054            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.90    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.95    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.99    sample        0.062            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.999   sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.9999  sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p1.00    sample        0.068            s/op
CollectorsVsStreamToList.viaStream                            sample  525   0.039  ± 0.001   s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.00            sample        0.037            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.50            sample        0.038            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.90            sample        0.040            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.95            sample        0.042            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.99            sample        0.050            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.999           sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.9999          sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p1.00            sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors                            ss   20   0.060  ± 0.007   s/op
CollectorsVsStreamToList.viaStream                                ss   20   0.043  ± 0.006   s/op
Of course, the first question for the domain experts is whether the benchmarking procedure is correct. The test class was executed on macOS. Do let me know if any further details are required.
As a follow-up: as far as I can infer from the readings, the average time, throughput, and sampling time of Stream.toList look better than those of Collectors.toList. Is that understanding correct?
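Both benchmark methods above generate and box 999 integers on every invocation, so the scores include that cost as well as the list construction. As a rough sketch of how the input could be prepared once and reused, the class below builds a boxed array up front and exposes sequential and parallel variants of both approaches. The class name PreparedInput and the field name DATA are illustrative assumptions, not part of the original benchmark; in a JMH run the array would typically live in a @State object and each method would carry @Benchmark. Stream.toList() requires JDK 16 or later.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class: shows only the bodies that would be measured.
final class PreparedInput {

    // Boxed once, outside the measured code (assumption: same range as the question).
    static final Integer[] DATA =
            IntStream.range(1, 1000).boxed().toArray(Integer[]::new);

    static List<Integer> viaCollectors() {
        return Stream.of(DATA).collect(Collectors.toList());
    }

    static List<Integer> viaStream() {
        return Stream.of(DATA).toList();
    }

    static List<Integer> viaCollectorsParallel() {
        return Stream.of(DATA).parallel().collect(Collectors.toList());
    }

    static List<Integer> viaStreamParallel() {
        return Stream.of(DATA).parallel().toList();
    }

    public static void main(String[] args) {
        // Sanity check only; this is not a timing harness.
        System.out.println(viaCollectors().equals(viaStream()));
        System.out.println(viaCollectorsParallel().equals(viaStreamParallel()));
    }
}
```

The main method only checks that all variants produce the same elements; any timing comparison should still go through JMH.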
toArray() and collect(Collectors.toList()). But that's of course only one implementation. – Didi
Stream::toList is more efficient in some cases -- but it really depends on the details. Stream::toList builds on toArray, and for sources that have the SIZED (and ideally SUBSIZED, for parallel streams) characteristics, toArray is optimized to reduce reallocation and copying compared to collect. – Naca
@State variable initialized outside the benchmark method, with something like Stream.of(data).toList(). The boxing is surely distorting your data. I'd also include parallel runs. – Naca
stream under the benchmark something like static final Set<Integer> data = IntStream.range(0, 1000).boxed().collect(Collectors.toSet()); @Benchmark public List<Integer> viaCollectors() { return data.stream().collect(Collectors.toList()); } @Benchmark public List<Integer> viaCollectorsParallel() { return data.stream().parallel().collect(Collectors.toList()); }? Not really sure how to do that with @State. If so, I can update with those results as well. – Xuanxunit
Stream.of(array).collect(...) in the benchmark method, rather than a Set, since arrays will give you a better-behaving Spliterator than Set. – Naca
.parallel() streams, Collectors.toList() seems to be slightly better than Stream.toList() in the results I could generate. I have refrained from pulling in the complete result output here, but let me know if any of it (comments after code) makes sense to be a part of the question. – Xuanxunit
Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray()))) instead of Collections.unmodifiableList(Arrays.asList(this.toArray()))? This additional copying step seems to serve no purpose. – Corrianne
ReferencePipeline; that's the one that is actually being used. That said, it does seem the default code is doing an extra step. – Naca
this.toArray were to violate its spec and keep a reference to the returned array. Without the defensive copy, it would be possible to modify the list returned from the default toList implementation. – Stulin
toArray is part of the Stream API, so if an implementation violates the spec and returns a shared array, callers of the toArray method could already break. Why should callers of the toList() method get more guarantees than callers of toArray()? It's very simple: the methods do what the spec says, if the implementation respects the spec. It's not the task of a default implementation to fix a potentially broken implementation. Defensive copies are good for essential classes like String, but not for a wrapper that never guaranteed to be a truly immutable collection anyway. – Corrianne
Collection::toArray which explicitly mentions it must be 'safe'. Where should I look? – Dysuria
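The comments above refer to the SIZED and SUBSIZED characteristics and to arrays giving a better-behaving Spliterator than a Set. As a small illustration, assuming an array source and a HashSet source of the same 1000 boxed integers, the sketch below prints which of those characteristics each source reports, and shows that a filter stage drops SIZED. The class name SizedCharacteristics and its fields are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical demo class: inspects spliterator characteristics of stream sources.
final class SizedCharacteristics {

    static final Integer[] ARRAY =
            IntStream.range(0, 1000).boxed().toArray(Integer[]::new);
    static final Set<Integer> SET = new HashSet<>(Arrays.asList(ARRAY));

    // Array-backed stream with no intermediate operations.
    static Spliterator<Integer> arraySource() {
        return Arrays.stream(ARRAY).spliterator();
    }

    // HashSet-backed stream with no intermediate operations.
    static Spliterator<Integer> setSource() {
        return SET.stream().spliterator();
    }

    // Same array source, but a filter makes the result size unknown up front.
    static Spliterator<Integer> filteredArray() {
        return Arrays.stream(ARRAY).filter(i -> true).spliterator();
    }

    static String describe(Spliterator<?> s) {
        return "SIZED=" + s.hasCharacteristics(Spliterator.SIZED)
                + ", SUBSIZED=" + s.hasCharacteristics(Spliterator.SUBSIZED);
    }

    public static void main(String[] args) {
        System.out.println("array:          " + describe(arraySource()));
        System.out.println("HashSet:        " + describe(setSource()));
        System.out.println("filtered array: " + describe(filteredArray()));
    }
}
```

The array source reports both SIZED and SUBSIZED, the HashSet source does not report SUBSIZED, and the filtered stream no longer reports SIZED, which is why the choice of source and pipeline can change how the two approaches compare.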
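The defensive-copy discussion above can be made concrete with a minimal sketch. It assumes a caller that, contrary to the toArray spec, keeps a reference to the returned array and writes to it later. The class DefensiveCopyDemo and its method names are made up for this example; the two expressions are the ones quoted in the comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class comparing the two wrapping strategies from the comments.
final class DefensiveCopyDemo {

    // Wraps the array directly: the list is a view, so later array writes show through.
    static List<Object> withoutCopy(Object[] array) {
        return Collections.unmodifiableList(Arrays.asList(array));
    }

    // Copies into a fresh ArrayList first, so the list is detached from the array.
    static List<Object> withCopy(Object[] array) {
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(array)));
    }

    public static void main(String[] args) {
        // Stands in for an array that a misbehaving toArray() kept a reference to.
        Object[] leaked = {1, 2, 3};

        List<Object> view = withoutCopy(leaked);
        List<Object> copy = withCopy(leaked);

        leaked[0] = 99; // mutation through the retained reference

        System.out.println(view); // prints [99, 2, 3]
        System.out.println(copy); // prints [1, 2, 3]
    }
}
```

Whether a default implementation should pay for that extra copy is exactly the point the commenters disagree on; the sketch only shows what the copy protects against.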