Would Stream.toList() perform better than Collectors.toList()

Asked 15/1, 2021 at 18:35 Answered 31/1, 2021 at 16:23

Solved java performance java-stream jmh java-16

JDK is introducing an API Stream.toList() with JDK-8180352. Here is a benchmarking code that I have attempted to compare its performance with the existing Collectors.toList:

@BenchmarkMode(Mode.All)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, batchSize = 10000)
@Measurement(iterations = 20, time = 1, batchSize = 10000)
public class CollectorsVsStreamToList {

    @Benchmark
    public List<Integer> viaCollectors() {
        return IntStream.range(1, 1000).boxed().collect(Collectors.toList());
    }

    @Benchmark
    public List<Integer> viaStream() {
        return IntStream.range(1, 1000).boxed().toList();
    }
}

The result summary is as follows:

Benchmark                                                       Mode  Cnt   Score    Error  Units
CollectorsVsStreamToList.viaCollectors                         thrpt   20  17.321 ±  0.583  ops/s
CollectorsVsStreamToList.viaStream                             thrpt   20  23.879 ±  1.682  ops/s
CollectorsVsStreamToList.viaCollectors                          avgt   20   0.057 ±  0.002   s/op
CollectorsVsStreamToList.viaStream                              avgt   20   0.040 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors                        sample  380   0.054 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.00    sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.50    sample        0.054            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.90    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.95    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.99    sample        0.062            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.999   sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.9999  sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p1.00    sample        0.068            s/op
CollectorsVsStreamToList.viaStream                            sample  525   0.039 ±  0.001   s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.00            sample        0.037            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.50            sample        0.038            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.90            sample        0.040            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.95            sample        0.042            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.99            sample        0.050            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.999           sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.9999          sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p1.00            sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors                            ss   20   0.060 ±  0.007   s/op
CollectorsVsStreamToList.viaStream                                ss   20   0.043 ±  0.006   s/op

Of course, the first question to the domain experts would be if the benchmarking procedure is correct or not? The test class was executed on MacOS. Do let me know for any further details required.

Follow-up, as far as I could infer from the readings the average time, throughput, and sampling time of the Stream.toList looks better than the Collectors.toList. Is that understanding correct?

Xuanxunit answered 15/1, 2021 at 18:35 Comment(16)

Side note: Based on the implementation it looks like what you're really testing is the difference between toArray() and collect(Collectors.toList()). But that's of course only one implementation. – Didi 15/1, 2021 at 18:59

It is quite possible that Stream::toList is more efficient in some cases -- but it really depends on the details. Stream::toList builds on toArray, and for sources that have the SIZED (and ideally SUBSIZED, for parallel streams) characteristics, toArray is optimized to reduce reallocation and copying compared to collect. – Naca 15/1, 2021 at 20:30

If you're trying to measure the difference between the two collectors, though, you should minimize the work of the rest of the stream pipline. I would move the source generation, and the boxing, out of the benchmark method and into an @State variable initialized outside the benchmark method, with something like Stream.of(data).toList(). The boxing is surely distorting your data. I'd also include parallel runs. – Naca 15/1, 2021 at 20:33

@BrianGoetz did you mean to initialiise the source data into a static attribute and then simply using the stream under the benchmark something like

static final Set<Integer> data = IntStream.range(0, 1000).boxed().collect(Collectors.toSet());      @Benchmark     public List<Integer> viaCollectors() { return data.stream().collect(Collectors.toList()); }      @Benchmark     public List<Integer> viaCollectorsParallel() { return data.stream().parallel().collect(Collectors.toList()); }

? Not really sure of how to do that with @State. If so, I can update with those results as well. – Xuanxunit 16/1, 2021 at 3:53

@Xuanxunit There's a few ways to do it, but here's one: tutorials.jenkov.com/java-performance/jmh.html The main thing is that you want to move as much setup as possible out of the benchmark methods. I'd probably go with Stream.of(array).collect(...) in the benchmark method, rather than a Set, since arrays will give you a better-behaving Spliterator than Set. – Naca 16/1, 2021 at 14:5

@BrianGoetz Thank you for sharing. I have tried to rephrase my tests based on my understanding and evaluated the results to be similar for non-parallel streams as shared in the question. For the .parallel() streams, the Collectors.toList() seems to be slightly better than the Stream.toList() under the results I could generate. I have refrained to pull in the complete result output here, but let me know if any of it(comments after code) makes sense to be a part of the question. – Xuanxunit 17/1, 2021 at 17:34

@BrianGoetz is there any reason why the code and documentation use Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray()))) instead of Collections.unmodifiableList(Arrays.asList(this.toArray()))? This additional copying step seems to serve no purpose. – Corrianne 18/1, 2021 at 12:23

@Corrianne not really much, it's just that looking at the JDK code, I wanted to confirm the same behavior as your previous comment. But for that, I had performed the benchmarks on two completely different APIs. – Xuanxunit 18/1, 2021 at 15:33

@Xuanxunit I don’t think that the actual Stream implementation provided by the JDK will end up at the default method, but rather have it overridden with a smarter implementation. Still, it’s annoying to have such an apparently unnecessary copying step even in documentation of the method. – Corrianne 18/1, 2021 at 16:10

@Corrianne You're looking at the wrong code. You're seeing the default implementation, which only comes into play with third-party implementations of streams. The actual implementation is in ReferencePipeline; that's the one that is actually being used. That said, it does seem the default code is doing an extra step. – Naca 18/1, 2021 at 19:28

@Corrianne The extra step in the default implementation is there to force a defensive copy if this.toArray were to violate its spec and keep a reference to the returned array. Without the defensive copy, it would be possible to modify the list returned from the default toList implementation. – Stulin 18/1, 2021 at 21:35

@StuartMarks toArray is part of the Stream API, so if an implementation violates the spec and returns a shared array, callers of the toArray method could already break. Why should callers of the toList() method get more guarantees than callers of toArray()? It’s very simple: the methods do what the spec says, if the implementation respects the spec. It’s not the task of a default implementation to fix a potentially broken implementation. Defensive copies are good for essential classes like String, but not for a wrapper that never guaranteed to be a truly immutable collection anyway. – Corrianne 19/1, 2021 at 7:57

@Corrianne We disagree, but I'm not going to argue this out here. – Stulin 19/1, 2021 at 16:39

@StuartMarks ok, this is not the place. Just one last comment, there are lots of places, where the result of JDK code depends on the correctness of an unknown implementation of an interface and having such a distrust/expensive defensive coding at this place looks arbitrary. And well, in the documentation, it definitely is unnecessary. – Corrianne 20/1, 2021 at 7:28

@StuartMarks In regards to your comment: The extra step in the default implementation is there to force a defensive copy if this.toArray were to violate its spec and keep a reference to the returned array. - I cannot find this spec anywhere. The javadoc doesn't mention it at all. Contrast to the javadoc of Collection::toArray which explicitly mentions it must be 'safe'. Where I should look? – Dysuria 13/12, 2023 at 17:46

@Dysuria I probably mistakenly assumed that the Stream::toArray spec had the same "safety" requirements as Collection::toArray. That was our intent, as I recall, but it didn't make it into the spec. – Stulin 15/12, 2023 at 18:30

Stream::toList is built atop toArray, not collect. There are a number of optimizations in toArray that make it potentially faster than collecting, though this depends heavily on the details. If the stream pipeline (from source through final intermediate operation) is SIZED, the target array can be presized (rather than potentially reallocating as the toList collector must do.) If the pipeline is further SUBSIZED, then parallel executions can not only presize the result array, but can compute exact per-shard offsets so each sub-task can drop its results in exactly the right place, eliminating the need for copying intermediate results into the final result.

So, depending on the details, toList may well be considerably faster than collect.

Naca answered 31/1, 2021 at 16:23 Comment(1)

Even for unsized streams, the internally used spined buffer may be more efficient than filling an ArrayList like Collectors.toList() does. – Corrianne 1/2, 2021 at 13:57

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags