Would Stream.toList() perform better than Collectors.toList()
Asked Answered
X

1

22

JDK is introducing an API Stream.toList() with JDK-8180352. Here is a benchmarking code that I have attempted to compare its performance with the existing Collectors.toList:

@BenchmarkMode(Mode.All)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, batchSize = 10000)
@Measurement(iterations = 20, time = 1, batchSize = 10000)
public class CollectorsVsStreamToList {

    @Benchmark
    public List<Integer> viaCollectors() {
        return IntStream.range(1, 1000).boxed().collect(Collectors.toList());
    }

    @Benchmark
    public List<Integer> viaStream() {
        return IntStream.range(1, 1000).boxed().toList();
    }
}

The result summary is as follows:

Benchmark                                                       Mode  Cnt   Score    Error  Units
CollectorsVsStreamToList.viaCollectors                         thrpt   20  17.321 ±  0.583  ops/s
CollectorsVsStreamToList.viaStream                             thrpt   20  23.879 ±  1.682  ops/s
CollectorsVsStreamToList.viaCollectors                          avgt   20   0.057 ±  0.002   s/op
CollectorsVsStreamToList.viaStream                              avgt   20   0.040 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors                        sample  380   0.054 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.00    sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.50    sample        0.054            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.90    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.95    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.99    sample        0.062            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.999   sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.9999  sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p1.00    sample        0.068            s/op
CollectorsVsStreamToList.viaStream                            sample  525   0.039 ±  0.001   s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.00            sample        0.037            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.50            sample        0.038            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.90            sample        0.040            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.95            sample        0.042            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.99            sample        0.050            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.999           sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.9999          sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p1.00            sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors                            ss   20   0.060 ±  0.007   s/op
CollectorsVsStreamToList.viaStream                                ss   20   0.043 ±  0.006   s/op

Of course, the first question to the domain experts would be if the benchmarking procedure is correct or not? The test class was executed on MacOS. Do let me know for any further details required.

Follow-up, as far as I could infer from the readings the average time, throughput, and sampling time of the Stream.toList looks better than the Collectors.toList. Is that understanding correct?

Xuanxunit answered 15/1, 2021 at 18:35 Comment(16)
Side note: Based on the implementation it looks like what you're really testing is the difference between toArray() and collect(Collectors.toList()). But that's of course only one implementation.Didi
It is quite possible that Stream::toList is more efficient in some cases -- but it really depends on the details. Stream::toList builds on toArray, and for sources that have the SIZED (and ideally SUBSIZED, for parallel streams) characteristics, toArray is optimized to reduce reallocation and copying compared to collect.Naca
If you're trying to measure the difference between the two collectors, though, you should minimize the work of the rest of the stream pipline. I would move the source generation, and the boxing, out of the benchmark method and into an @State variable initialized outside the benchmark method, with something like Stream.of(data).toList(). The boxing is surely distorting your data. I'd also include parallel runs.Naca
@BrianGoetz did you mean to initialiise the source data into a static attribute and then simply using the stream under the benchmark something like static final Set<Integer> data = IntStream.range(0, 1000).boxed().collect(Collectors.toSet()); @Benchmark public List<Integer> viaCollectors() { return data.stream().collect(Collectors.toList()); } @Benchmark public List<Integer> viaCollectorsParallel() { return data.stream().parallel().collect(Collectors.toList()); } ? Not really sure of how to do that with @State. If so, I can update with those results as well.Xuanxunit
@Xuanxunit There's a few ways to do it, but here's one: tutorials.jenkov.com/java-performance/jmh.html The main thing is that you want to move as much setup as possible out of the benchmark methods. I'd probably go with Stream.of(array).collect(...) in the benchmark method, rather than a Set, since arrays will give you a better-behaving Spliterator than Set.Naca
@BrianGoetz Thank you for sharing. I have tried to rephrase my tests based on my understanding and evaluated the results to be similar for non-parallel streams as shared in the question. For the .parallel() streams, the Collectors.toList() seems to be slightly better than the Stream.toList() under the results I could generate. I have refrained to pull in the complete result output here, but let me know if any of it(comments after code) makes sense to be a part of the question.Xuanxunit
@BrianGoetz is there any reason why the code and documentation use Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray()))) instead of Collections.unmodifiableList(Arrays.asList(this.toArray()))? This additional copying step seems to serve no purpose.Corrianne
@Corrianne not really much, it's just that looking at the JDK code, I wanted to confirm the same behavior as your previous comment. But for that, I had performed the benchmarks on two completely different APIs.Xuanxunit
@Xuanxunit I don’t think that the actual Stream implementation provided by the JDK will end up at the default method, but rather have it overridden with a smarter implementation. Still, it’s annoying to have such an apparently unnecessary copying step even in documentation of the method.Corrianne
@Corrianne You're looking at the wrong code. You're seeing the default implementation, which only comes into play with third-party implementations of streams. The actual implementation is in ReferencePipeline; that's the one that is actually being used. That said, it does seem the default code is doing an extra step.Naca
@Corrianne The extra step in the default implementation is there to force a defensive copy if this.toArray were to violate its spec and keep a reference to the returned array. Without the defensive copy, it would be possible to modify the list returned from the default toList implementation.Stulin
@StuartMarks toArray is part of the Stream API, so if an implementation violates the spec and returns a shared array, callers of the toArray method could already break. Why should callers of the toList() method get more guarantees than callers of toArray()? It’s very simple: the methods do what the spec says, if the implementation respects the spec. It’s not the task of a default implementation to fix a potentially broken implementation. Defensive copies are good for essential classes like String, but not for a wrapper that never guaranteed to be a truly immutable collection anyway.Corrianne
@Corrianne We disagree, but I'm not going to argue this out here.Stulin
@StuartMarks ok, this is not the place. Just one last comment, there are lots of places, where the result of JDK code depends on the correctness of an unknown implementation of an interface and having such a distrust/expensive defensive coding at this place looks arbitrary. And well, in the documentation, it definitely is unnecessary.Corrianne
@StuartMarks In regards to your comment: The extra step in the default implementation is there to force a defensive copy if this.toArray were to violate its spec and keep a reference to the returned array. - I cannot find this spec anywhere. The javadoc doesn't mention it at all. Contrast to the javadoc of Collection::toArray which explicitly mentions it must be 'safe'. Where I should look?Dysuria
@Dysuria I probably mistakenly assumed that the Stream::toArray spec had the same "safety" requirements as Collection::toArray. That was our intent, as I recall, but it didn't make it into the spec.Stulin
N
28

Stream::toList is built atop toArray, not collect. There are a number of optimizations in toArray that make it potentially faster than collecting, though this depends heavily on the details. If the stream pipeline (from source through final intermediate operation) is SIZED, the target array can be presized (rather than potentially reallocating as the toList collector must do.) If the pipeline is further SUBSIZED, then parallel executions can not only presize the result array, but can compute exact per-shard offsets so each sub-task can drop its results in exactly the right place, eliminating the need for copying intermediate results into the final result.

So, depending on the details, toList may well be considerably faster than collect.

Naca answered 31/1, 2021 at 16:23 Comment(1)
Even for unsized streams, the internally used spined buffer may be more efficient than filling an ArrayList like Collectors.toList() does.Corrianne

© 2022 - 2024 — McMap. All rights reserved.