Iterator versus Stream of Java 8

To take advantage of the wide range of query methods included in java.util.stream of JDK 8, I am tempted to design domain models where getters of relationships with * multiplicity (zero or more instances) return a Stream<T> instead of an Iterable<T> or Iterator<T>.

My doubt is whether there is any additional overhead incurred by a Stream<T> in comparison to an Iterator<T>?

So, is there any disadvantage to committing my domain model to a Stream<T>?

Or instead, should I always return an Iterator<T> or Iterable<T>, and leave to the end user the decision of whether to use a stream or not, by converting that iterator with StreamUtils?
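For reference, the conversion the question alludes to doesn't even need a third-party StreamUtils: the JDK's own StreamSupport and Spliterators can adapt an Iterator of unknown size into a Stream. A minimal sketch (the class and method names here are illustrative, not from the question):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class IteratorToStream {
    // Adapts an Iterator of unknown size into a sequential Stream.
    static <T> Stream<T> toStream(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, 0), // no characteristics known
                false); // sequential, not parallel
    }

    public static void main(String[] args) {
        Iterator<String> it = List.of("a", "b", "c").iterator();
        long count = toStream(it).filter(s -> !s.isEmpty()).count();
        System.out.println(count); // prints 3
    }
}
```

Note that a stream obtained this way inherits the Iterator's per-element access costs discussed in the answers below, so it is the least favorable way to obtain a Stream.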

Note that returning a Collection is not a valid option because in this case most of the relationships are lazy and of unknown size.
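To make the design question concrete, a getter for a *-multiplicity relationship could look like this. This is only a sketch; the domain class and its data are hypothetical, not taken from the question:

```java
import java.util.stream.Stream;

// Hypothetical domain class with a zero-or-more relationship exposed as a Stream.
class Author {
    private final String name;

    Author(String name) {
        this.name = name;
    }

    // Each call returns a fresh Stream; the underlying source stays
    // encapsulated, so callers cannot mutate it, and elements can be
    // produced lazily without knowing the size up front.
    Stream<String> bookTitles() {
        return Stream.of(name + ": Book 1", name + ": Book 2");
    }
}
```

A caller would then write e.g. `author.bookTitles().filter(t -> t.contains("1")).count()` without the getter ever materializing a Collection.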

Rhesus answered 3/7, 2015 at 16:9 Comment(11)
I would return a Collection or a Map or similar and let the client code decide what to do with it.Ranger
Relevant: Should I return a Collection or a Stream?Hames
I suspect Stream is a little slower; not only because of its internals, but also because of how it's used at the call site. But I doubt the performance difference would matter much. If you care about CPU cycles at that level, Iterator is pretty horrible too.Ferd
To return a Collection I would have to make a copy of the source, because I want to avoid external modifications. Doing that copy has a huge overhead that I want to avoid. So, I prefer to limit external actions over the source and return an immutable view through an Iterator<T>, Iterable<T> or a Stream<T>. I will update my question.Rhesus
I suppose you will return copies for the single objects too even if you use a Stream.Characterization
You can return a read only view of a List. List has more functionalities, like size(), and random access, which might be important to your users.Ferd
A read-only view of a List incurs further indirection for all invocations.Rhesus
not necessarily. if your internal model is an array, a readonly List wrapper would be as fast as you can get.Ferd
But that is not the case here; most of the relationships are lazy and of unknown size.Rhesus
List, of course, has a lot of mutation methods that you don't like. Regretfully, Java does not have a readonly list-like interface. You might want to design one in your library; and it can be bridged to both Iterable and Stream.Ferd
Ok, if size is unknown, Collection is out of the question. Stream it is...Ferd

There's lots of performance advice here, but sadly much of it is guesswork, and little of it points to the real performance considerations.

@Holger gets it right by pointing out that we should resist the seemingly overwhelming tendency to let the performance tail wag the API design dog.

While there are a zillion considerations that can make a stream slower than, the same as, or faster than some other form of traversal in any given case, there are some factors that point to streams having a performance advantage where it counts -- on big data sets.

There is some additional fixed startup overhead of creating a Stream compared to creating an Iterator -- a few more objects before you start calculating. If your data set is large, it doesn't matter; it's a small startup cost amortized over a lot of computation. (And if your data set is small, it probably also doesn't matter -- because if your program is operating on small data sets, performance is generally not your #1 concern either.) Where this does matter is when going parallel; any time spent setting up the pipeline goes into the serial fraction of Amdahl's law; if you look at the implementation, we work hard to keep the object count down during stream setup, but I'd be happy to find ways to reduce it as that has a direct effect on the breakeven data set size where parallel starts to win over sequential.
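The amortization argument can be seen directly: the fixed pipeline-setup cost is paid once, while the per-element work scales with the data. A small sketch (the workload and sizes here are illustrative assumptions, not from the answer):

```java
import java.util.stream.LongStream;

public class AmortizationDemo {
    public static void main(String[] args) {
        long n = 10_000_000L;

        // Sequential: a handful of pipeline objects are created up front,
        // then n elements are processed -- setup cost is negligible per element.
        long sequential = LongStream.rangeClosed(1, n).sum();

        // Parallel: the same setup happens on the calling thread and thus
        // falls into the serial fraction of Amdahl's law before the fork/join
        // workers can contribute.
        long parallel = LongStream.rangeClosed(1, n).parallel().sum();

        // Both compute the Gauss sum n*(n+1)/2.
        System.out.println(sequential == parallel); // prints true
    }
}
```

Where the breakeven point lies depends on the per-element work and the core count; for trivial per-element work and small n, the parallel version can easily lose.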

But, more important than the fixed startup cost is the per-element access cost. Here, streams actually win -- and often win big -- which some may find surprising. (In our performance tests, we routinely see stream pipelines which can outperform their for-loop over Collection counterparts.) And, there's a simple explanation for this: Spliterator has fundamentally lower per-element access costs than Iterator, even sequentially. There are several reasons for this.

  1. The Iterator protocol is fundamentally less efficient. It requires calling two methods to get each element. Further, because Iterators must be robust to things like calling next() without hasNext(), or hasNext() multiple times without next(), both of these methods generally have to do some defensive coding (and generally more statefulness and branching), which adds to inefficiency. On the other hand, even the slow way to traverse a spliterator (tryAdvance) doesn't have this burden. (It's even worse for concurrent data structures, because the next/hasNext duality is fundamentally racy, and Iterator implementations have to do more work to defend against concurrent modifications than do Spliterator implementations.)

  2. Spliterator further offers a "fast-path" iteration -- forEachRemaining -- which can be used most of the time (reduction, forEach), further reducing the overhead of the iteration code that mediates access to the data structure internals. This also tends to inline very well, which in turn increases the effectiveness of other optimizations such as code motion, bounds check elimination, etc.

  3. Further, traversal via Spliterator tends to involve many fewer heap writes than with Iterator. With Iterator, every element causes one or more heap writes (unless the Iterator can be scalarized via escape analysis and its fields hoisted into registers). Among other issues, this causes GC card-mark activity, leading to cache-line contention for the card marks. On the other hand, Spliterators tend to have less state, and industrial-strength forEachRemaining implementations tend to defer writing anything to the heap until the end of the traversal, instead storing their iteration state in locals that naturally map to registers, resulting in reduced memory-bus activity.
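The three traversal protocols described above can be contrasted side by side; this sketch only illustrates the call shapes, not the performance claims:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

public class TraversalDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3);

        // Iterator protocol: two method calls per element (hasNext + next),
        // each defensively checking state.
        int sumIterator = 0;
        for (Iterator<Integer> it = data.iterator(); it.hasNext(); ) {
            sumIterator += it.next();
        }

        // Spliterator.tryAdvance: a single call both tests for and consumes
        // the next element -- even the "slow path" needs only one call.
        int[] sumTryAdvance = {0};
        Spliterator<Integer> sp = data.spliterator();
        while (sp.tryAdvance(i -> sumTryAdvance[0] += i)) { }

        // Spliterator.forEachRemaining: the bulk fast path -- one call
        // traverses everything that remains.
        int[] sumBulk = {0};
        data.spliterator().forEachRemaining(i -> sumBulk[0] += i);

        System.out.println(sumIterator + " " + sumTryAdvance[0] + " " + sumBulk[0]); // prints 6 6 6
    }
}
```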

Summary: don't worry, be happy. Spliterator is a better Iterator, even without parallelism. (They're also generally just easier to write and harder to get wrong.)

Heckle answered 3/7, 2015 at 18:47 Comment(4)
I think using the Stream API with additional lambda-involving operations like filter or map might be slower than a pre-Java-8 for(Obj obj : collection) {...} loop, because the functional interfaces used may have many implementations, so a megamorphic lookup has to be done for every filtering/mapping function call. Of course, iterator.hasNext/iterator.next calls can also be megamorphic, but the Stream API may involve more steps, including internally created lambdas which also implement the same functional interfaces.Leveloff
@TagirValeev Sounds like guessing? Anyway, doesn't sound right to me. Once we inline through the terminal operation, all the call sites become monomorphic again (due to type sharpening and the class-per-lambda translation scheme.)Heckle
My guess is supported by the fact that -XX:+PrintInlining always says java.util.stream.AbstractPipeline::evaluate (94 bytes) callee is too large, so it seems that in practice almost no terminal operation can be inlined into the caller method.Leveloff
Here's a simple benchmark which demonstrates that previous filters in the setup slow down a simple stream operation by 15-35% depending on JVM version and task size. Note that the spliterator is monomorphic here (ArraySpliterator). Thus I think that megamorphic lookup actually matters here. Do you have another explanation for these results?Leveloff

Let’s compare the common operation of iterating over all elements, assuming that the source is an ArrayList. Then, there are three standard ways to achieve this:

  • Collection.forEach

    final E[] elementData = (E[]) this.elementData;
    final int size = this.size;
    for (int i=0; modCount == expectedModCount && i < size; i++) {
        action.accept(elementData[i]);
    }
    
  • Iterator.forEachRemaining

    final Object[] elementData = ArrayList.this.elementData;
    if (i >= elementData.length) {
        throw new ConcurrentModificationException();
    }
    while (i != size && modCount == expectedModCount) {
        consumer.accept((E) elementData[i++]);
    }
    
  • Stream.forEach which will end up calling Spliterator.forEachRemaining

    if ((i = index) >= 0 && (index = hi) <= a.length) {
       for (; i < hi; ++i) {
           @SuppressWarnings("unchecked") E e = (E) a[i];
           action.accept(e);
       }
       if (lst.modCount == mc)
           return;
    }
    

As you can see, the inner loop of the implementation code, where these operations end up, is basically the same, iterating over indices and directly reading the array and passing the element to the Consumer.

Similar things apply to all standard collections of the JRE; all of them have adapted implementations for all of these ways, even if you are using a read-only wrapper. In the latter case, the Stream API would even win slightly: Collection.forEach has to be called on the read-only view in order to delegate to the original collection’s forEach, and similarly, the iterator has to be wrapped to guard against attempts to invoke the remove() method. In contrast, spliterator() can directly return the original collection’s Spliterator, as it has no modification support. Thus, the stream of a read-only view is exactly the same as the stream of the original collection.
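This delegation behavior is easy to observe with the JDK's own unmodifiable wrapper; a small sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReadOnlyViewDemo {
    public static void main(String[] args) {
        List<String> original = new ArrayList<>(List.of("apple", "banana"));
        List<String> view = Collections.unmodifiableList(original);

        // Mutators on the view are blocked by the wrapper...
        try {
            view.add("cherry");
        } catch (UnsupportedOperationException expected) {
            System.out.println("view is read-only");
        }

        // ...but streaming the view traverses the original collection's
        // data directly, since the view's spliterator needs no extra guard.
        long count = view.stream().filter(s -> s.startsWith("a")).count();
        System.out.println(count); // prints 1
    }
}
```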

However, all these differences are hardly noticeable when measuring real-life performance because, as said, the inner loop, which is the most performance-relevant part, is the same in all cases.

The question is what conclusion to draw from that. You can still return a read-only wrapper view of the original collection, as the caller may still invoke stream().forEach(…) to iterate directly in the context of the original collection.

Since the performance isn’t really different, you should rather focus on the higher level design like discussed in “Should I return a Collection or a Stream?”

Murrell answered 3/7, 2015 at 16:51 Comment(4)
Calling trySplit() on a collection doesn't give parallel processing by itself; we then again use two spliterators to process sequentially. So how does a spliterator enable parallel processing when it comes to a collection?Ammonite
@amarnathharish this is completely out of this question’s scope, but well, when you have two spliterators after calling trySplit, you process each of them sequentially, but each of them in a different thread, using two CPU cores in the best case.Murrell
One more thing: so we have to do some coding to take care of how it joins (like a combiner) after iterating?Ammonite
@amarnathharish I’m not sure whether your last comment bears a question. For particular tasks, an expensive combiner function may render parallel processing less efficient than sequential execution. Note that for operations like collect(toList()) or collect(joining()), the costs of merging are roughly on par with the gain from parallel processing (in the best case), so you will rarely see any advantage of someCollection.parallelStream().collect(toList()) with the current implementation. So you’d need additional significant intermediate operations to benefit from parallel in this example.Murrell

© 2022 - 2024 — McMap. All rights reserved.