Java streams lazy vs fusion vs short-circuiting

I'm trying to form a concise and coherent understanding of how lazy evaluation is applied within the Java streams API.

Here is what I currently understand:

  • elements are only consumed as they are needed, i.e. streams are lazy, and intermediate operations are lazy too, so that e.g. filter only filters when it is required to.
  • intermediate operations may be fused together (if they are stateless).
  • short-circuiting operations do not need to process the entire stream.
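
A small experiment makes the first and third points observable. This is only a sketch, assuming a sequential stream over a fixed list of integers; the class name LazyDemo and the method trace() are made up for illustration and are not part of the streams API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class LazyDemo {
    // Records, in order, every time a pipeline stage touches an element.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        // Building the pipeline runs neither lambda: nothing is logged here.
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4, 5)
                .filter(x -> { log.add("filter " + x); return x % 2 == 0; })
                .map(x -> { log.add("map " + x); return x * 10; });
        log.add("pipeline built");

        // findFirst() is a short-circuiting terminal operation: it pulls
        // elements one at a time and stops as soon as it has a result.
        Optional<Integer> first = pipeline.findFirst();
        log.add("result " + first.get());
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Running it logs "pipeline built", "filter 1", "filter 2", "map 2", "result 20": no work happens before the terminal operation, and elements 3, 4 and 5 are never touched.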

What I want to do is bring all these ideas together and ensure I'm not misrepresenting anything. I'm finding it tricky because whenever I read any literature on Java streams, it goes on to say they're lazy or utilise lazy evaluation, and then very much interchangeably starts talking about optimisations such as fusion and short-circuiting.

So would I be right in saying the following?

  • fusion is how lazy evaluation has been implemented in the stream API, i.e. an element is consumed and the operations are fused together wherever possible. I'm thinking that if fusion didn't exist, we'd surely be back to eager evaluation, as the alternative would just be to process all elements for each intermediate operation before moving on to the next?

  • short-circuiting is possible without fusion or lazy evaluation, but in the context of streams it is very much helped by the implementation of these two principles?

I'd appreciate any further insight and clarity on this.

Pesach answered 2/2, 2016 at 10:0 Comment(0)

As for fusion: let's imagine a map operation:

.map(x -> x.squash())

[Image: Map]

It's stateless and it just transforms each input according to the specified algorithm (in our case, it squashes it). Now the filter operation:

.filter(x -> x.getColor() != YELLOW)

[Image: Filter]

It's also stateless and it just removes some elements (in our case yellow ones). Now let's have a terminal operation:

.forEach(System.out::println)

[Image: Display]

It just displays the input elements on the terminal. Fusion means that all intermediate stateless operations are merged with the terminal consumer into a single operation:

.map(x -> x.squash())
.filter(x -> x.getColor() != YELLOW)
.forEach(System.out::println)

[Image: Fuse]

The whole pipeline is fused into a single Consumer which is connected directly to the source. When each element is processed, the source spliterator just executes the combined consumer; the stream pipeline does not intercept anything and does not perform any additional bookkeeping. That's fusion. Fusion does not depend on short-circuiting. It's possible to implement streams without fusion (execute one operation, take the result, execute the next operation, handing control back to the stream engine after each operation). It's also possible to have fusion without short-circuiting.
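
To make the idea concrete, here is a hand-rolled sketch of what "fused into a single Consumer" could look like. It is not the JDK's actual implementation: mapStage, filterStage and FusionSketch are hypothetical names, and strings with toUpperCase() stand in for the squash()/getColor() example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

public class FusionSketch {
    // Hypothetical helper: wraps a downstream consumer so each element is mapped first.
    static <T, R> Consumer<T> mapStage(Function<T, R> f, Consumer<R> downstream) {
        return t -> downstream.accept(f.apply(t));
    }

    // Hypothetical helper: wraps a downstream consumer so only matching elements pass.
    static <T> Consumer<T> filterStage(Predicate<T> p, Consumer<T> downstream) {
        return t -> { if (p.test(t)) downstream.accept(t); };
    }

    static List<String> run() {
        List<String> out = new ArrayList<>();

        // Build the chain back to front: terminal, then filter, then map.
        Consumer<String> terminal = out::add;
        Consumer<String> filtered = filterStage(s -> !s.equals("YELLOW"), terminal);
        Consumer<String> fused = mapStage(s -> s.toUpperCase(), filtered);

        // The source drives the single combined consumer, one element at a time.
        Arrays.asList("red", "yellow", "green").spliterator().forEachRemaining(fused);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [RED, GREEN]
    }
}
```

Each source element passes through map, filter and the terminal action in one call to fused.accept(...); no intermediate collection is built between the stages.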

Wendish answered 2/2, 2016 at 15:22 Comment(3)
@Tagir Valeev Thanks for the very visual answer - most helpful. I don't feel you answered a couple of my original questions however, which were: "is fusion how lazy evaluation has been implemented in the stream API - I'm thinking that if fusion didn't exist then surely we'd be back to eager evaluation" and furthermore "is short-circuiting possible without fusion?" and, if so, would it just be an eager evaluation version that only stops short on the terminal operation and not on processing intermediate operations? - Pesach
@Tranquility, I just explained that fusion is a way of merging several ops together. You can perform them step-by-step without fusion and still have lazy evaluation. - Wendish
@Tagir Valeev When you say step-by-step, do you still mean processing a single element at a time, i.e. consuming elements as they are needed, but just having to run the intermediate and terminal operations on the element separately rather than all as one fused (and optimised) operation? I was thinking that fusing was the thing that allowed an element to be consumed independently, without having to process other elements (eagerly), but it seems you're suggesting otherwise. - Pesach