I have read a lot about Java 8 streams lately, and several articles about lazy loading with Java 8 streams specifically: here and over here. I can't seem to shake the feeling that lazy loading is COMPLETELY useless (or at best, a minor syntactic convenience offering zero performance value).
Let's take this code as an example:
int[] myInts = new int[]{1,2,3,5,8,13,21};
IntStream myIntStream = IntStream.of(myInts);
int[] myChangedArray = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n))
.toArray();
This will log in the console, because the terminal operation
, in this case toArray()
, is called, and our stream is lazy and executes only when the terminal operation is called. Of course I can also do this:
IntStream myChangedInts = myIntStream
.peek(n -> System.out.println("About to square: " + n))
.map(n -> (int)Math.pow(n, 2))
.peek(n -> System.out.println("Done squaring, result: " + n));
And nothing will be printed, because the map isn't happening, because I don't need the data. Until I call this:
int[] myChangedArray = myChangedInts.toArray();
And voila, I get my mapped data, and my console logs. Except I see zero benefit to it whatsoever. I realize I can define the filter code long before I call to toArray()
, and I can pass around this "not-really-filtered stream around), but so what? Is this the only benefit?
The articles seem to imply there is a performance gain associated with laziness, for example:
In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimized to make it being capable of processing the large amount of data with high performance.
and
Java 8 Streams API optimizes stream processing with the help of short circuiting operations. Short Circuit methods ends the stream processing as soon as their conditions are satisfied. In normal words short circuit operations, once the condition is satisfied just breaks all of the intermediate operations, lying before in the pipeline. Some of the intermediate as well as terminal operations have this behavior.
It sounds literally like breaking out of a loop, and not associated with laziness at all.
Finally, there is this perplexing line in the second article:
Lazy operations achieve efficiency. It is a way not to work on stale data. Lazy operations might be useful in the situations where input data is consumed gradually rather than having whole complete set of elements beforehand. For example consider the situations where an infinite stream has been created using Stream#generate(Supplier<T>) and the provided Supplier function is gradually receiving data from a remote server. In those kind of the situations server call will only be made at a terminal operation when it's needed.
Not working on stale data? What? How does lazy loading keep someone from working on stale data?
TLDR: Is there any benefit to lazy loading besides being able to run the filter/map/reduce/whatever operation at a later time (which offers zero performance benefit)?
If so, what's a real-world use case?
Stream
can do you can also do in loops. But that's irrelevant. Nothing in the quoted documentation states thatStream
s are better than loops, only that the former is coded to be as efficient as possible. – Indochina