Stream.peek() can be skipped for optimization

I've come across a rule in Sonar which says:

A key difference with other intermediate Stream operations is that the Stream implementation is free to skip calls to peek() for optimization purpose. This can lead to peek() being unexpectedly called only for some or none of the elements in the Stream.

Also, it's mentioned in the Javadoc which says:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

In which cases can java.util.stream.Stream.peek() be skipped? Is this related to debugging?

Carabineer answered 24/8, 2022 at 8:50 Comment(4)
For example, if you read a more up-to-date version of the documentation, it says "In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements." – Whew
I believe that the Sonar description of Stream.peek(), "A key difference with other intermediate Stream operations is that the Stream implementation is free to skip calls to peek() for optimization purpose.", is wrong. .peek is not treated differently from the .map operator, as the following example shows: Stream.of("A", "B", "C", "D").map(a -> { System.out.println(a); return a; }).count(); Here the map operator is skipped as well. This is just an optimization of Streams, not something specific to the peek operator. – Tardif
You can also scroll down and check my answer, where I explain that the example Sonar uses is based on lazy computation, which is simply part of the Stream API and not specific to peek. Still, it is important to understand this so as not to fall into the trap that Sonar reports in this example. – Tardif
See also "In Java streams is peek really only for debugging?" – Seabury

Not only peek but also map can be skipped; this is done for the sake of optimization. For example, when the terminal operation count() is called (Java 9 and later) and the stream's source knows its size, it makes no sense to peek or map the individual items, as such operations do not change the number of items present.

Here are two examples:


1. Map and peek are not skipped because the filter can change the number of items beforehand.

long count = Stream.of("a", "aa")
    .peek(s -> System.out.println("#1"))
    .filter(s -> s.length() < 2)
    .peek(s -> System.out.println("#2"))
    .map(s -> {
        System.out.println("#3");
        return s.length();
    })
    .count();
System.out.println(count);

Output:
#1
#2
#3
#1
1

2. Map and peek are skipped (Java 9 and later) because the number of items is unchanged and the source knows its size.

long count = Stream.of("a", "aa")
    .peek(s -> System.out.println("#1"))
  //.filter(s -> s.length() < 2)
    .peek(s -> System.out.println("#2"))
    .map(s -> {
        System.out.println("#3");
        return s.length();
    })
    .count();
System.out.println(count);

Output:
2
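To see the difference without relying on console output, the two examples above can be instrumented with a counter. This is a minimal sketch; the class and method names are made up for illustration, and the zero count for example 2 assumes Java 9 or later (on Java 8, count() always traverses the pipeline).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class CountElisionDemo {

    /** Runs the answer's pipeline and returns how many times the peek/map actions ran. */
    static int actionCalls(boolean withFilter) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> stream = Stream.of("a", "aa")
                .peek(s -> calls.incrementAndGet());          // "#1"
        if (withFilter) {
            stream = stream.filter(s -> s.length() < 2);      // may change the element count
        }
        long count = stream
                .peek(s -> calls.incrementAndGet())           // "#2"
                .map(s -> {                                   // "#3"
                    calls.incrementAndGet();
                    return s.length();
                })
                .count();
        System.out.println("filter=" + withFilter + " count=" + count + " actions=" + calls.get());
        return calls.get();
    }

    public static void main(String[] args) {
        actionCalls(true);   // example 1: the filter forces a real traversal
        actionCalls(false);  // example 2: count() can be answered from the source size
    }
}
```

With the filter, the actions run four times (three for "a", one for "aa" before it is filtered out); without it, none of them run.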

Important: the lambdas passed to these methods should have no side effects (the ones above do, but only for the sake of the example).

Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.

The following implementation is dangerous. Assuming the callRestApi method performs a REST call, the call may never happen: count() can compute its result without invoking map() at all, so the side effect is silently skipped.

long count = Stream.of("url1", "url2")
    .map(string -> callRestApi(HttpMethod.POST, string))
    .count();
/**
 * Performs a REST call
 */
public String callRestApi(HttpMethod httpMethod, String url);
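If the REST call really has to happen, one option is to collect the results first and take the count from the collected list. The sketch below replaces the real call with a hypothetical stub, callRestApi(String), that only counts its invocations so the two variants can be compared; the stub and class name are assumptions for illustration, and the zero-call result for count() assumes Java 9 or later.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class SideEffectDemo {

    // Hypothetical stand-in for callRestApi: it records the call instead of doing HTTP.
    static final AtomicInteger REQUESTS = new AtomicInteger();

    static String callRestApi(String url) {
        REQUESTS.incrementAndGet();
        return "response from " + url;
    }

    /** Dangerous: count() may answer from the list size without ever calling map(). */
    static int requestsWithCount(List<String> urls) {
        REQUESTS.set(0);
        long count = urls.stream().map(SideEffectDemo::callRestApi).count();
        System.out.println("count() = " + count);
        return REQUESTS.get();
    }

    /** Safer: collect() needs every mapped value, so every call happens. */
    static int requestsWithCollect(List<String> urls) {
        REQUESTS.set(0);
        List<String> responses = urls.stream()
                .map(SideEffectDemo::callRestApi)
                .collect(Collectors.toList());
        System.out.println("responses = " + responses.size());
        return REQUESTS.get();
    }

    public static void main(String[] args) {
        List<String> urls = List.of("url1", "url2");
        System.out.println("requests via count():   " + requestsWithCount(urls));
        System.out.println("requests via collect(): " + requestsWithCollect(urls));
    }
}
```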
Geyserite answered 24/8, 2022 at 9:30 Comment(3)
Well, in theory, in example 1, the last map operation could be skipped, because after the last filter call the element count cannot change. – Ladyship
@MCEmperor there are a lot of theoretical optimization opportunities which are currently unused but may be used in a future version. That's what makes relying on the absence of a legal optimization so dangerous. – Seabury
If you want to have a bit more fun, use IntStream.iterate(1, i -> i + 1) .flatMap(i -> IntStream.range(i, i + 10)) .peek(System.out::println) .filter(i -> i == 2) .findFirst() .ifPresent(System.out::println); and compare the Java 8 output with, e.g., the Java 11 output. Then you might insert a .parallel() somewhere and see what happens… – Seabury

peek() is an intermediate operation, and it expects a consumer that performs an action (a side effect) on the elements of the stream.

When a stream pipeline contains no intermediate operations that can change the number of elements in the stream (such as takeWhile, filter, limit, etc.), ends with the terminal operation count(), and its source is able to report how many elements it holds, count() simply queries the source and returns the result. All intermediate operations are optimized away.
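The role of the source can be checked by running the same peek()/map()/count() pipeline over a source that reports its size and over one that hides it. This is a minimal sketch with made-up names; Spliterators.spliteratorUnknownSize(...) is used here only to produce a source without a known size, and the zero-call result for the sized source assumes Java 9 or later.

```java
import java.util.List;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SizedSourceDemo {

    /** Counts peek() calls when the source (a List) reports its exact size. */
    static int peeksWithSizedSource(List<String> data) {
        AtomicInteger peeks = new AtomicInteger();
        data.stream().peek(s -> peeks.incrementAndGet()).map(String::length).count();
        return peeks.get();
    }

    /** Same pipeline, but the source hides its size, so count() must traverse it. */
    static int peeksWithUnknownSizeSource(List<String> data) {
        AtomicInteger peeks = new AtomicInteger();
        Stream<String> unsized = StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(data.iterator(), 0), false);
        unsized.peek(s -> peeks.incrementAndGet()).map(String::length).count();
        return peeks.get();
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "bb", "ccc");
        System.out.println("sized source:   " + peeksWithSizedSource(data) + " peek calls");
        System.out.println("unsized source: " + peeksWithUnknownSizeSource(data) + " peek calls");
    }
}
```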

Note: this optimization of the count() operation, which exists since Java 9 (see the API Note of count()), is not directly related to peek(); it affects every intermediate operation that doesn't change the number of elements in the stream (for example map(), sorted() and peek()).
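The same elision applies to an operation that is clearly not a debugging aid, such as sorted(). The sketch below counts comparator invocations; the names are illustrative, and the zero-comparison case assumes Java 9 or later.

```java
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class SortedElisionDemo {

    /** Returns how many times the comparator ran while counting a sorted stream. */
    static int comparatorCalls(boolean withFilter) {
        AtomicInteger calls = new AtomicInteger();
        Comparator<Integer> byValue = (a, b) -> {
            calls.incrementAndGet();
            return Integer.compare(a, b);
        };
        Stream<Integer> stream = Stream.of(3, 1, 2);
        if (withFilter) {
            // Keeps every element, but the pipeline can no longer know the size up front.
            stream = stream.filter(n -> n > 0);
        }
        long count = stream.sorted(byValue).count();
        System.out.println("filter=" + withFilter + " count=" + count + " comparisons=" + calls.get());
        return calls.get();
    }

    public static void main(String[] args) {
        comparatorCalls(false); // sorted() is skipped entirely
        comparatorCalls(true);  // the filter forces traversal, so sorting really happens
    }
}
```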

There's More to it

peek() has a very special niche among other intermediate operations.

By its nature, peek() differs both from other intermediate operations such as map() and from the terminal operations that perform a final side-effecting action on each element that reaches them, namely forEach() and forEachOrdered(), even though peek() also works through side effects.

The key point is that peek() doesn't contribute to the result of stream execution. It never affects the result produced by the terminal operation, whether it's a value or a final action.

In other words, if we removed peek() from the pipeline, the terminal operation would not be affected.

Both the documentation of peek() and the Stream API package documentation warn that its action can be elided and that you shouldn't rely on it.

A quote from the documentation of peek():

In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.

A quote from the API documentation, paragraph Side-effects:

The eliding of side-effects may also be surprising. With the exception of terminal operations forEach and forEachOrdered, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation.

Here's an example of the stream (link to the source) where none of the intermediate operations gets elided apart from peek():

Stream.of(1, 2, 3)
    .parallel()
    .peek(System.out::println)
    .skip(1)
    .map(n -> n * 10)
    .forEach(System.out::println);

In this pipeline peek() precedes skip(), therefore you might expect it to print every element of the source to the console. However, that doesn't necessarily happen (element 1 may not be printed). Due to the nature of peek(), it can be optimized away without breaking the code, i.e. without affecting the terminal operation.
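One way to observe this without relying on console interleaving is to record what peek() sees and what reaches forEach() in thread-safe sets. A sketch with illustrative names: elements 2 and 3 must pass through peek() to reach forEach(), but whether 1 is recorded is not something the pipeline guarantees, so the result may vary with the JDK version and with how the source is split.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

public class ParallelSkipDemo {

    /**
     * Runs the answer's pipeline, recording what peek() saw and what forEach() received.
     * Concurrent sets are used because both actions may run on several threads.
     */
    static void run(Set<Integer> peeked, Set<Integer> results) {
        Stream.of(1, 2, 3)
                .parallel()
                .peek(peeked::add)
                .skip(1)
                .map(n -> n * 10)
                .forEach(results::add);
    }

    public static void main(String[] args) {
        Set<Integer> peeked = ConcurrentHashMap.newKeySet();
        Set<Integer> results = ConcurrentHashMap.newKeySet();
        run(peeked, results);
        System.out.println("peek saw:    " + peeked);   // may or may not contain 1
        System.out.println("forEach got: " + results);  // always 20 and 30
    }
}
```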

That's why the documentation explicitly states that this method exists mainly to support debugging, and peek() should not be given an action that must be executed under all circumstances.
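For an action that must always run, the Side-effects paragraph quoted above names forEach and forEachOrdered as the exceptions to eliding, so one option is to move the action into the terminal operation, which also keeps the work to a single pass. A minimal sketch; audit(...) is a hypothetical placeholder for the must-run action, and the zero result for the peek() variant assumes Java 9 or later.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class ForEachInsteadOfPeek {

    // Hypothetical "must always run" action, e.g. writing an audit record.
    static void audit(String item, AtomicInteger auditLog) {
        auditLog.incrementAndGet();
    }

    /** Fragile: count() may never call the peek() action. */
    static int auditViaPeek(List<String> items) {
        AtomicInteger auditLog = new AtomicInteger();
        items.stream().peek(s -> audit(s, auditLog)).count();
        return auditLog.get();
    }

    /** Robust, single pass: forEach() performs the side effect and the counting. */
    static int auditViaForEach(List<String> items) {
        AtomicInteger auditLog = new AtomicInteger();
        AtomicLong count = new AtomicLong();
        items.stream().map(String::trim).forEach(s -> {
            audit(s, auditLog);          // runs for every element reaching forEach
            count.incrementAndGet();     // the value count() would have returned
        });
        System.out.println("counted " + count.get() + " items");
        return auditLog.get();
    }

    public static void main(String[] args) {
        List<String> items = List.of(" a", "b ", "c");
        System.out.println("audits via peek():    " + auditViaPeek(items));
        System.out.println("audits via forEach(): " + auditViaForEach(items));
    }
}
```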

Barber answered 24/8, 2022 at 10:13 Comment(1)
What's the solution when you need a side effect in a stream? Anything involving repeating, or collecting and starting a new stream, is a lot less efficient. – Mcfarlin

The optimization referenced in this thread comes from the well-known architecture of Java streams, which is based on lazy computation.

Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed. (Javadoc)

Also

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed. (Javadoc)

This lazy computation affects several other operators, not just .peek. Just as peek (an intermediate operation) is affected by lazy computation, so are all the other intermediate operations (filter, map, mapToInt, mapToDouble, mapToLong, flatMap, flatMapToInt, flatMapToDouble, flatMapToLong). But someone who doesn't understand the concept of lazy computation can easily fall into the trap with .peek that Sonar reports here.

So the example that Sonar correctly flags

Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e));

should not be used as is, because it contains no terminal operation. The stream will therefore never invoke the intermediate .peek operation at all, even though 2 elements ("three", "four") are eligible to pass through the pipeline.
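The effect of the missing terminal operation is easy to check with a counter instead of println. A small sketch with illustrative names; it builds the Sonar pipeline, reads the counter, and only then attaches a terminal operation:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyPeekDemo {

    /** Returns {peek calls before a terminal operation, peek calls after one}. */
    static int[] peekCalls() {
        AtomicInteger peeks = new AtomicInteger();
        Stream<String> pipeline = Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> peeks.incrementAndGet());
        int before = peeks.get();        // nothing has run yet: the pipeline is only described
        pipeline.forEach(e -> { });      // a terminal operation starts the traversal
        int after = peeks.get();         // "three" and "four" went through peek
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] calls = peekCalls();
        System.out.println("before terminal op: " + calls[0] + ", after: " + calls[1]);
    }
}
```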

Example 1. Add a terminal operator like the following:

Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e))
                .collect(Collectors.toList());  // <----

and every element that passes the filter will also pass through the .peek intermediate operation. No element is skipped in this example.

Example 2. Now here is the interesting part: if you use another terminal operation, for example .findFirst, then, because the Stream API is based on lazy computation,

Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(e -> System.out.println("Filtered value: " + e))
                .findFirst();  // <----

Only 1 element will pass through the operator .peek and not 2.
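Recording the elements that reach peek() makes the difference between the two terminal operations explicit. A sketch with illustrative names, using a sequential stream so that a plain ArrayList is safe:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindFirstVsCollect {

    /** Returns the elements that reached peek() for the chosen terminal operation. */
    static List<String> peeked(boolean useFindFirst) {
        List<String> seen = new ArrayList<>();   // fine here: the stream is sequential
        Stream<String> stream = Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(seen::add);
        if (useFindFirst) {
            stream.findFirst();                  // short-circuits after the first match
        } else {
            stream.collect(Collectors.toList()); // needs every matching element
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("collect:   " + peeked(false));
        System.out.println("findFirst: " + peeked(true));
    }
}
```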

But as long as you know what you are doing (example 1) and understand lazy computation, you can predict that in certain cases .peek will be invoked for every element passing down the stream pipeline, with none skipped, and in other cases you will know which elements .peek will skip.

But be extremely cautious when using .peek with parallel streams, since another set of traps can arise there. As the Java API documentation for .peek mentions:

For parallel stream pipelines, the action may be called at whatever time and in whatever thread the element is made available by the upstream operation. If the action modifies shared state, it is responsible for providing the required synchronization.
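A minimal sketch of what "providing the required synchronization" can look like is to make the shared state a concurrent collection. The class name and data are illustrative, and ConcurrentLinkedQueue is just one of several suitable thread-safe choices.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPeekSharedState {

    /** Returns {elements seen by peek(), elements collected}. */
    static int[] run() {
        // Shared state touched from several threads must be thread-safe;
        // a plain ArrayList is not safe here and could lose elements or throw.
        Queue<Integer> seen = new ConcurrentLinkedQueue<>();
        List<Integer> evens = IntStream.rangeClosed(1, 1_000)
                .parallel()
                .filter(n -> n % 2 == 0)
                .boxed()
                .peek(seen::add)                 // may run concurrently on several threads
                .collect(Collectors.toList());   // needs every element, so nothing is elided
        return new int[] {seen.size(), evens.size()};
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("peek saw " + sizes[0] + ", collected " + sizes[1]);
    }
}
```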

Tardif answered 24/8, 2022 at 13:10 Comment(5)
See the other answers for when peek will be skipped due to stream optimizations. Your examples only explain lazy evaluation. findFirst() will only evaluate the items until the first one is found; this is independent of peek. For instance, Stream.of("A", "B", "C", "D").peek(System.out::println).count() will "consume" the full stream (terminal operation), but the items won't be printed (peek is optimized out); cf. Stream#count – Bronnie
@Bronnie No, this is just the way .count works and has nothing to do with .peek. What you say in the comment above, "will "consume" the full stream (terminal operation)", is wrong. As the documentation states: An implementation may choose to not execute the stream pipeline (either sequentially or in parallel) if it is capable of computing the count directly from the stream source. In such cases no source elements will be traversed and no intermediate operations will be evaluated. So in that case, even if a map operator were used, no elements would pass through that operator either. – Tardif
@Bronnie You could try the example from your comment again with Stream.of("A", "B", "C", "D").map(a -> { System.out.println(a); return a; }).count();. You will see that the map operator is not invoked either. This is not an optimization related to peek, but a specific way the terminal operation count is able to work. – Tardif
@Bronnie Apart from that, this question is rooted in the Sonar rule rules.sonarsource.com/java/RSPEC-3864, and the Sonar example is exactly about lazy computation. – Tardif
There is indeed no difference between these types of intermediate stages regarding whether they might get optimized away or not. It's worth noting that these stages might also perform more work than naïvely expected, e.g. IntStream.iterate(1, i -> i + 1).parallel() .map(i -> { System.out.println("map " + i); return i; }) .peek(i -> System.out.println("peek " + i)) .anyMatch(i -> i == 2); But there is one difference, though: the other operations are not supposed to have side effects, so it shouldn't matter which elements are processed. But peek can only operate through side effects. – Seabury
