How does short-circuit work for Java 8 Streams?
Asked Answered
C

6

36

Reading up a bit on Java 8, I got to this blog post explaining a bit about streams and reduction of them, and when it would be possible to short-circuit the reduction. At the bottom it states:

Note in the case of findFirst or findAny we only need the first value which matches the predicate (although findAny is not guaranteed to return the first). However if the stream has no ordering then we’d expect findFirst to behave like findAny. The operations allMatch, noneMatch and anyMatch may not short-circuit the stream at all since it may take evaluating all the values to determine whether the operator is true or false. Thus an infinite stream using these may not terminate.

I get that findFirst or findAny may short-circuit the reduction, because as soon as you find an element, you do not need to process any further.

But why would this not be possible for allMatch, noneMatch and anyMatch? For allMatch, if you find one which doesn't match the predicate, you can stop processing. Same for none. And anyMatch especially doesn't make sense to me, as it it pretty much equal to findAny (except for what is returned)?

Saying that these three may not short-circuit, because it may take evaluating all the values, could also be said for findFirst/Any.

Is there some fundamental difference I'm missing? Am I not really understanding what is going on?

Curtice answered 24/8, 2015 at 22:37 Comment(2)
"may not" as in "might not", not as in "not allowed to".Gramme
@Gramme Yes, I'm just not getting where the difference comes from. All can be short-circuited the way I'm looking at it?Curtice
C
29

There's a subtle difference, because anyMatch family uses a predicate, while findAny family does not. Technically findAny() looks like anyMatch(x -> true) and anyMatch(pred) looks like filter(pred).findAny(). So here we have another issue. Consider we have a simple infinite stream:

Stream<Integer> s = Stream.generate(() -> 1);

So it's true that applying findAny() to such stream will always short-circuit and finish while applying anyMatch(pred) depends on the predicate. However let's filter our infinite stream:

Stream<Integer> s = Stream.generate(() -> 1).filter(x -> x < 0);

Is the resulting stream infinite as well? That's a tricky question. It actually contains no elements, but to determine this (for example, using .iterator().hasNext()) we have to check the infinite number of underlying stream elements, so this operation will never finish. I would call such stream an infinite as well. However using such stream both anyMatch and findAny will never finish:

Stream.generate(() -> 1).filter(x -> x < 0).anyMatch(x -> true);
Stream.generate(() -> 1).filter(x -> x < 0).findAny();

So findAny() is not guaranteed to finish either, it depends on the previous intermediate stream operations.

To conclude I would rate that blog-post as very misleading. In my opinion infinity stream behavior is better explained in official JavaDoc.

Checkerberry answered 25/8, 2015 at 1:46 Comment(2)
Does not answer to the question about allMatch.Jane
It does answer that. If prior stream operations are an infinite process then it doesn't matter whether the matching could short-circuit, regardless of whether it's anyMatch() or allMatch(). Finding a positive match and terminating is not conceptually different from finding a negative match and terminating.Christachristabel
G
15

Answer Updated

I'd say the blog post is wrong when it says "findFirst or findAny we only need the first value which matches the predicate".

In the javadoc for allMatch(Predicate), anyMatch(Predicate), noneMatch(Predicate), findAny(), and findFirst():

This is a short-circuiting terminal operation.

However, note that findFirst and findAny doesn't have a Predicate. So they can both return immediately upon seeing the first/any value. The other 3 are conditional and may loop forever if condition never fires.

Gramme answered 24/8, 2015 at 22:47 Comment(2)
From the blog: "The operations allMatch, noneMatch and anyMatch may not short-circuit the stream at all since it may take evaluating all the values to determine whether the operator is true or false." This is obviously incorrect since allMatch can be determined to be false as soon as first non-match is found.Furey
@MiserableVariable As I said, blog post is wrong. They can all short-circuit. --- The blog is right that 3 of them may potentially run forever, but it says it all wrong.Gramme
H
7

According to Oracle's Stream Documentation: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#StreamOps

A terminal operation is short-circuiting if, when presented with infinite input, it may terminate in finite time. Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.

All five functions have the line:

This is a short-circuiting terminal operation.

In the description of the function.

Hornbeck answered 24/8, 2015 at 22:56 Comment(0)
V
0

When the javadoc says "may not short-circuit" it is merely pointing out that it is not a short circuit operation and depending on the values, the entire stream may be processed.

findFirst and findAny on the other hand, are guaranteed to short circuit since they never need to process the rest of the stream once they are satisfied.

Viv answered 24/8, 2015 at 22:43 Comment(1)
But it could also happen for the predicates, as soon as they aren('t) satisfied?Curtice
H
0
LongStream.range(0, Long.MAX_VALUE).allMatch(x -> x >= 0)

LongStream.range(0, Long.MAX_VALUE).allMatch(x -> x > 0)

The first one returns forever, the second one returns immediately

Haldas answered 18/2, 2022 at 19:14 Comment(0)
A
-1

anyMatch, noneMatch and allMatch return boolean values, so they may have to check all to prove the logic.

findFirst and findAny just care about finding the first they can and returning that.

Edit: For a given dataset the Match methods are guaranteed to always return the same value, however the Find methods are not because the order may vary and affect which value is returned.

The short circuiting described is talking about the Find methods lacking consistency for a given dataset.

Acquiesce answered 24/8, 2015 at 22:46 Comment(3)
But anyMatch does as well?Curtice
If the last value was the match, then yes. Point is the Match methods will always return the same value on the same data set, but the Find methods aren't guaranteed that consistency because the order may differ.Acquiesce
This lack of consistency is the short circuiting describedAcquiesce

© 2022 - 2024 — McMap. All rights reserved.