As per the documentation on Oracle's website [...]
That link is for Java 8. You may want to read the documentation for Java 9 (which came out in 2017) and later versions, as they are more explicit to this regard. Specifically:
A stream implementation is permitted significant latitude in optimizing the computation of the result. For example, a stream implementation is free to elide operations (or entire stages) from a stream pipeline -- and therefore elide invocation of behavioral parameters -- if it can prove that it would not affect the result of the computation. This means that side-effects of behavioral parameters may not always be executed and should not be relied upon, unless otherwise specified (such as by the terminal operations forEach
and forEachOrdered
). (For a specific example of such an optimization, see the API note documented on the count()
operation. For more detail, see the side-effects section of the stream package documentation.)
Source: Java 9's Javadoc for the Stream
interface.
And also the updated version of the doc you quoted:
Side-effects
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
If the behavioral parameters do have side-effects, unless explicitly stated, there are no guarantees as to:
- the visibility of those side-effects to other threads;
- that different operations on the "same" element within the same stream pipeline are executed in the same thread; and
- that behavioral parameters are always invoked, since a stream implementation is free to elide operations (or entire stages) from a stream pipeline if it can prove that it would not affect the result of the computation.
The ordering of side-effects may be surprising. Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray()
must produce [0, 2, 4, 6, 8]
), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
The eliding of side-effects may also be surprising. With the exception of terminal operations forEach
and forEachOrdered
, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation. (For a specific example see the API note documented on the count
operation.)
Source: Java 9's Javadoc for the java.util.stream
package.
All emphasis mine.
As you can see, the current official documentation goes into more detail on the issues that you may encounter if you decide to use side-effects in your stream operations. It is also very clear on forEach
and forEachOrdered
being the only terminal operations where execution of side-effects is guaranteed (mind you, thread-safety issues still apply, as the official examples show).
That being said, and regarding your specific code, and said code only:
public List<SavedCars> saveCars(List<Car> cars) {
return cars.stream()
.map(this::saveCar)
.collect(Collectors.toList());
}
I see no Streams-related issues with said code as-is.
- The
.map()
step will be executed because .collect()
(a mutable reduction operation, which is what the official doc recommends instead of things like .forEach(list::add)
) relies on .map()
's output and, since this (i.e. saveCar()
's) output is different than its input, the stream cannot "prove that [eliding] it would not affect the result of the computation".
- It is not a
parallelStream()
so it should not introduce any concurrency problems that didn't previously exist (of course, if someone added a .parallel()
later then problems may arise —much like if someone decided to parallelize a for
loop by firing up new threads for the inner computations).
That doesn't mean that the code in that example is Good Code™. The sequence .stream.map(::someSideEffect()).collect()
as a way of performing side-effects operations for every item in a collection may look like more simple / short / elegant? than its for
counterpart, and it sometimes may be. However, as Eugene, Holger and some others told you, there are better ways to approach this.
As a quick thought: the cost of firing up a Stream
vs iterating a simple for
is not negligible unless you have a lot of items, and if you have a lot of items then you: a) probably don't want to make a new DB access for each one, so a saveAll(List items)
API would be better; and b) probably don't want to take the performance hit of processing a lot of items sequentially, so you would end up using parallelization and then a whole new set of problems arise.
for
-loop? – SitnikparallelStream
you would certainly lose the transaction context. – Bautistaid
generation is handled by my JPA implementation, I want my returnedSavedCars
to include their newid
. If I had to do this in 2 steps, it would require both a save and a fetch, instead of only a save in this case. – Sitnikmap
that I showed is not executed is not enough? ( Amap
that you rely on?). Things are even worse if you involveparallel
into the picture with other examples. – Kmesonmap
will not be executed does not make it so. Not that I don't believe you, but please support your statements with the sources that prove it. Also,parallel
is not part of this question. – Sitnikparallel
that I brought up; it's just easier to prove a point using that. As to why the documentation tells not to do it, is exactly because they can bring in some future improvements ( like in the example I have, that happened from java-8 to 9) that would be transparent to you IFF you would obey to the documentation. – Kmesoncount
, without any intermediate operations. This one is better. – Sitnikpeek
vsmap
, which you fail to recognize. – Kmesonpeek
andmap
are indeed intermediate stream operations that should not invoke side-effects (which I what my question is about), but the link you posted does not talk about intermediate side-effects, only thecount
terminal operation. No where in the ticket that was posted on your answer do they talk aboutmap
. – Sitniksave(car)
fails with an exception, some operation may have succeeded already, not visible in the result of this operation. b) performing a database operation for each object is a performance disaster. Every reasonable API has methods for storing all objects in one operation. – Ragout