How to ensure order of processing in Java 8 streams?
Asked Answered
B

2

211

I want to process lists inside an XML Java object. I have to ensure processing all elements in the order I received them.

Should I therefore call sequential on each stream I use? list.stream().sequential().filter().forEach()

Or is it sufficient to just use the stream as long as I don't use parallelism? list.stream().filter().forEach()

Bey answered 23/3, 2015 at 17:29 Comment(1)
none of them. You need to use forEachOrdered: list.stream().filter().forEachOrdered()Brinkema
A
494

You are asking the wrong question. You are asking about sequential vs. parallel whereas you want to process items in order, so you have to ask about ordering. If you have an ordered stream and perform operations which guarantee to maintain the order, it doesn’t matter whether the stream is processed in parallel or sequential; the implementation will maintain the order.

The ordered property is distinct from parallel vs sequential. E.g. if you call stream() on a HashSet the stream will be unordered while calling stream() on a List returns an ordered stream. Note that you can call unordered() to release the ordering contract and potentially increase performance. Once the stream has no ordering there is no way to reestablish the ordering. (The only way to turn an unordered stream into an ordered is to call sorted, however, the resulting order is not necessarily the original order).

See also the “Ordering” section of the java.util.stream package documentation.

In order to ensure maintenance of ordering throughout an entire stream operation, you have to study the documentation of the stream’s source, all intermediate operations and the terminal operation for whether they maintain the order or not (or whether the source has an ordering in the first place).

This can be very subtle, e.g. Stream.iterate(T,UnaryOperator) creates an ordered stream while Stream.generate(Supplier) creates an unordered stream. Note that you also made a common mistake in your question as forEach does not maintain the ordering. You have to use forEachOrdered if you want to process the stream’s elements in a guaranteed order.

So if your list in your question is indeed a java.util.List, its stream() method will return an ordered stream and filter will not change the ordering. So if you call list.stream().filter() .forEachOrdered(), all elements will be processed sequentially in order, whereas for list.parallelStream().filter().forEachOrdered() the elements might be processed in parallel (e.g. by the filter) but the terminal action will still be called in order (which obviously will reduce the benefit of parallel execution).

If you, for example, use an operation like

List<…> result=inputList.parallelStream().map(…).filter(…).collect(Collectors.toList());

the entire operation might benefit from parallel execution but the resulting list will always be in the right order, regardless of whether you use a parallel or sequential stream.

Absorbance answered 23/3, 2015 at 18:49 Comment(9)
Yes, good answer. One thing that I've found is that the terminology we use, at least in English, such as "before," "after," and so forth, is quite ambiguous. There are two kinds of ordering here: 1) encounter order (also known as spatial order), and 2) processing order (also known as temporal order). With this distinction in mind it may be helpful to use words such as "left of" or "right of" when discussing encounter order and "earlier than" or "later than" when discussing processing order.Koralle
I understand List<> will preserve the order, but will Collection<>?Barreto
@JoshC. it depends on the actual collection type. Sets usually don’t, unless it’s a SortedSet or LinkedHashSet. The collection views of a Map (keySet(), entrySet(), and values()) inherit the Map’s policy, i.e. are ordered when the map is a SortedMap or LinkedHashMap. The behavior is determined by the characteristics reported by the collection’s spliterator. The default implementation of Collection does not report the ORDERED characteristic, so it’s unordered, unless overridden.Absorbance
@Absorbance I had a question that might be related somewhat to a small section of your answer.Vestpocket
Worth noting that forEachOrdered only differs to forEach when using parallel streams - but good practice to use it anyway when ordering matters in case the steaming method ever changes...Formidable
@Absorbance from the example provided in * even if it's list the order is not assured. Did i misunderstood something? *docs.oracle.com/javase/tutorial/collections/streams/…Waterish
@Waterish which example on that page do you mean?Absorbance
When they use listOfIntegers.parallelStream().forEach(e -> System.out.print(e + " ")). The result is "3 4 1 6 2 5 7 8" . Since the listOfIntegeres is a list, the result shouldn't be ordered? @AbsorbanceWaterish
@Waterish the output is preceded by “It prints output similar to the following:”, which means that “3 4 1 6 2 5 7 8” is exemplary for any possible output not matching the list’s order (which would be 8 7 6 5 4 3 2 1).Absorbance
B
11

In a nutshell:

Ordering depends on the source data structure and intermediate stream operations. Assuming you are using a List the processing should be ordered (since filter won't change the sequence here).

More details:

Sequential vs Parallel vs Unordered:

Javadocs

S sequential()

Returns an equivalent stream that is sequential. May return itself, either because the stream was already sequential, or because the underlying stream state was modified to be sequential. This is an intermediate operation.

S parallel()

Returns an equivalent stream that is parallel. May return itself, either because the stream was already parallel, or because the underlying stream state was modified to be parallel. This is an intermediate operation.

S unordered()

Returns an equivalent stream that is unordered. May return itself, either because the stream was already unordered, or because the underlying stream state was modified to be unordered. This is an intermediate operation.

Stream Ordering:

Javadocs

Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.unordered(). Further, some terminal operations may ignore encounter order, such as forEach().

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order; if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x*2) must be [2, 4, 6]. However, if the source has no defined encounter order, then any permutation of the values [2, 4, 6] would be a valid result.

For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results.

For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()) can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.

Boxboard answered 11/7, 2020 at 19:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.