java 8 parallelStream() with sorted()
Asked Answered
C

3

39

JDK 8 EA is out now, and I am just trying to get used to the lambda and the new Stream API. I've tried to sort a list with parallel stream, but the result is always wrong:

import java.util.ArrayList;
import java.util.List;

public class Test
{
    public static void main(String[] args)
    {
        List<String> list = new ArrayList<>();
        list.add("C");
        list.add("H");
        list.add("A");
        list.add("A");
        list.add("B");
        list.add("F");
        list.add("");

        list.parallelStream() // in parallel, not just concurrently!
            .filter(s -> !s.isEmpty()) // remove empty strings
            .distinct() // remove duplicates
            .sorted() // sort them
            .forEach(s -> System.out.println(s)); // print each item
    }
}

OUTPUT:

C
F
B
H
A

Note that each time the output is different. My questions is, is it a bug? or is it not possible to sort a list in parallel? if so, then why the JavaDoc doesn't state that? Last question, is there another operation whose output would differ depending on the stream type?

Circumference answered 22/10, 2013 at 23:14 Comment(1)
It would probably be better to remove duplicates after sorting.Wandering
T
62

You need to use forEachOrdered, not forEach.

As per the forEach doc:

For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.

Tuppence answered 22/10, 2013 at 23:25 Comment(1)
My guess is that internally it's creating a "sorted" list, each thread adds to that list, then continues to the next step in the flow (the forEach) so it executes out of order, FWIW.Sound
G
8

In addition, you can read more about parallelism and forEachOrdered with a very nice example from here. In summary, using forEachOrdered in a parallel stream might result to lose the benefits of parallelism.

Here the example from the same resource:

Integer[] intArray = {1, 2, 3, 4, 5, 6, 7, 8 };
List<Integer> listOfIntegers =
    new ArrayList<>(Arrays.asList(intArray));

System.out.println("listOfIntegers:");
listOfIntegers
    .stream()
    .forEach(e -> System.out.print(e + " "));
System.out.println("");

System.out.println("listOfIntegers sorted in reverse order:");
Comparator<Integer> normal = Integer::compare;
Comparator<Integer> reversed = normal.reversed(); 
Collections.sort(listOfIntegers, reversed);  
listOfIntegers
    .stream()
    .forEach(e -> System.out.print(e + " "));
System.out.println("");

System.out.println("Parallel stream");
listOfIntegers
    .parallelStream()
    .forEach(e -> System.out.print(e + " "));
System.out.println("");

System.out.println("Another parallel stream:");
listOfIntegers
    .parallelStream()
    .forEach(e -> System.out.print(e + " "));
System.out.println("");

System.out.println("With forEachOrdered:");
listOfIntegers
    .parallelStream()
    .forEachOrdered(e -> System.out.print(e + " "));
System.out.println("");

And the output is

listOfIntegers:
1 2 3 4 5 6 7 8
listOfIntegers sorted in reverse order:
8 7 6 5 4 3 2 1
Parallel stream:
3 4 1 6 2 5 7 8
Another parallel stream:
6 3 1 5 7 8 4 2
With forEachOrdered:
8 7 6 5 4 3 2 1

The fifth pipeline uses the method forEachOrdered, which processes the elements of the stream in the order specified by its source, regardless of whether you executed the stream in serial or parallel. Note that you may lose the benefits of parallelism if you use operations like forEachOrdered with parallel streams

.

Glossator answered 25/2, 2015 at 14:5 Comment(1)
That's a little thin. Please expand on your answer by editing it.Imitative
S
1

Just in case you are curious, when you collect the results using .toList() instead of streaming them through .forEach(), it will sort the items.

public static void main(String[] args) {
    var list1 = List.of("C", "H", "A", "A", "B", "F", "").parallelStream()
        .filter(s -&gt; !s.isEmpty())
        .distinct()
        .sorted()
        .toList();

    System.out.println(list1);
}

It will print: [A, B, C, F, H]

Sibship answered 17/5 at 16:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.