Java Streams - Get a "symmetric difference list" from two other lists
O

7

9

I'm trying to use Java 8 streams to combine lists. How can I get a "symmetric difference list" (all objects that exist in only one of the lists) from two existing lists? I know how to get an intersection list and also how to get a union list.

In the code below I want the disjoint cars from the two lists of cars (bigCarList, smallCarList). I expect the result to be a list with the two cars "Toyota Corolla" and "Ford Focus".

Example code:

public void testDisjointLists() {
    List<Car> bigCarList = get5DefaultCars();
    List<Car> smallCarList = get3DefaultCars();

    // Get cars that exist in both lists
    List<Car> intersect = bigCarList.stream().filter(smallCarList::contains).collect(Collectors.toList());

    // Get all cars from both lists as one list
    List<Car> union = Stream.concat(bigCarList.stream(), smallCarList.stream()).distinct().collect(Collectors.toList());

    // Get all cars that exist in only one list
    //List<Car> disjoint = ???

}

public List<Car> get5DefaultCars() {
    List<Car> cars = get3DefaultCars();
    cars.add(new Car("Toyota Corolla", 2008));
    cars.add(new Car("Ford Focus", 2010));
    return cars;
}

public List<Car> get3DefaultCars() {
    List<Car> cars = new ArrayList<>();
    cars.add(new Car("Volvo V70", 1990));
    cars.add(new Car("BMW I3", 1999));
    cars.add(new Car("Audi A3", 2005));
    return cars;
}

class Car {
    private int releaseYear;
    private String name;
    public Car(String name) {
        this.name = name;
    }
    public Car(String name, int releaseYear) {
        this.name = name;
        this.releaseYear = releaseYear;
    }

    //Overridden equals() and hashCode()
}
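For reference, the contains() and distinct() calls in the example depend on how Car defines equality. Below is a minimal sketch of the elided equals() and hashCode(). It assumes that two cars are equal when both name and releaseYear match, which the question does not actually state, and the toString() is added only for readable output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class Car {
    private final String name;
    private final int releaseYear;

    Car(String name, int releaseYear) {
        this.name = name;
        this.releaseYear = releaseYear;
    }

    // Assumption: equality is based on both fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Car)) return false;
        Car other = (Car) o;
        return releaseYear == other.releaseYear && Objects.equals(name, other.name);
    }

    // Must be consistent with equals() so that hash-based collections work.
    @Override
    public int hashCode() {
        return Objects.hash(name, releaseYear);
    }

    @Override
    public String toString() {
        return name + " (" + releaseYear + ")";
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(new Car("Volvo V70", 1990));
        // contains() finds an equal, but not identical, instance
        System.out.println(cars.contains(new Car("Volvo V70", 1990))); // true
    }
}
```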
Outfoot answered 26/6, 2015 at 13:30 Comment(3)
List<Car> disjoint = bigCarList; ? I don't really understand the question. You have two lists A and B, and you want all the elements of the list A. So, just use A. Please clarify. - Otha
I think the OP is looking for the symmetric difference of the lists. - Kiehl
Yes, I am looking for the symmetric difference. Sorry for the confusion caused by my bad English. - Outfoot
K
12

Based on your own code, there is a straightforward solution:

List<Car> disjoint = Stream.concat(
    bigCarList.stream().filter(c->!smallCarList.contains(c)),
    smallCarList.stream().filter(c->!bigCarList.contains(c))
).collect(Collectors.toList());

Just filter one list for all items not contained in the other and vice versa, then concatenate both results. That works fairly well for small lists. Before considering optimized solutions like hashing or making the result distinct(), you should ask yourself why you are using lists if you want neither duplicates nor a specific order.
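If the lists do get large, the hashing optimization mentioned above could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it assumes the elements have consistent equals() and hashCode() implementations. Each list is copied into a HashSet once, so the contains() checks no longer scan a list, while the result keeps list order and duplicates just like the list-based version.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HashedSymmetricDifference {

    // Hypothetical helper; relies on consistent equals() and hashCode().
    static <T> List<T> symmetricDifference(List<T> first, List<T> second) {
        // One-time copies so that each lookup is a hash lookup, not a list scan.
        Set<T> firstSet = new HashSet<>(first);
        Set<T> secondSet = new HashSet<>(second);

        // Same shape as the list-based version: order and duplicates are kept.
        return Stream.concat(
                first.stream().filter(e -> !secondSet.contains(e)),
                second.stream().filter(e -> !firstSet.contains(e)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> big = Arrays.asList("Volvo V70", "BMW I3", "Audi A3", "Toyota Corolla", "Ford Focus");
        List<String> small = Arrays.asList("Volvo V70", "BMW I3", "Audi A3");
        System.out.println(symmetricDifference(big, small)); // [Toyota Corolla, Ford Focus]
    }
}
```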

It seems like you actually want Sets, not Lists. If you use Sets, Tagir Valeev’s solution is appropriate. But it does not respect the actual semantics of Lists, i.e. it doesn’t work if the source lists contain duplicates.


But if you are using Sets, the code can be even simpler:

Set<Car> disjoint = Stream.concat(bigCarSet.stream(), smallCarSet.stream())
  .collect(Collectors.toMap(Function.identity(), t->true, (a,b)->null))
  .keySet();

This uses the toMap collector, which creates a Map (the value is irrelevant, we simply map to true here) and uses a merge function to handle duplicates. Since for two sets, duplicates can only occur when an item is contained in both sets, these are exactly the items we want to remove.

The documentation of Collectors.toMap says that the merge function is treated “as supplied to Map.merge(Object, Object, BiFunction)”, and we can learn from there that simply mapping the duplicate pair to null will remove the entry.

So afterwards, the keySet() of the map contains the symmetric difference (named disjoint above).
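To see the null-returning merge function in action, here is a small self-contained sketch using Integer sets instead of cars. The class and method names are illustrative only. It relies on the documented behaviour of Map.merge, which removes the entry when the remapping function returns null.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapSymmetricDifference {

    // Hypothetical helper wrapping the toMap trick for two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        return Stream.concat(a.stream(), b.stream())
            // First occurrence: key -> true. Second occurrence: merge returns null,
            // which removes the key from the map again.
            .collect(Collectors.toMap(Function.identity(), t -> true, (x, y) -> null))
            .keySet();
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> b = new HashSet<>(Arrays.asList(3, 4, 5));
        // TreeSet only to get a stable print order
        System.out.println(new TreeSet<>(symmetricDifference(a, b))); // [1, 2, 5]
    }
}
```

Note that this is only safe for inputs without duplicates: an element that occurs three times in total would be added, removed, and then added again.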

Kith answered 26/6, 2015 at 15:39 Comment(1)
@Kith NICE!!! I did not notice that part in the documentation: "... or removes it if the result is null". Awesome. - Tuxedo
E
6

Something like this may work:

Stream.concat(bigCarList.stream(), smallCarList.stream())
      .collect(groupingBy(Function.identity(), counting()))
      .entrySet().stream()
      .filter(e -> e.getValue().equals(1L))
      .map(Map.Entry::getKey)
      .collect(toList());

Here we first collect all the cars into a Map<Car, Long> where the value is the number of times that car was encountered. After that, we filter this Map, keeping only the cars that were encountered exactly once, drop the counts, and collect to the final List.
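A self-contained variant of the counting idea might look like this. It is a sketch rather than the answer's exact code: the names are illustrative, each substream gets a distinct() so that duplicates inside a single list do not inflate the count, and a LinkedHashMap is used (my choice, not required) so the result keeps encounter order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountingSymmetricDifference {

    // Hypothetical helper; each element appears at most once in the result.
    static <T> List<T> symmetricDifference(List<T> first, List<T> second) {
        // distinct() per list: an element can now be counted at most twice,
        // once for each list it occurs in.
        Map<T, Long> counts = Stream.concat(first.stream().distinct(), second.stream().distinct())
            .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));

        // A count of 1 means the element occurs in exactly one of the lists.
        return counts.entrySet().stream()
            .filter(e -> e.getValue() == 1L)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> first = Arrays.asList(1, 1, 2, 3);
        List<Integer> second = Arrays.asList(3, 4, 4);
        System.out.println(symmetricDifference(first, second)); // [1, 2, 4]
    }
}
```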

Erle answered 26/6, 2015 at 13:48 Comment(3)
This doesn’t work if the source lists contain duplicates. But I think that is rather an issue of the question’s precondition, as the OP seems to actually want Sets. But when using Sets, the solution can be even simpler. - Kith
@Kith I arrived here from a similar question about Streams rather than Lists. I think adding .distinct() to each substream could help, but I am concerned about the performance cost of its stateful nature (just thinking out loud, I haven't really thought it through yet). - Orndorff
The question I was referring to, and to which I posted an answer, is this, in case you find the additional information useful together with the stream approach. - Orndorff
G
0

A little bit of math

disjoint = A and B are disjoint if their intersection is empty.

"Disjoint" is not a set; it is a property that tells you whether two sets share any elements. From your description I think you were searching for the symmetric difference.

Symmetric Difference

But anyhow, if you only want to collect to new Lists then all you need is a collector.

I made a method that creates a Collector. This Collector only "collects" values for which the predicate evaluates to true. So if you are searching for the symmetric difference, then you only need a predicate.

  public void testDisjointLists() {
    List<Car> bigCarList = get5DefaultCars();
    List<Car> smallCarList = get3DefaultCars();

    Collector<Car, ArrayList<Car>, ArrayList<Car>> inter
        = produceCollector(car -> {
          return bigCarList.contains(car) && smallCarList.contains(car);
        });

    Collector<Car, ArrayList<Car>, ArrayList<Car>> symDiff
        = produceCollector(car -> {
          return bigCarList.contains(car) ^ smallCarList.contains(car);
        });

    //Get all cars in both list as one list
    List<Car> union
        = Stream.concat(bigCarList.stream(), smallCarList.stream()).distinct().collect(Collectors.toList());

    List<Car> intersect = union.stream().collect(inter);

    //Get all cars that exist in only one of the two lists
    List<Car> symmetricDifference = union.stream().collect(symDiff);

    System.out.println("Union Cars:");
    union.stream().forEach(car -> System.out.println("Car: " + car));
    System.out.println("");

    System.out.println("Intersect Cars: ");
    intersect.stream().forEach(car -> System.out.println("Car: " + car));
    System.out.println("");

    System.out.println("Symmetric Difference: ");
    symmetricDifference.stream().forEach(car -> System.out.println("Car: " + car));
    System.out.println("");
  }

  public Collector<Car, ArrayList<Car>, ArrayList<Car>> produceCollector(Predicate<Car> predicate) {
    Collector<Car, ArrayList<Car>, ArrayList<Car>> collector = Collector.of(
        ArrayList::new,
        (al, car) -> {
          if (predicate.test(car)) {
            al.add(car);
          }
        },
        (al1, al2) -> {
          al1.addAll(al2);
          return al1;
        }
    );
    return collector;
  }
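As suggested in the comments, produceCollector does not need to be tied to Car. A possible generic version is sketched below, with an illustrative class name and Integer test data in place of cars. It behaves the same way as the Car-specific method: the accumulator only adds elements that pass the predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Stream;

class GenericFilteringCollector {

    // Generic sketch of produceCollector: collects only elements matching the predicate.
    static <T> Collector<T, ?, List<T>> produceCollector(Predicate<? super T> predicate) {
        return Collector.<T, List<T>>of(
            ArrayList::new,
            (list, item) -> {
                if (predicate.test(item)) {
                    list.add(item);
                }
            },
            (left, right) -> {
                left.addAll(right);
                return left;
            });
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 3, 4);
        List<Integer> b = Arrays.asList(3, 4, 5);

        // XOR: true when the element is in exactly one of the two lists
        Predicate<Integer> inExactlyOne = x -> a.contains(x) ^ b.contains(x);

        List<Integer> symDiff = Stream.concat(a.stream(), b.stream())
            .distinct()
            .collect(produceCollector(inExactlyOne));
        System.out.println(symDiff); // [1, 2, 5]
    }
}
```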

For performance freaks

In a single run timed with System.nanoTime() (no JIT warm-up; the output below is in nanoseconds), the collector was about 14 times faster than the first filter solution.

long before2 = System.nanoTime();
List<Car> intersect2 = union.stream().filter(car -> {
  return bigCarList.contains(car) && smallCarList.contains(car);
}).collect(Collectors.toList());
long after2 = System.nanoTime();
System.out.println("Time for first filter solution: " + (after2 - before2));


long before = System.nanoTime();
List<Car> intersect = union.stream().collect(inter);
long after = System.nanoTime();
System.out.println("Time for collector solution: " + (after - before));

Time for first filter solution: 540906

Time for collector solution: 37543

Grassland answered 26/6, 2015 at 16:1 Comment(5)
Your symmetricDifference isn’t symmetric; it doesn’t accept cars of the smallCarList not contained in the bigCarList. You can use bigCarList.contains(car) ^ smallCarList.contains(car) to solve this. By the way, there is no need for the intersect operation to test whether car is contained in bigCarList when you stream over bigCarList. That redundant test can be omitted. - Kith
There is only one remark left. While it might be a good exercise to write a custom Collector (but you shouldn’t ignore type safety, ArrayList without type arguments is a raw type), there is no benefit here. A .filter(predicate).collect(Collectors.toList()) on a stream would do as well. It’s not the intended use of the stream API to move all intermediate steps into the collector… - Kith
Alternatively you could turn produceCollector into a generic method as the collector can collect arbitrary items. - Kith
I was thinking about a generic solution, but for this example I thought a Car-specific version would be much easier. - Grassland
It’s actually very easy. Just replace all occurrences of Car at the declaration and inside of produceCollector with T and place a single <T> right before the return type. That’s all. If you are using an IDE, it’s even easier. Just place a <Car> right before the return type, then tell the IDE to rename that type parameter to T (technically, that renaming is not necessary but having a type parameter with the same name as a concrete class is not recommended). - Kith
F
0

An alternative approach, albeit not as elegant as one-line streams:

    HashMap<Integer, Boolean> y = new HashMap<>();
    // true on first sight, false once the value has been seen again
    list1.forEach(i -> y.put(i, !y.containsKey(i)));
    list2.forEach(i -> y.put(i, !y.containsKey(i)));
    List<Integer> result = y.entrySet().stream().filter(Map.Entry::getValue).map(Map.Entry::getKey)
     .collect(Collectors.toList());

which can at least be simplified to:

    HashMap<Integer, Boolean> y = new HashMap<>();
    Stream.concat(list1.stream(), list2.stream()).forEach(i -> y.put(i, !y.containsKey(i)));
    List<Integer> result = y.entrySet().stream().filter(Map.Entry::getValue)
                 .map(Map.Entry::getKey).collect(Collectors.toList());
Finical answered 23/11, 2020 at 17:17 Comment(0)
S
0

OP is asking for the symmetric difference. And the symmetric difference can be expressed as:

  1. Either the difference between the union and the intersection:

    A △ B = (A ∪ B) - (B ∩ A)

  2. Or the union of the differences:

    A △ B = (A - B) ∪ (B - A)

The first part of this answer achieves it by #2, while the second part achieves it by #1. Here I'll show a variation of approach #1:

List<Car> result = new ArrayList<>(bigCarList);
result.addAll(smallCarList); // (A ∪ B)

result.removeIf(c -> bigCarList.contains(c) && smallCarList.contains(c)); // (B ∩ A)

This can be optimized if lists are converted to sets, so that using contains is O(1):

List<Car> bigCarList = get5DefaultCars();
List<Car> smallCarList = get3DefaultCars();

Set<Car> bigCarSet = new HashSet<>(bigCarList);
Set<Car> smallCarSet = new HashSet<>(smallCarList);

Set<Car> result = new LinkedHashSet<>(bigCarList);
result.addAll(smallCarList); // (A ∪ B)

result.removeIf(c -> bigCarSet.contains(c) && smallCarSet.contains(c)); // (B ∩ A)
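
The two formulations listed at the top of this answer can also be checked against each other with plain Set operations. The following is just an illustrative sketch with made-up names and Integer data; it builds the result both ways and compares them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SymmetricDifferenceIdentities {

    // Formulation 1: (A ∪ B) - (A ∩ B)
    static <T> Set<T> unionMinusIntersection(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        union.removeAll(intersection);
        return union;
    }

    // Formulation 2: (A - B) ∪ (B - A)
    static <T> Set<T> unionOfDifferences(Set<T> a, Set<T> b) {
        Set<T> aMinusB = new HashSet<>(a);
        aMinusB.removeAll(b);
        Set<T> bMinusA = new HashSet<>(b);
        bMinusA.removeAll(a);
        aMinusB.addAll(bMinusA);
        return aMinusB;
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> b = new HashSet<>(Arrays.asList(3, 4, 5));
        // Both formulations yield the same set for this input
        System.out.println(unionMinusIntersection(a, b).equals(unionOfDifferences(a, b))); // true
    }
}
```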
Scrupulous answered 23/11, 2020 at 18:20 Comment(0)
M
0

The lambda solution with groupingBy:
the map values under the true key are in both lists;
the map values under the false key are in only one list (the "disjoint" cars).

Map<Boolean,List<Car>> map = Stream.concat(bigCarList.stream(),
    smallCarList.stream()).collect(
        groupingBy( b -> bigCarList.stream().anyMatch( s -> b.equals( s ) )
            && smallCarList.stream().anyMatch( s -> b.equals( s ) ) ) );
List<Car> disjoint = map.get( false );  // [Toyota Corolla, Ford Focus]


The same principle, but shorter, without inline streams:

Map<Boolean,List<Car>> map = Stream.concat(bigCarList.stream(),
    smallCarList.stream()).collect(
        groupingBy( b -> bigCarList.contains( b )
            && smallCarList.contains( b ) ) );
List<Car> disjoint = map.get( false );  // [Toyota Corolla, Ford Focus]

Both also work with duplicates, meaning duplicates within one list that are not contained in the other list.
If the amount of data is not so huge that you are running into disk space issues, a simple groupingBy, without filtering or additional queries to reduce the result set, should be the clearest and fastest solution.
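
Since the classifier here is a boolean, Collectors.partitioningBy is a possible alternative to groupingBy. This is my suggestion rather than part of the answer above, and the names in the sketch are illustrative. The practical difference is that partitioningBy always returns both the true and the false key, so get(false) yields an empty list instead of null when nothing is "disjoint".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitioningSymmetricDifference {

    // Hypothetical helper: true -> in both lists, false -> in exactly one list.
    static <T> Map<Boolean, List<T>> partitionByMembership(List<T> first, List<T> second) {
        return Stream.concat(first.stream(), second.stream())
            .collect(Collectors.partitioningBy(e -> first.contains(e) && second.contains(e)));
    }

    public static void main(String[] args) {
        List<String> big = Arrays.asList("Volvo V70", "BMW I3", "Audi A3", "Toyota Corolla", "Ford Focus");
        List<String> small = Arrays.asList("Volvo V70", "BMW I3", "Audi A3");

        Map<Boolean, List<String>> map = partitionByMembership(big, small);
        System.out.println(map.get(false)); // [Toyota Corolla, Ford Focus]

        // Identical lists: the false partition is empty, not null
        System.out.println(partitionByMembership(small, small).get(false)); // []
    }
}
```

Note that the true partition contains each shared element twice, once from each source list, because the two streams are concatenated before partitioning.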

Murex answered 24/11, 2020 at 15:46 Comment(0)
I
0

public static void main(String[] args) {
    List<Integer> list1 = Arrays.asList(1, 2, 3, 4);
    List<Integer> list2 = Arrays.asList(2, 3, 4, 5);

    List<Integer> diff = Stream.concat(list1.stream().filter(l1 -> !list2.contains(l1)),
            list2.stream().filter(l2 -> !list1.contains(l2))).collect(Collectors.toList());

    System.out.println(diff);
}
Isom answered 20/6 at 12:47 Comment(1)
The above example uses Integer values; replace them with String or any other Object type to get the difference between lists of that type. - Isom
