Java 8 streams to find the duplicate elements
O

21

119

I am trying to list out duplicate elements in an integer list using Streams of JDK 8. For example:

// Duplicates are {1, 4}
List<Integer> numbers = Arrays.asList(new Integer[]{1,2,1,3,4,4});

To remove duplicates we can use the distinct() method. But what about finding the duplicated elements?

Onofredo answered 28/12, 2014 at 14:19 Comment(3)
possible duplicate of Collect stream with grouping, counting and filtering operationsStets
If you don't want to collect the stream, this essentially boils down to "how can I look at more than one item at once in a stream"?Abhorrence
Set<Integer> items = new HashSet<>(); numbers.stream().filter(n -> !items.add(n)).collect(Collectors.toSet());Archimandrite
M
150

You can use Collections.frequency:

numbers.stream().filter(i -> Collections.frequency(numbers, i) > 1)
                .collect(Collectors.toSet()).forEach(System.out::println);
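
If you want to keep the duplicates in a variable rather than print them, a small variation of the same (still quadratic) approach would be:

Set<Integer> duplicates = numbers.stream()
        .filter(i -> Collections.frequency(numbers, i) > 1)
        .collect(Collectors.toSet());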
Mum answered 11/9, 2015 at 6:21 Comment(5)
The same O(n^2) performance as in @OussamaZoghlami's answer, though probably simpler. Nevertheless, here's an upvote. Welcome to StackOverflow!Stets
As mentioned, this is a n^2 solution where a trivial linear solution exists. I wouldn't accept this in CR.Charlean
It may be slower than @Dave option, but it's prettier so I'll take the performance hit.Endbrain
@Charlean is your point regarding n^2 solution referring to the use of Collections.frequency in a filter?Longhand
@Longhand yes, it's quadratic because the frequency call has to visit every element in numbers, and it's being called on every element. Thus, for each element, we visit every element -- n^2 and needlessly inefficient.Charlean
C
106

Basic example. The first half builds the frequency map; the second half reduces it to a filtered list. Probably not as efficient as Dave's answer, but more versatile (e.g. if you want to detect exactly two occurrences; see the sketch after the code below).

List<Integer> duplicates = IntStream.of( 1, 2, 3, 2, 1, 2, 3, 4, 2, 2, 2 )
   .boxed()
   .collect( Collectors.groupingBy( Function.identity(), Collectors.counting() ) )
   .entrySet()
   .stream()
   .filter( p -> p.getValue() > 1 )
   .map( Map.Entry::getKey )
   .collect( Collectors.toList() );
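
For instance, a sketch of the "exactly two occurrences" variant mentioned above, changing only the filter condition:

List<Integer> exactlyTwice = IntStream.of( 1, 2, 3, 2, 1, 2, 3, 4, 2, 2, 2 )
   .boxed()
   .collect( Collectors.groupingBy( Function.identity(), Collectors.counting() ) )
   .entrySet()
   .stream()
   .filter( p -> p.getValue() == 2 )
   .map( Map.Entry::getKey )
   .collect( Collectors.toList() );  // [1, 3] for this input (order may vary)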
Camper answered 10/7, 2015 at 13:25 Comment(3)
This answer is the correct one imo because it is linear and doesn't violate the "stateless predicate" rule.Charlean
@Charlean It's not really; Collectors.counting() is the same as the above answer, and IMHO, in a small set the above one is much simpler and cleaner.Beneficiary
@Beneficiary It is not the same. In the above answer, each item is filtered against its frequency, which is recomputed for each item. That is quite different from building a map once.Camper
S
69

You need a set (allItems below) to hold the entire array contents, but this is O(n):

Integer[] numbers = new Integer[] { 1, 2, 1, 3, 4, 4 };
Set<Integer> allItems = new HashSet<>();
Set<Integer> duplicates = Arrays.stream(numbers)
        .filter(n -> !allItems.add(n)) //Set.add() returns false if the item was already in the set.
        .collect(Collectors.toSet());
System.out.println(duplicates); // [1, 4]
Servais answered 9/6, 2015 at 20:11 Comment(7)
I think this is the most efficient solution.Camper
filter() requires a stateless predicate. Your "solution" is strikingly similar to the example of a stateful predicate given in the javadoc: docs.oracle.com/javase/8/docs/api/java/util/stream/…Goodwill
@MattMcHenry: does that mean this solution has the potential to produce unexpected behavior, or is it just bad practice?Carter
@Carter In a localized case like there where you know for sure that the stream is sequential(), it's probably safe. In the more general case where the stream may be parallel(), it's pretty much guaranteed to break in weird ways.Goodwill
In addition to producing unexpected behavior in some situations, this mixes paradigms as Bloch argues you shouldn't in the third edition of Effective Java. If you find yourself writing this, just use a for loop.Charlean
Although the approach is up for discussion for production-level code, this is a pretty good coding exercise. Parallel execution can be fixed with synchronization; it won't break anything. Furthermore, this can be done in a one-liner, without declaring the Set<Integer> allItems = new HashSet<>(); (see the sketch after these comments).Countermarch
Found this in the wild being used by Hibernate Validator UniqueElements constraint.Servais
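
Regarding the one-liner idea in the comments: a minimal sketch, assuming Java 11+ for Predicate.not (and still subject to the same stateful-predicate caveats discussed above):

Set<Integer> duplicates = Arrays.stream(numbers)
        .filter(Predicate.not(new HashSet<Integer>()::add)) // requires java.util.function.Predicate
        .collect(Collectors.toSet());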
A
20

An O(n) way would be as below:

List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 4, 4);
Set<Integer> duplicatedNumbersRemovedSet = new HashSet<>();
Set<Integer> duplicatedNumbersSet = numbers.stream()
        .filter(n -> !duplicatedNumbersRemovedSet.add(n))
        .collect(Collectors.toSet());

The space usage roughly doubles with this approach, but that space is not wasted; in fact, we now have one Set containing only the duplicates as well as another Set with all the duplicates removed.
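
For illustration, printing both sets (a small sketch; HashSet iteration order is not guaranteed):

System.out.println(duplicatedNumbersSet);        // e.g. [1, 4]
System.out.println(duplicatedNumbersRemovedSet); // e.g. [1, 2, 3, 4]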

Area answered 10/8, 2015 at 20:2 Comment(1)
Looks like the same solution as Dave's above - https://mcmap.net/q/181776/-java-8-streams-to-find-the-duplicate-elementsAalst
S
18

My StreamEx library, which enhances Java 8 streams, provides a special operation distinct(atLeast) which retains only the elements appearing at least the specified number of times. So your problem can be solved like this:

List<Integer> repeatingNumbers = StreamEx.of(numbers).distinct(2).toList();

Internally it's similar to @Dave's solution: it counts objects so that other wanted quantities can be supported, and it's parallel-friendly (it uses a ConcurrentHashMap for a parallelized stream, but a HashMap for a sequential one). For big amounts of data you can get a speed-up using .parallel().distinct(2).
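
For example, based on the description above, the parallel variant would look like this (a sketch; it assumes the StreamEx dependency is on the classpath):

List<Integer> repeatingNumbers = StreamEx.of(numbers).parallel().distinct(2).toList();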

Stets answered 13/8, 2015 at 2:53 Comment(1)
The question is about Java Streams, not third-party libraries.Supernumerary
C
8

You can get the duplicates like this:

List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 4, 4);
Set<Integer> duplicated = numbers
  .stream()
  .filter(n -> numbers
        .stream()
        .filter(x -> x == n)
        .count() > 1)
   .collect(Collectors.toSet());
Camelot answered 29/12, 2014 at 14:55 Comment(4)
Isn't that an O(n^2) operation?Tetracaine
Try to use numbers = Arrays.asList(400, 400, 500, 500); the x == n comparison checks Integer references, which only behaves as expected for small cached values.Stets
Is this similar to creating a 2-deep loop? for(..) { for(..) } Just curious how it works internally.Palmar
Though it is a nice approach, having a stream inside a stream is costly.Ocd
G
4

I think basic solutions to the question could be as below:

Supplier<Set<Integer>> supplier = HashSet::new;
Set<Integer> distinctItems = ls.stream().collect(Collectors.toCollection(supplier));

List<Integer> duplicates = ls.stream()
        .filter(e -> Collections.frequency(ls, e) > 1)
        .distinct()
        .collect(Collectors.toList());

Well, it is not recommended to perform the frequency check inside a filter operation (it scans the list for every element), but I have used it here for better understanding; moreover, there may be better custom filtering options in future versions.

Globigerina answered 9/7, 2018 at 13:45 Comment(0)
F
4

A multiset is a structure maintaining the number of occurrences of each element. Using the Guava implementation:

Set<Integer> duplicated =
        ImmutableMultiset.copyOf(numbers).entrySet().stream()
                .filter(entry -> entry.getCount() > 1)
                .map(Multiset.Entry::getElement)
                .collect(Collectors.toSet());
Fritzsche answered 10/7, 2018 at 12:59 Comment(0)
L
3

If you only need to detect the presence of duplicates (instead of listing them, which is what the OP wanted), just convert the list into both a List and a Set, then compare the sizes:

    List<Integer> list = ...;
    Set<Integer> set = new HashSet<>(list);
    if (list.size() != set.size()) {
      // duplicates detected
    }

I like this approach because it has fewer places for mistakes.
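
A stream-only variant of the same size comparison (a sketch, not part of the original answer) could be:

boolean hasDuplicates = list.stream().distinct().count() != list.size();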

Lurlenelurline answered 24/6, 2019 at 17:21 Comment(0)
C
3

Creating an additional map or stream is somewhat time- and space-consuming…

Set<Integer> duplicates = numbers.stream().collect( Collectors.collectingAndThen(
  Collectors.groupingBy( Function.identity(), Collectors.counting() ),
  map -> {
    map.values().removeIf( cnt -> cnt < 2 );
    return( map.keySet() );
  } ) );  // [1, 4]


…and for the question that this one is claimed to be a [duplicate] of:

public static int[] getDuplicatesStreamsToArray( int[] input ) {
  return( IntStream.of( input ).boxed().collect( Collectors.collectingAndThen(
      Collectors.groupingBy( Function.identity(), Collectors.counting() ),
      map -> {
        map.values().removeIf( cnt -> cnt < 2 );
        return( map.keySet() );
      } ) ).stream().mapToInt( i -> i ).toArray() );
}
Crippling answered 19/7, 2019 at 6:15 Comment(0)
M
1

What about checking indexes?

        numbers.stream()
            .filter(integer -> numbers.indexOf(integer) != numbers.lastIndexOf(integer))
            .collect(Collectors.toSet())
            .forEach(System.out::println);
Margret answered 2/10, 2018 at 13:38 Comment(1)
Should work fine, but it also has O(n^2) performance like some other solutions here.Baumbaugh
K
1

Set.add() is faster if you're looking for performance.

public class FindDuplicatedBySet {

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(5, 3, 4, 1, 3, 7, 2, 3, 1, 9, 9, 4, 1);
        Set<Integer> result = findDuplicatedBySetAdd(list);
        result.forEach(System.out::println);
    }

    public static <T> Set<T> findDuplicatedBySetAdd(List<T> list) {
        Set<T> items = new HashSet<>();
        return list.stream()
                .filter(n -> !items.add(n))
                .collect(Collectors.toSet());
    }
}
Krystakrystal answered 18/5, 2021 at 6:16 Comment(1)
Why is this not the best and easiest answer of them all? Some benchmark tests show it is the fastest too. mkyong.com/java8/java-8-find-duplicate-elements-in-a-streamGoldfish
O
1

Using a stream:

Set<Integer> set = new HashSet<>();
list.stream()
     .filter(data -> !set.add(data))
     .forEach(data -> System.out.println("duplicates "+data));
Offertory answered 24/9, 2022 at 8:18 Comment(0)
D
0

I think I have a good solution for how to fix a problem like this: List => List, with grouping by Something.a & Something.b. Here is an extended definition:

public class Test {

    public static void test() {

        class A {
            private int a;
            private int b;
            private float c;
            private float d;

            public A(int a, int b, float c, float d) {
                this.a = a;
                this.b = b;
                this.c = c;
                this.d = d;
            }
        }


        List<A> list1 = new ArrayList<A>();

        list1.addAll(Arrays.asList(new A(1, 2, 3, 4),
                new A(2, 3, 4, 5),
                new A(1, 2, 3, 4),
                new A(2, 3, 4, 5),
                new A(1, 2, 3, 4)));

        Map<Integer, A> map = list1.stream()
                .collect(HashMap::new, (m, v) -> m.put(
                        Objects.hash(v.a, v.b, v.c, v.d), v),
                        HashMap::putAll);

        list1.clear();
        list1.addAll(map.values());

        System.out.println(list1);
    }

}

The class A and list1 are just the incoming data; the magic is in Objects.hash(...) :)

Donielle answered 7/3, 2017 at 13:5 Comment(1)
Warning: If Objects.hash produces the same value for (v.a_1, v.b_1, v.c_1, v.d_1) and (v.a_2, v.b_2, v.c_2, v.d_2), then they are going to be considered equal and be removed as duplicates, without actually checking that the a's, b's, c's, and d's are the same. This may be an acceptable risk, or you may want to use a function other than Objects.hash which is guaranteed to produce a unique result across your domain.Sacramentalist
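
Following up on that warning: a collision-safe sketch of the same idea (reusing the answer's local class A and list1) would key the map on the field values themselves, so equality is checked rather than only the hash:

// Arrays.<Object>asList builds a key whose equals() compares all four field
// values, so two A instances only collapse when a, b, c and d are all equal.
Map<List<Object>, A> byFields = list1.stream()
        .collect(HashMap::new, (m, v) -> m.put(
                Arrays.<Object>asList(v.a, v.b, v.c, v.d), v),
                HashMap::putAll);

list1.clear();
list1.addAll(byFields.values());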
P
0

Do you have to use the Java 8 idioms (streams)? Perhaps a simpler solution would be to move the complexity into a map-like data structure that holds each number as a key (without repeating) and the number of times it occurs as the value. You could then iterate that map and only do something with those numbers that occur more than once.

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.Iterator;

public class RemoveDuplicates
{
  public static void main(String[] args)
  {
    List<Integer> numbers = Arrays.asList(new Integer[]{1,2,1,3,4,4});
    Map<Integer,Integer> countByNumber = new HashMap<Integer,Integer>();
    for (Integer n : numbers)
    {
      Integer count = countByNumber.get(n);
      if (count != null) {
        countByNumber.put(n, count + 1);
      } else {
        countByNumber.put(n, 1);
      }
    }
    System.out.println(countByNumber);
    Iterator<Map.Entry<Integer,Integer>> it = countByNumber.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Integer,Integer> pair = it.next();
      System.out.println(pair.getKey() + " = " + pair.getValue());
    }
  }
}
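
The code above prints every map entry; a minimal sketch of the "occurs more than once" step described in the text, reusing the same countByNumber map inside main, could be:

// print only the values that occur more than once, i.e. the duplicates
for (Map.Entry<Integer, Integer> pair : countByNumber.entrySet()) {
    if (pair.getValue() > 1) {
        System.out.println("duplicate: " + pair.getKey());
    }
}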
Pus answered 19/4, 2018 at 18:17 Comment(0)
C
0

Try this solution:

public class Anagramm {

    public static boolean isAnagramLetters(String word, String anagramm) {
        if (anagramm.isEmpty()) {
            return false;
        }

        Map<Character, Integer> mapExistString = charCountMap(word);
        Map<Character, Integer> mapCheckString = charCountMap(anagramm);
        return enoughLetters(mapExistString, mapCheckString);
    }

    private static Map<Character, Integer> charCountMap(String chars) {
        Map<Character, Integer> charCountMap = new HashMap<>();
        for (char c : chars.toCharArray()) {
            if (charCountMap.containsKey(c)) {
                charCountMap.put(c, charCountMap.get(c) + 1);
            } else {
                charCountMap.put(c, 1);
            }
        }
        return charCountMap;
    }

    static boolean enoughLetters(Map<Character, Integer> mapExistString, Map<Character, Integer> mapCheckString) {
        for (Entry<Character, Integer> e : mapCheckString.entrySet()) {
            Character letter = e.getKey();
            Integer available = mapExistString.get(letter);
            if (available == null || e.getValue() > available) return false;
        }
        return true;
    }
}
Concealment answered 15/5, 2018 at 6:2 Comment(0)
I
0
How to find non-repeated numbers from the array using Java 8:

Integer[] intArr = {1,1,3,2,2,5,4,4,7,6,6,9,8,8,10,13};

Set<Integer> nonRepeated = Arrays.asList(intArr).stream()
        .filter(x -> Collections.frequency(Arrays.asList(intArr), x) == 1)
        .collect(Collectors.toSet());

System.out.println(nonRepeated); // output: [3, 5, 7, 9, 10, 13] (non-duplicate values)

How to find repeated numbers from the array using Java 8:

Set<Integer> set = new HashSet<>();

Set<Integer> repeated = Arrays.asList(intArr).stream()
        .filter(x -> !set.add(x))
        .collect(Collectors.toSet());

System.out.println(repeated); // output: [1, 2, 4, 6, 8] (duplicate values)
Incurious answered 10/4, 2023 at 13:39 Comment(1)
part 1 is not an answer to the question, part 2 is (basically) a copy of an earlier answer - what new information am I missing? btw: please format your answer properly (in code, indentation matters, and text that is not code should not be formatted as such ;)Demitasse
K
0

The JEP 461: Stream Gatherers preview language feature in Java 22 adds support for gatherer operations, which can be used to filter the stream for duplicates:

void main() {
    List<Integer> numbers = List.of(1, 2, 1, 3, 4, 4, 1);
    List<Integer> duplicates = numbers.stream().gather(duplicates()).toList();
    System.out.println(duplicates); // [1, 4]
}

static <T> Gatherer<T, ?, T> duplicates() {
    Gatherer.Integrator.Greedy<Map<T, Integer>, T, T> integrator =
            (state, element, downstream) -> {
                Integer occurrences =
                        state.compute(element, (k, v) -> v == null ? 1 : v + 1);
                if (occurrences == 2) {
                    return downstream.push(element);
                } else {
                    return true;
                }
            };
    return Gatherer.ofSequential(
            HashMap::new,
            Gatherer.Integrator.ofGreedy(integrator)
    );
}

The custom gatherer keeps track of the number of occurrences of each value, and pushes an element downstream whenever the second instance of the value is encountered. This way, each distinct element is passed downstream once it is identified as a duplicate, and no other times.

Granted, this is a fair amount of code in isolation, but it can be reused as an intermediate operation in any stream that needs to operate only on duplicates.

Javadocs

Gatherer:

An intermediate operation that transforms a stream of input elements into a stream of output elements, optionally applying a final action when the end of the upstream is reached. […]

[…]

There are many examples of gathering operations, including but not limited to: grouping elements into batches (windowing functions); de-duplicating consecutively similar elements; incremental accumulation functions (prefix scan); incremental reordering functions, etc. The class Gatherers provides implementations of common gathering operations.

API Note:

A Gatherer is specified by four functions that work together to process input elements, optionally using intermediate state, and optionally perform a final action at the end of input. They are:

Stream.gather(Gatherer<? super T,?,R> gatherer):

Returns a stream consisting of the results of applying the given gatherer to the elements of this stream.

Gatherer.ofSequential(initializer, integrator)

Returns a new, sequential, Gatherer described by the given initializer and integrator.

Kathleenkathlene answered 22/12, 2023 at 5:59 Comment(0)
E
0

I know that the OP asked for streams, but perhaps streams is the wrong approach for the job at hand.

I love streams and use them all the time, but in this case I would suggest a simple O(n) operation:

private Set<Integer> findDuplicates(List<Integer> values) {
    final List<Integer> mutableList = new ArrayList<>(values);
    final Set<Integer> removedValues = new HashSet<>();

    mutableList.removeIf(v -> {
        boolean alreadyRemoved = removedValues.contains(v);
        removedValues.add(v);
        return !alreadyRemoved;
    });
    return new HashSet<>(mutableList);
}

removeIf internally performs a loop and invokes the given Predicate for each element. If the predicate returns true, the element gets removed. The predicate returns false when it is invoked again for a value it has already seen, so later occurrences are kept.
At the end, only the duplicates remain.

Unit test

@DataProvider
private Object[][] testDuplicatesProvider() {
    return new Object[][] {{
            ImmutableList.of(1, 2, 1, 3, 4, 4),
            ImmutableSet.of(1, 4)
        }, {
            ImmutableList.of(1, 2, 2, 5, 2, 6, 7, 6),
            ImmutableSet.of(2, 6)
        }, {
            ImmutableList.of(1, 2, 3),
            ImmutableSet.of()
        }, {
            ImmutableList.of(),
            ImmutableSet.of()
        }
    };
}

@Test(dataProvider = "testDuplicatesProvider")
public void testDuplicates(@Nonnull List<Integer> testData,
                           @Nonnull Set<Integer> expectedResult) {
    Assert.assertEquals(findDuplicates(testData), expectedResult);
}
Ethelda answered 11/1 at 13:29 Comment(0)
K
-1

Using distinct() on a stream filters out duplicates; you can collect the result either as a Set or as a List.

 numbers.stream().distinct().collect(Collectors.toSet())
Knap answered 27/10, 2022 at 7:30 Comment(1)
The question is how to find the duplicates, not remove them.Andryc
I
-1
int[] arr = {1, 2, 3, 4, 5, 5, 6, 6, 7};
Set<Integer> seen = new HashSet<>();
List<Integer> duplicates = Arrays.stream(arr)
        .filter(e -> !seen.add(e))
        .boxed()
        .collect(Collectors.toList());

System.out.println(duplicates);
Intransigence answered 4/4 at 14:32 Comment(0)
