Java Array, Finding Duplicates
Asked Answered
P

20

71

I have an int array, and am looking for duplicates.

int[] zipcodelist = // ...

duplicates = false;
for(j = 0; j < zipcodeList.length; j++){
    for(k = 0; k < zipcodeList.length; k++){
        if (zipcodeList[k] == zipcodeList[j]){
            duplicates = true;
        }
    }
}

However, this code doesn’t work when there are no duplicates. duplicates still ends up being true. Why’s that?

Pessa answered 17/10, 2010 at 1:15 Comment(6)
What is "doesn't work" exactly? As in, what is happening vs. what do you expect to happen?Vertebral
duplicates has the wrong valuePessa
possible duplicate of Java: Detect duplicates in ArrayList? -- It's not entirely the same but... note the use of a Set/intermediate "store" vs. a nested loop. In your case, zipcodeList[k] == zipcodeList[j] for every k == j.Bernat
He's expecting duplicates to be false, but every time there are more than 0 elements in the array, the loop will set duplicates to true.Trucker
Suggest edit your question to state int[] zipcodelist =... so it's clear you're using an array of primitives.Indigestible
I think this the 2nd for loop needs an additional condition where j!=kFranciscofranciska
I
164

On the nose answer..

duplicates=false;
for (j=0;j<zipcodeList.length;j++)
  for (k=j+1;k<zipcodeList.length;k++)
    if (k!=j && zipcodeList[k] == zipcodeList[j])
      duplicates=true;

Edited to switch .equals() back to == since I read somewhere you're using int, which wasn't clear in the initial question. Also to set k=j+1, to halve execution time, but it's still O(n2).

A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(n) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}

Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];
   java.util.Arrays.fill(bitmap, false);
   for (int item : zipcodeList)
     if (!bitmap[item]) bitmap[item] = true;
     else return true;
   }
   return false;
}

Or Just to be Compact

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];  // Java guarantees init to false
   for (int item : zipcodeList)
     if (!(bitmap[item] ^= true)) return true;
   return false;
}

Does it Matter?

Well, so I ran a little benchmark, which is iffy all over the place, but here's the code:

import java.util.BitSet;

class Yuk
{
  static boolean duplicatesZero(final int[] zipcodelist)
  {
    boolean duplicates=false;
    for (int j=0;j<zipcodelist.length;j++)
      for (int k=j+1;k<zipcodelist.length;k++)
        if (k!=j && zipcodelist[k] == zipcodelist[j])
          duplicates=true;

    return duplicates;
  }


  static boolean duplicatesOne(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;
    boolean[] bitmap = new boolean[MAXZIP + 1];
    java.util.Arrays.fill(bitmap, false);
    for (int item : zipcodelist) {
      if (!(bitmap[item] ^= true))
        return true;
    }
    return false;
  }

  static boolean duplicatesTwo(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;

    BitSet b = new BitSet(MAXZIP + 1);
    b.set(0, MAXZIP, false);
    for (int item : zipcodelist) {
      if (!b.get(item)) {
        b.set(item, true);
      } else
        return true;
    }
    return false;
  }

  enum ApproachT { NSQUARED, HASHSET, BITSET};

  /**
   * @param args
   */
  public static void main(String[] args)
  {
    ApproachT approach = ApproachT.BITSET;

    final int REPS = 100;
    final int MAXZIP = 99999;

    int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };
    long[][] times = new long[sizes.length][REPS];

    boolean tossme = false;

    for (int sizei = 0; sizei < sizes.length; sizei++) {
      System.err.println("Trial for zipcodelist size= "+sizes[sizei]);
      for (int rep = 0; rep < REPS; rep++) {
        int[] zipcodelist = new int[sizes[sizei]];
        for (int i = 0; i < zipcodelist.length; i++) {
          zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));
        }
        long begin = System.currentTimeMillis();
        switch (approach) {
        case NSQUARED :
          tossme ^= (duplicatesZero(zipcodelist));
          break;
        case HASHSET :
          tossme ^= (duplicatesOne(zipcodelist));
          break;
        case BITSET :
          tossme ^= (duplicatesTwo(zipcodelist));
          break;

        }
        long end = System.currentTimeMillis();
        times[sizei][rep] = end - begin;


      }
      long avg = 0;
      for (int rep = 0; rep < REPS; rep++) {
        avg += times[sizei][rep];
      }
      System.err.println("Size=" + sizes[sizei] + ", avg time = "
            + avg / (double)REPS + "ms");
    }
  }

}

With NSQUARED:

Trial for size= 10
Size=10, avg time = 0.0ms
Trial for size= 1000
Size=1000, avg time = 0.0ms
Trial for size= 10000
Size=10000, avg time = 100.0ms
Trial for size= 100000
Size=100000, avg time = 9923.3ms

With HashSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.16ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.15ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.16ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

With BitSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.0ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.0ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.0ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

BITSET Wins!

But only by a hair... .15ms is within the error for currentTimeMillis(), and there are some gaping holes in my benchmark. Note that for any list longer than 100000, you can simply return true because there will be a duplicate. In fact, if the list is anything like random, you can return true WHP for a much shorter list. What's the moral? In the limit, the most efficient implementation is:

 return true;

And you won't be wrong very often.

Indigestible answered 17/10, 2010 at 1:22 Comment(14)
It feels... inelegant. Probably using a hash-based approach is better, especially if zipcodelist gets big. This is a O(n^2), which is a pretty good definition of "doesn't scale well." Nested deathmarches through arrays generally reeks like a long lost BigMac on a difficult-to-reach bookshelf. Better to maintain a Set<ZipCode> and test one by one if the set .contains() the next ZipCode.Indigestible
Oops, @Eric has something similar, but mine has the benefit of bailing out early when a single dupe is found. Not sure whether TreeSet wins over HashSet in this setting, but I doubt it.Indigestible
probably slightly better using BitSet: download.oracle.com/javase/1.4.2/docs/api/java/util/BitSet.htmlManiple
Yeah, perhaps, but my understanding was that BitSet was optimized for smallish sizes (i.e., roughly the word size on the machine) and may not scale to 99999 or whatever terribly well. That's just vague rumor/memory, someone should verify.Indigestible
Note the performance, even for a moderately short list (only 100k zipcodes before you hit macroscopic times -- I imagine the USPS processes that many before their crumpets get cold in the AM.)Indigestible
Another alternative: if you're tight on space for whatever reason, you can sort your original list in place, then search for duplicate neighbors, for O(nlogn) time and O(1) space.Cinereous
I think for the "A faster (in the limit) way" part and "Bow to HuyLe" part, the time complexity is the same, i.e O(n). Please look this ..>>#11166747Scree
@Cartercarteret It is correct. If there are no dupes, each item flips an array element to true once, finally false is returned. If there are, at the first dupe, bitmap[item] was set to true by the first occurrence of item, the second flips it to false, the if (!(bitmap[item] ^= true)) evaluates to true, immediately, true is returned.Exploratory
@DanielFischer: My bad. I didn't read the whole code before commenting. I was thinking that this is the solution to "count the number of duplicates", or "listing elements that are not duplicated". For the problem of "detecting duplicate", the "Or just to be compact" solution is correct.Cartercarteret
HashSet.add returns a boolean so it’s not necessary to call contains before it. Halving the number of hash operations can make a big difference.Eureka
It is unnecessary to check k!=j given that the starting value of k is j+1, the values will never equate.Outer
I have some doubt on the first approach. Fixed j, k=j+1 goes on the subarray [j+1, N-1] given N elements. So, j will be never equals to k as already noticed.Goddard
Huyle answers is doing one hidden loop for nothing with java.util.Arrays.fill(bitmap, false); since every cells are already to false after the initialisation.Peachy
why do you check for k!=j in the "on the nose answer"? I cant see it as being necessary.Galop
M
16

Let's see how your algorithm works:

an array of unique values:

[1, 2, 3]

check 1 == 1. yes, there is duplicate, assigning duplicate to true.
check 1 == 2. no, doing nothing.
check 1 == 3. no, doing nothing.
check 2 == 1. no, doing nothing.
check 2 == 2. yes, there is duplicate, assigning duplicate to true.
check 2 == 3. no, doing nothing.
check 3 == 1. no, doing nothing.
check 3 == 2. no, doing nothing.
check 3 == 3. yes, there is duplicate, assigning duplicate to true.

a better algorithm:

for (j=0;j<zipcodeList.length;j++) {
    for (k=j+1;k<zipcodeList.length;k++) {
        if (zipcodeList[k]==zipcodeList[j]){ // or use .equals()
            return true;
        }
    }
}
return false;
Maniple answered 17/10, 2010 at 1:20 Comment(1)
This answer is especially helpful because this is the most easily portable to other languages (Giving it to you rather than andersoj because you beat them by 2 min.)Meri
C
15

You can use bitmap for better performance with large array.

    java.util.Arrays.fill(bitmap, false);

    for (int item : zipcodeList)
        if (!bitmap[item]) bitmap[item] = true;
        else break;

UPDATE: This is a very negligent answer of mine back in the day, keeping it here just for reference. You should refer to andersoj's excellent answer.

Cobwebby answered 17/10, 2010 at 1:52 Comment(2)
+1 given the size of the space (assuming 5-digit zipcodes), great!Indigestible
In what way was this negligent?Meri
K
5

To check for duplicates you need to compare distinct pairs.

Koger answered 17/10, 2010 at 1:19 Comment(1)
I guess this should have been a comment.. but it's 6 years old so w/eOpheliaophelie
I
2

Cause you are comparing the first element of the array against itself so It finds that there are duplicates even where there aren't.

Ing answered 17/10, 2010 at 1:20 Comment(0)
K
2

Initialize k = j+1. You won't compare elements to themselves and you'll also not duplicate comparisons. For example, j = 0, k = 1 and k = 0, j = 1 compare the same set of elements. This would remove the k = 0, j = 1 comparison.

Kliment answered 17/10, 2010 at 1:25 Comment(0)
N
2

You can also work with Set, which doesn't allow duplicates in Java..

    for (String name : names)
    {         
      if (set.add(name) == false) 
         { // your duplicate element }
    }

using add() method and check return value. If add() returns false it means that element is not allowed in the Set and that is your duplicate.

Nehemiah answered 31/8, 2016 at 13:48 Comment(2)
Please use !set.add(name) instead of set.add(name) == false.Unfrequented
Just different ways of representing the idea. I used "false" just to show @moby the idea of returning a false, which is not allowing the duplicateNehemiah
E
1

Don't use == use .equals.

try this instead (IIRC, ZipCode needs to implement Comparable for this to work.

boolean unique;
Set<ZipCode> s = new TreeSet<ZipCode>();
for( ZipCode zc : zipcodelist )
    unique||=s.add(zc);
duplicates = !unique;
Eric answered 17/10, 2010 at 1:21 Comment(0)
C
1
public static ArrayList<Integer> duplicate(final int[] zipcodelist) {

    HashSet<Integer> hs = new HashSet<>();
    ArrayList<Integer> al = new ArrayList<>();
    for(int element: zipcodelist) {
        if(hs.add(element)==false) {
            al.add(element);
        }   
    }
    return al;
}
Cappella answered 12/7, 2018 at 5:12 Comment(1)
Hey Ferdinand and welcome to Stackoverflow! Preferably you would add a little but of explanation to your code to show how it solves the question asked :)Repression
V
0

How about using this method?

HashSet<Integer> zipcodeSet = new HashSet<Integer>(Arrays.asList(zipcodeList));
duplicates = zipcodeSet.size()!=zipcodeList.length;
Valer answered 13/4, 2016 at 5:23 Comment(0)
S
0

@andersoj gave a great answer, but I also want add new simple way

    private boolean checkDuplicateBySet(Integer[] zipcodeList) {
        Set<Integer> zipcodeSet = new HashSet(Arrays.asList(zipcodeList));
        if (zipcodeSet.size() == zipcodeList.length) {
            return true;
        }
        return false;
    }

In case zipcodeList is int[], you need convert int[] to Integer[] first(It not auto-boxing), code here

Complete code will be:

    private boolean checkDuplicateBySet2(int[] zipcodeList) {
        Integer[] zipcodeIntegerArray = new Integer[zipcodeList.length];
        for (int i = 0; i < zipcodeList.length; i++) {
            zipcodeIntegerArray[i] = Integer.valueOf(zipcodeList[i]);
        }

        Set<Integer> zipcodeSet = new HashSet(Arrays.asList(zipcodeIntegerArray));
        if (zipcodeSet.size() == zipcodeList.length) {
            return true;
        }
        return false;
    }

Hope this helps!

Salmonoid answered 22/2, 2018 at 3:30 Comment(0)
J
0

Print all the duplicate elements. Output -1 when no repeating elements are found.

import java.util.*;

public class PrintDuplicate {

    public static void main(String args[]){
        HashMap<Integer,Integer> h = new HashMap<Integer,Integer>();


        Scanner s=new Scanner(System.in);
        int ii=s.nextInt();
        int k=s.nextInt();
        int[] arr=new  int[k];
        int[] arr1=new  int[k];
        int l=0;
        for(int i=0; i<arr.length; i++)
            arr[i]=s.nextInt();
        for(int i=0; i<arr.length; i++){
            if(h.containsKey(arr[i])){
                h.put(arr[i], h.get(arr[i]) + 1);
                arr1[l++]=arr[i];
            } else {
                h.put(arr[i], 1);
            }
        }
        if(l>0)
        { 
            for(int i=0;i<l;i++)
                System.out.println(arr1[i]);
        }
        else
            System.out.println(-1);
    }
}
Jellaba answered 6/4, 2018 at 4:39 Comment(0)
C
0
import java.util.Scanner;

public class Duplicates {
    public static void main(String[] args) {
        Scanner console = new Scanner(System.in);
        int number = console.nextInt();
        String numb = "" + number;
        int leng = numb.length()-1;

        if (numb.charAt(0) != numb.charAt(1)) {
            System.out.print(numb.substring(0,1));
        }

        for (int i = 0; i < leng; i++){

          if (numb.charAt(i)==numb.charAt(i+1)){ 
             System.out.print(numb.substring(i,i+1));
          }
          else {
              System.out.print(numb.substring(i+1,i+2));
          }
       }
   }
}
Commercialize answered 15/4, 2018 at 15:0 Comment(0)
O
0

This program will print all duplicates value from array.

public static void main(String[] args) {

    int[] array = new int[] { -1, 3, 4, 4,4,3, 9,-1, 5,5,5, 5 };
    
    Arrays.sort(array);

 boolean isMatched = false;
 int lstMatch =-1;
      for(int i = 0; i < array.length; i++) {  
          try {
                if(array[i] == array[i+1]) { 
                    isMatched = true;
                    lstMatch = array[i+1]; 
                }
                else if(isMatched) {
                    System.out.println(lstMatch);
                    isMatched = false;
                    lstMatch = -1;
                }
          }catch(Exception ex) {
              //TODO NA
          }

      }
      if(isMatched) {
          System.out.println(lstMatch);
      }

}

}

Ogdan answered 18/8, 2020 at 11:11 Comment(0)
A
0
  public static void findDuplicates(List<Integer> list){
    if(list!=null && !list.isEmpty()){
      Set<Integer> uniques = new HashSet<>();
      Set<Integer> duplicates = new HashSet<>();

      for(int i=0;i<list.size();i++){
        if(!uniques.add(list.get(i))){
          duplicates.add(list.get(i));
        }
      }
      System.out.println("Uniques: "+uniques);
      System.out.println("Duplicates: "+duplicates);
    }else{
      System.out.println("LIST IS INVALID");
    }
  }
Alodie answered 24/4, 2021 at 12:5 Comment(2)
You realize that the question is 11 years old and it's unlikely anyone is watching.Lashawn
@Lashawn I think I'm a little late :DAlodie
C
0
public static <T> Set<T> getDuplicates(Collection<T> array) {
    Set<T> duplicateItem = new HashSet<>();
    Iterator<T> it = array.iterator();
    while(it.hasNext()) {
        T object = it.next();
        if (Collections.frequency(array, object) > 1) {
            duplicateItem.add(object);
            it.remove();
        }
    }
    return duplicateItem;
} 

// Otherway if you dont need to sort your collection, you can use this way.

public static <T> List<T> getDuplicates(Collection<T> array) {
    List<T> duplicateItem = new ArrayList<>();
    Iterator<T> it = array.iterator();
    while(it.hasNext()) {
        T object = it.next();
        if (Collections.frequency(array, object) > 1) {
            if (Collections.frequency(duplicateItem, object) < 1) {
                duplicateItem.add(object);
            }
            it.remove();
        }
    }
    return duplicateItem;
}
Calc answered 30/12, 2021 at 10:36 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Gillette
M
0
import java.util.*;
class Example{
    public static boolean duplicate(int[] a) {
        for (int i=0 ; i<a.length-1 ; i++){
            for (int j = i+1; j<a.length ; j++){
                if(a[i]==a[j]){
                    return true;
                }
            }
        }
        return false;
    }
    public static void main(String []args) {
        int[] ar={34,65,87,19,94,72,83,47,89};      
        System.out.println(Arrays.toString(ar));//[34,65,87,19,94,72,83,47,89]
        System.out.println("ar is a duplicate array : " + duplicate(ar)); //false
        
        int[] br={34,65,87,19,94,72,83,94,89};      
        System.out.println(Arrays.toString(br));//[34,65,87,19,94,72,83,94,89]
        System.out.println("br is a duplicate array : "+  duplicate(br)); //true
    }
}
Mantelpiece answered 12/10, 2022 at 3:44 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Gillette
E
0
public class Rough_work {

    public static void main(String[] args){
        int arr[] = new int[]{1,2,1,3,2,3,1,4,5,4};

        int dup = 0;
         for(int i = 0; i<arr.length;i++){
             for(int j =i+1; j<arr.length;j++){
                 if(arr[i] == arr[j]){
                     System.out.print(arr[i]);
                     break;
                 }
             }
         }

    }
}
Eckenrode answered 10/9, 2023 at 13:7 Comment(0)
D
0
    class Solution {
public int findDuplicate(int[] nums) {
    int pre =nums[0], i=0,p=0;
    while(i<nums.length){
        if(nums[pre]!=-1){
            p = nums[pre];
            nums[pre] =-1;  
            pre =p;
        }
        else {
            break;
        }
        i++;
    } 
    return pre;
}

public static void main(String args[]){
  int [] arr = {1,3,4,2,2};
  System.out.println(findDuplicate(arr));
}
}
Dryer answered 19/4 at 13:5 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Gillette
F
0

Example:

int[] array = {10, 2, 2, 3, 19, 5, 6, 7, 16, 17, 18, 19};
Set<Integer> duplicates = new HashSet<>();

for (int i = 0; i < array.length - 1; i++) {
    for (int j = i + 1; j < array.length; j++) {
       if (array[i] == array[j]) {
           duplicates.add(array[i]);
       }
    }
}
        
System.out.println(duplicates);

Output: [2, 19]

Fillip answered 17/5 at 13:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.