Using XOR operator for finding duplicate elements in an array fails in many cases

I came across the post "How to find a duplicate element in an array of shuffled consecutive integers?" but later realized that the approach fails for many inputs.

For example:
arr[] = {601,602,603,604,605,605,606,607}

#include <stdio.h>
int main()
{
    int arr[] = {2,3,4,5,5,7};
    int i, dupe = 0;
    for (i = 0; i < 6; i++) {
        dupe = dupe ^ arr[i] ^ i;
    }
    printf ("%d\n", dupe);
    return 0;
}

How can I modify this code so that the duplicate element is found in all cases?

Firstnighter answered 26/5, 2012 at 6:57 Comment(1)
I came across a post that talks about offsetting, which I am unable to understand: #8018586. Can anybody suggest something? - Firstnighter

From the original question:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice.

In other words, that algorithm only works when the array holds consecutive integers starting at 1 and ending at some N.

To adapt it to a more general case, do the following:

Find the minimum and maximum of the array. Calculate the expected value (the XOR of all integers from the minimum to the maximum). Then calculate the XOR of all elements in the array. XOR these two results together and you get the duplicate. This still requires every integer between the minimum and the maximum to be present in the array.
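The steps above can be sketched in Python as follows. This is a minimal sketch, not the answerer's code: the function name `find_duplicate_xor` is made up for illustration, and it assumes the array holds every integer from its minimum to its maximum exactly once, plus one extra copy of a single value. The length check only catches some violations of that assumption (such as a gap in the range), not all of them.

```python
from functools import reduce
from operator import xor


def find_duplicate_xor(arr):
    """Return the duplicated value, assuming arr is min..max plus one repeat.

    Hypothetical helper for illustration only.
    """
    lo, hi = min(arr), max(arr)
    # A consecutive range lo..hi with one duplicate has exactly hi - lo + 2 items.
    if len(arr) != hi - lo + 2:
        raise ValueError("input is not a consecutive range with one duplicate")
    expected = reduce(xor, range(lo, hi + 1), 0)  # XOR of lo..hi
    actual = reduce(xor, arr, 0)                  # XOR of the array itself
    return expected ^ actual


print(find_duplicate_xor([601, 602, 603, 604, 605, 605, 606, 607]))  # 605
```

An input with a gap, such as [2, 3, 5, 5, 7], is rejected by the length check instead of producing a misleading number.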

Castellatus answered 26/5, 2012 at 8:47 Comment(3)
Consider arr[] = {2, 3, 5, 5, 7}. Then min = 2 and max = 7. Expected output = 2 ^ 3 ^ 4 ^ 5 ^ 6 ^ 7 = 1. XOR of all elements in the array = 2 ^ 3 ^ 5 ^ 5 ^ 7 = 6. XOR of 1 and 6 = 1 ^ 6 = 7, which isn't the required answer! - Firstnighter
{2, 3, 5, 5, 7} is not an array of consecutive elements. Consecutive means numbers which differ by one, so if one is duplicated then a good input is, for example, {2, 3, 3, 4, 5, 6, 7}. In your general case, there is no simple solution. You either need randomization (hash tables) or sorting to get the answer. - Castellatus
@Castellatus If we have a consecutive array, won't a simple sliding window approach give the desired answer easily? - Curative

Remember these two properties of the XOR operator:

(1) XORing a number with 0 (zero) returns the same number.

That is, n ^ 0 = n

(2) XORing a number with itself returns 0 (zero).

That is, n ^ n = 0

Now, coming to the problem:

   Let Input_arr = { 23, 21, 24, 27, 22, 27, 26, 25 }

   Output should be 27 (because 27 is the duplicate element in Input_arr).

Solution:

Step 1: Find the "min" and "max" values in the given array. This takes O(n).

Step 2: Find the XOR of all integers in the range "min" to "max" (inclusive).

Step 3: Find the XOR of all elements of the given array.

Step 4: The XOR of the results of Step 2 and Step 3 gives the required duplicate number.

Description:

Step 1: min = 21, max = 27

Step 2: Step2_result = 21 ^ 22 ^ 23 ^ 24 ^ 25 ^ 26 ^ 27 = 20

Step 3: Step3_result = 23 ^ 21 ^ 24 ^ 27 ^ 22 ^ 27 ^ 26 ^ 25 = 15

Step 4: Final_Result = Step2_result ^ Step3_result = 20 ^ 15 = 27

But how does Final_Result end up being the duplicate number?

Final_Result = ( 21 ^ 22 ^ 23 ^ 24 ^ 25 ^ 26 ^ 27 ) ^ ( 23 ^ 21 ^ 24 ^ 27 ^ 22 ^ 27 ^ 26 ^ 25 )

Now, remember the two properties above: n ^ n = 0 and n ^ 0 = n

Since XOR is commutative and associative, the terms can be regrouped so that equal values sit together:

Final_Result = ( 21 ^ 21 ) ^ ( 22 ^ 22 ) ^ ( 23 ^ 23 ) ^ ( 24 ^ 24 ) ^ ( 25 ^ 25 ) ^ ( 26 ^ 26 ) ^ ( 27 ^ 27 ^ 27 )

             = 0 ^ 0 ^ 0 ^ 0 ^ 0 ^ 0 ^ ( 27 ^ 0 )   (properties applied)

             = 0 ^ 27   (because 0 ^ 0 = 0)

             = 27   (required result)
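One way to check the cancellation argument is to count how often each value appears across the full range and the array combined: a value that shows up an even number of times cancels under XOR, and only a value with an odd count survives. A small sketch using the example above (the variable names are illustrative, not from the answer):

```python
from collections import Counter
from functools import reduce
from operator import xor

input_arr = [23, 21, 24, 27, 22, 27, 26, 25]
full_range = list(range(min(input_arr), max(input_arr) + 1))  # 21..27

step2 = reduce(xor, full_range, 0)  # XOR of the range
step3 = reduce(xor, input_arr, 0)   # XOR of the array
final_result = step2 ^ step3

# Values that appear an odd number of times across both lists survive the XOR.
counts = Counter(full_range + input_arr)
survivors = [value for value, count in counts.items() if count % 2 == 1]

print(step2, step3, final_result)  # 20 15 27
print(survivors)                   # [27]
```

Here 27 appears three times in total (once in the range, twice in the array), while every other value appears exactly twice.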
Nonrigid answered 6/11, 2018 at 7:47 Comment(3)
If the input is [2,2,2,2,2], then this solution gives the wrong answer. - Sherris
Duplicate here means an element that occurs exactly twice, while all other elements occur exactly once. Though this will also work if you define duplicate as meaning an element that occurs an even number of times, while all other elements occur an odd number of times. - Mini
This answer doesn't answer the question. - Drat

XOR has the property that a ^ a is always 0, so equal values cancel out. If you know that your list has exactly one duplicate and that its values cover the range x to y (601 to 607 in your case), you can keep the XOR of all integers from x to y in a variable and then XOR that variable with every element of the array. Every value that appears once in the array cancels against its copy from the range; the duplicated value appears three times in total, so one copy survives, and that is your answer.

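#include <stdio.h>

/* void main() is what Turbo C++ accepted; standard C expects int main(void). */
void main()
{
    int a[8]={601,602,603,604,605,605,606,607};
    int k,i,j=601;

    /* j accumulates the XOR of every integer in the range 601..607 */
    for(i=602;i<=607;i++)
    {
        j=j^i;
    }

    /* XOR in the array elements; everything but the duplicate cancels */
    for(k=0;k<8;k++)
    {
        j=j^a[k];
    }

    printf("%d",j);
}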

This code will give the output 605, as desired!

Sortilege answered 26/5, 2012 at 13:41 Comment(6)
It works only when all the elements between x and y are present. Consider the case arr[] = {601, 602, 603, 603, 604, 606}. Running it will give weird output! - Firstnighter
Obviously, and I made that clear in my answer itself! The XOR operator has its own behavior; you can't make it work according to your wish! - Sortilege
At best, you must know which elements are present in the array (duplicated or not) and XOR with all of them. For example, first take int i = 601^602^603^603^604^606, then XOR this with the elements the array is said to contain (you still do not know which of them is duplicated), that is, i^601^602^603^604^606. The output must be 603! - Sortilege
@PaulR I was using Turbo C++ when I read this question, which uses void main instead of the standard int main(){ return 0;}. Thanks anyway for awarding me the -1! - Sortilege
If you fix the void main and the lack of code formatting I'll happily remove the -1. - Feast
@PaulR: Formatted! I won't remove the void main() though, because that was the code I ran on my laptop! - Sortilege

Here is the code shown in the original question, which is different from your implementation. You have modified it to use a local variable instead of the last member of the array, and that makes a difference:

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);
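As a rough check, here is a Python sketch comparing the in-place loop above with an accumulator variable, on input that matches the original problem (values 1 to N in an array of N + 1 elements). The function names are invented for this example. Under that assumption both forms appear to compute the same value, because index 0 contributes nothing to the XOR. With values such as 601..607 neither form is expected to work, since the indices no longer match the values.

```python
def duplicate_in_place(values):
    """Mirror of the in-place loop; works on a copy so the input is untouched."""
    array = list(values)
    for i in range(1, len(array)):
        array[i] = array[i] ^ array[i - 1] ^ i
    return array[-1]


def duplicate_accumulator(values):
    """Same computation with a local accumulator instead of the last element."""
    dupe = 0
    for i, value in enumerate(values):
        dupe ^= value ^ i
    return dupe


sample = [3, 1, 4, 2, 4, 5]  # values 1..5, six elements, 4 repeated
print(duplicate_in_place(sample), duplicate_accumulator(sample))  # 4 4

offset = [601, 602, 603, 604, 605, 605, 606, 607]
print(duplicate_accumulator(offset))  # 5, not 605: indices 0..7 do not cancel 601..607
```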
Goins answered 26/5, 2012 at 8:6 Comment(1)
It works for the case given in that question, but for test cases like {1, 2, 10, 11, 5, 6, 8, 5} and {601,602,603,604,605,605,606,607} it gives weird results! - Firstnighter
I have written the following program to find the duplicate element in an array. Please edit it if any changes are required.
#include <iostream>

int main()
{  
    int arr[] = {601,602,603,604,605,605,606,607};  
    //int arr[] = {601,601,604,602,605,606,607};  
    int n= sizeof(arr)/sizeof(arr[0]);  

    for (int i = 0; i < n; i++)  
    {  
        for (int j = i+1; j < n; j++)  
        {  
             int res = arr[i] ^ arr[j];  

             if (res == 0)  
             {  
                 std::cout<< "Repeated Element in array = "<<arr[i]<<std::endl;  
             }  
        }  
    }  
    return 0;  
}  

Alternatively, you can use a hash table: keep a count for each value as you insert it, and if the count for any value becomes greater than one, that value is repeated in the array.
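The hash table idea can be sketched in Python with a set, which makes no assumption about the values being consecutive. This is an illustrative sketch (the name `first_repeated` is hypothetical), not code from the answer:

```python
def first_repeated(arr):
    """Return the first value seen twice, or None if all values are distinct."""
    seen = set()
    for value in arr:
        if value in seen:
            return value
        seen.add(value)
    return None


print(first_repeated([2, 3, 5, 5, 7]))             # 5
print(first_repeated([1, 2, 10, 11, 5, 6, 8, 5]))  # 5
```

This trades O(n) extra memory for a single pass over the array, compared with the nested loops above.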

Pears answered 3/8, 2017 at 20:49 Comment(0)

Although the answers provided here are good, I'd suggest referring to the answer by Mohit Jain if anything is ambiguous.

The fact that variable ^ variable = 0 can be used to locate the duplicates in the array precisely and easily. Hope that helps!

Curtal answered 3/1, 2019 at 21:21 Comment(0)

For an arbitrary array, the pairwise XOR check only helps once the array is sorted, so that equal elements sit next to each other. Sorting makes the time complexity O(n log n).

def duplicateNumber(arr):
    arr.sort()
    for i in range(1, len(arr)):
        if arr[i] ^ arr[i-1] == 0:
            return True
    return False

Better Approach

def duplicateDetect(arr):
    slow = arr[arr[0]]
    fast = arr[arr[arr[0]]]
    while slow != fast:
        slow = arr[slow]
        fast = arr[arr[fast]]
    fast = arr[0]
    while slow != fast:
        slow = arr[slow]
        fast = arr[fast]
    return slow

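Time complexity: O(n). Space complexity: O(1). Note that this approach uses values as indices, so it assumes an array of n + 1 elements whose values all lie in 1..n.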
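The cycle-detection approach treats each value as the index of the next element to visit, so it depends on an assumption worth making explicit: the array has n + 1 entries and every value lies between 1 and n. A hedged usage sketch, with a hypothetical wrapper `checked_duplicate` that validates this before running the same loop:

```python
def duplicate_detect(arr):
    """Floyd cycle detection, following the loop shown above."""
    slow = arr[arr[0]]
    fast = arr[arr[arr[0]]]
    while slow != fast:
        slow = arr[slow]
        fast = arr[arr[fast]]
    fast = arr[0]
    while slow != fast:
        slow = arr[slow]
        fast = arr[fast]
    return slow


def checked_duplicate(arr):
    """Reject input whose values cannot be used as indices into arr."""
    n = len(arr) - 1
    if n < 1 or any(not 1 <= value <= n for value in arr):
        raise ValueError("expected n + 1 values, each between 1 and n")
    return duplicate_detect(arr)


print(checked_duplicate([1, 3, 4, 2, 2]))  # 2
```

An array such as [601, 602, 603, 604, 605, 605, 606, 607] fails the check, because its values point outside the array.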

Another approach using a hash map (whether it works depends on the problem statement, and it may need modifications to fit the problem):

from collections import Counter

def duplicate(arr):
    # Count the occurrences of each value in the array.
    c = Counter(arr)
    for key, val in c.items():
        # Return the first value that occurs exactly twice.
        if val == 2:
            return key
Schrock answered 19/7, 2022 at 14:56 Comment(0)
