How to find a duplicate element in an array of shuffled consecutive integers?
Asked Answered
I

19

75

I recently came across a question somewhere:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?

What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?

Isleana answered 9/4, 2010 at 7:35 Comment(10)
pretty sure this has been asked before, but can't find the exact qn. the total of the n integers in sequence and the repeated integer x will be x + n(n-1)/2.Cheviot
Can you please change question title to something more descriptive? Maybe "Find duplicate array element with special constraints"Lafond
another mathematical property you can use here is the factorial. (n1 * n2 * .. ) / n! gives the required number. 1000! factorial is not that big of a number to be honest - justinwhite.com/big-calc/1000.htmlAwl
Yep, 1000! is not that big of a number - only 559 digits to fill the rest of the comment box and another 2921 digits over the limit... :-)Strickman
A more hard (no single-pass in O(1) space solution) version of this question is "Algorithm to determine if array contains n…n+m?" #177618Alchemize
Slightly different question with the same answer: #35685Margarettemargarida
Again: #1090487Margarettemargarida
Duplicate: #556244Margarettemargarida
trick question - answer has been hanging around as long as the question has :-/Flavor
The problem is that after every number appears only once (we've got 1001 different integers) + 1 repeated number = 1002 integersSeibold
I
105

Just add them all up, and subtract the total you would expect if only 1001 numbers were used from that.

Eg:

Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10

Input - Expected => 2
Izmir answered 9/4, 2010 at 7:38 Comment(18)
Wouldn't that require auxiliary storage?Henrik
@Brian Rasmussen: where is the extra storage?Izmir
@leppie: To hold the calculated sum, but honestly I don't know exactly what the OP meant by extra storage. In any case, I like your answer.Henrik
@Brian, the interviewer probably meant "don't use a hash table or an array"... I'm pretty sure O(1) storage, especially a single variable, would be satisfactory.Toast
I'm guessing auxilliary storage would be storage that takes more than O(1) space.Hominy
@Michael: Thanks, that makes sense. Please disregard by question then.Henrik
the methord works perfectly fine. but the example should have been something like ( 1,3,2,4,2=>12 ) - (1+2+3+4 => 10) = 2Isleana
This solution does not scale for bigger arrays. :-)Strickman
@Franci Penov: I am not sure interview questions are supposed to scale :)Henrik
"I am not sure interview questions are supposed to scale"...but I am almost certain it would be the next question.Isleana
@Aistina & @SysAdmin: Thanks, didn't notice that ;PIzmir
@Izmir - scale in terms of array size that can be safely processed without overflow. :-)Strickman
@Franci Penov: It's an algorithm, the size of integers and arrays are unbound. Most languages could deal with big ints, the array size might be a problem, but you would have a problem before you even start ;PIzmir
sum-based algorithm requires O(log(n)) additional storage to hold the sum that is worse than O(1) for xor-based solution.Alchemize
@J.F.Sebastian why does addition need more storage than xor? In both cases you need an accumulator with O(log(n)) bits.Uplift
@Anonymous: you are right: the sum requires only twice as much bits as xor and constant factors do not matter for Big O. I don't remember what I meant: a very generous interpretation could be: one number (for the result) you have for free and using xor it is all you need but with the sum you need one more additional number (the sum is square) ~log n bits.Alchemize
@J.F.Sebastian in both cases you need an accumulator with just enough bits to contain the result. With addition you can just throw away the overflow bits (and perform subtraction with two's complement addition).Uplift
@Anonymous: it is interesting. Though "sum with specific overflow behavior" is sufficiently different from just "sum" (there is no language tag on the question: overflow behaviour is unknown -- I would assume either infinite precision integers (such as in Python) or just "twice as wide" fixed integer for the sum). Could you post it as an answer with more details how the overflow works in this case?Alchemize
S
78

Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)

Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)

printf("Answer : %d\n",
           array[0] ^
           array[1] ^
           array[2] ^
           // continue typing...
           array[999] ^
           array[1000] ^
           1 ^
           2 ^
           // continue typing...
           999^
           1000
      );

Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.

And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)

Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.

Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.

Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);

I will leave the exercise of figuring out why this works to you, with a simple hint :-):

a ^ a = 0
0 ^ a = a
Strickman answered 9/4, 2010 at 8:0 Comment(12)
A non-destructive method would be to maintain an accumulator on the side... would also make it more readable I think.Claytor
@Matthiey M. - but a non-destructive solution would require additional storage and thus violate the requirements of the problem.Strickman
And your solution requires additional storage to hold the loop counter, plus possibly some intermediate values depending on how all that gets compiled. However, chosing a destructive algorithm does prevent overflow issues without introducing a separate type to hold the sum, which gives it an actual purpose.Upstairs
@Dennis Zickefoose - I am not arguing that a non-destructive solution with an additional integer variable is not better. :-) but it does violate the problem requirement, that's why I choose the destructive algorithm. as for the loop counter - there's no way to avoid this one, and it is implicitly allowed because the problem states that the code is allowed to iterate over the array once, which is not possible without loop counter.Strickman
When you look at XOR of 1 to 1000, XOR of the whole array and compare this procedure with adding 1 to 1000 and comparing with the sum of the whole array you notice similarity - addition and XOR operation are related in binary. The only difference that XOR is more natural here because we know that the final difference can not be bigger then 1000.Afb
Not that this isn't a brilliant solution, but doesn't your first solution, the loop, access each member of the array twice?Intractable
@Robert - technically, it does. Hence the second solution. :-)Strickman
It still have additional storage in the form of temporary values, no ?Kesha
@Pavel Shved - there's no trick to XOR, it's a mathematical operation with well known properties, just as addition, multiplication and others.Strickman
@fa - I don't see any temporary values. For all I know, the compiler I've been given targets a CPU that has special instruction that does XOR on 1002 values at once. :-)Strickman
@Pavel - also, you and I look at the problem in different ways - for I am not searching for duplicate number, I am searching for a duplicate pattern in sets of flags. when you state the problem in this way, using addition now becomes the "dirty trick" :-)Strickman
@Franci: Wow, the stuff about flags makes sense. Perhaps, you might want to edit it into your answer. Then I can change the vote :-)Roasting
B
24

A non destructive version of solution by Franci Penov.

This can be done by making use of the XOR operator.

Lets say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the index:                        0, 1, 2, 3, 4

Now do an XOR of all the elements and all the indices. We get 2, which is the duplicate element. This happens because, 0 plays no role in the XORing. The remaining n-1 indices pair with same n-1 elements in the array and the only unpaired element in the array will be the duplicate.

int i;
int dupe = 0;
for(i = 0; i < N; i++) {
    dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.

The best feature of this solution is that it does not suffer from overflow problems that is seen in the addition based solution.

Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)

This makes use of an additional variable so does not meet the requirements in the question completely.

Brainstorm answered 9/4, 2010 at 9:6 Comment(3)
Frankly, I'm not getting these XOR based solutions. Basically, we're trying to match the "index" with the value of the element. In case of match, the result will be zero and for repeated value, the xor result will be non zero. For a simple array --> {1,2,2} we will xor 1(element value)^1(index)^0 (previous xor result) --> 0; 2^2^0 --> 0; 3^2^0 --> 1. Here 1 is the final result value as per XOR solutions. I don't see how this is valid answer unless I'm missing something very obvious.Admiration
@Brainstorm I think the loop should start with i initialized to 1.Adela
@Brainstorm +1 for the clear illustration and mentioning about the overflow (also for being non-destructive). The same will work, with a few changes, even if the integers have an offset, say { 1043, 1042, 1044, 1042 } by XOR-ing with { 0, 1042, 1043, 1044 }.Diaconicum
T
16

Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.

Twiggy answered 9/4, 2010 at 7:38 Comment(0)
C
9

To paraphrase Francis Penov's solution.

The (usual) problem is: given an array of integers of arbitrary length that contain only elements repeated an even times of times except for one value which is repeated an odd times of times, find out this value.

The solution is:

acc = 0
for i in array: acc = acc ^ i

Your current problem is an adaptation. The trick is that you are to find the element that is repeated twice so you need to adapt solution to compensate for this quirk.

acc = 0
for i in len(array): acc = acc ^ i ^ array[i]

Which is what Francis' solution does in the end, although it destroys the whole array (by the way, it could only destroy the first or last element...)

But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.

It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).

Claytor answered 9/4, 2010 at 8:48 Comment(1)
I've posted Python one-liner based on your answer #2606266Alchemize
T
6

Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.

Troposphere answered 9/4, 2010 at 7:38 Comment(0)
A
5

One line solution in Python

arr = [1,3,2,4,2]
print reduce(lambda acc, (i, x): acc ^ i ^ x, enumerate(arr), 0)
# -> 2

Explanation on why it works is in @Matthieu M.'s answer.

Alchemize answered 10/4, 2010 at 5:5 Comment(1)
+1, well done: even though it's not a code golf, using python's built-in loops is faster :)Claytor
H
3

If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.

Hominy answered 9/4, 2010 at 7:39 Comment(0)
T
3

Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once except for the number that is repeated.... so, the sum from 1....1000 is 500500. So, the algorithm is:

sum = 0
for each element of the array:
   sum += that element of the array
number_that_occurred_twice = sum - 500500
Toast answered 9/4, 2010 at 7:40 Comment(0)
I
1

No extra storage requirement (apart from loop variable).

int length = (sizeof array) / (sizeof array[0]);
for(int i = 1; i < length; i++) {
   array[0] += array[i];
}

printf(
    "Answer : %d\n",
    ( array[0] - (length * (length + 1)) / 2 )
);
Ibadan answered 9/4, 2010 at 7:44 Comment(5)
You are assuming the array is sorted. Bad assumption.Izmir
@leppie: how come? I didn't assume anything. And it actually uses any extra space like other answers suggest.Ibadan
Although I do have issue with the premise. It requires an extra two ints.Upstairs
@Dennis: well loop variable has to be there, and length to make it generic.Ibadan
@nvl: Even with academic exercises like this, a destructive algorithm that saves a single variable isn't particularly beneficial.Upstairs
A
1

Do arguments and callstacks count as auxiliary storage?

int sumRemaining(int* remaining, int count) {
    if (!count) {
        return 0;
    }
    return remaining[0] + sumRemaining(remaining + 1, count - 1);
}
printf("duplicate is %d", sumRemaining(array, 1001) - 500500);

Edit: tail call version

int sumRemaining(int* remaining, int count, int sumSoFar) {
    if (!count) {
        return sumSoFar;
    }
    return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);
Alsacelorraine answered 9/4, 2010 at 7:51 Comment(2)
This requires linear stack space, so that is definitely cheating.Upstairs
Throw in another argument and you can tail call optimize it.Alsacelorraine
G
1
n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s
Gabardine answered 21/12, 2010 at 9:38 Comment(0)
I
1
public static void main(String[] args) {
    int start = 1;
    int end = 10;
    int arr[] = {1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10};
    System.out.println(findDuplicate(arr, start, end));
}

static int findDuplicate(int arr[], int start, int end) {

    int sumAll = 0;
    for(int i = start; i <= end; i++) {
        sumAll += i;
    }
    System.out.println(sumAll);
    int sumArrElem = 0;
    for(int e : arr) {
        sumArrElem += e;
    }
    System.out.println(sumArrElem);
    return sumArrElem - sumAll;
}
Institutionalism answered 1/10, 2012 at 12:54 Comment(0)
E
1
public int duplicateNumber(int[] A) {
    int count = 0;
    for(int k = 0; k < A.Length; k++)
        count += A[k];
    return count - (A.Length * (A.Length - 1) >> 1);
}
Ephod answered 16/11, 2012 at 17:24 Comment(0)
S
0

A triangle number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among given 1001 natural numbers, one and only one number is duplicated, you can easily sum all given numbers and subtract T(1000). The result will contain this duplicate.

For a triangular number T(n), if n is any power of 10, there is also beautiful method finding this T(n), based on base-10 representation:

n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s
Sophiesophism answered 7/6, 2010 at 8:40 Comment(0)
K
0

Improvement of Fraci's answer based on the property of XORing consecutive values:

int result = xor_sum(N);
for (i = 0; i < N+1; i++)
{
   result = result ^ array[i];
}

Where:

// Compute (((1 xor 2) xor 3) .. xor value)
int xor_sum(int value)
{
    int modulo = x % 4;
    if (modulo == 0)
        return value;
    else if (modulo == 1)
        return 1;
    else if (modulo == 2)
        return i + 1;
    else
        return 0;
}

Or in pseudocode/math lang f(n) defined as (optimized):

if n mod 4 = 0 then X = n
if n mod 4 = 1 then X = 1
if n mod 4 = 2 then X = n+1
if n mod 4 = 3 then X = 0

And in canonical form f(n) is:

f(0) = 0
f(n) = f(n-1) xor n
Kellikellia answered 28/3, 2011 at 14:34 Comment(0)
T
0

I support the addition of all the elements and then subtracting from it the sum of all the indices but this won't work if the number of elements is very large. I.e. It will cause an integer overflow! So I have devised this algorithm which may be will reduce the chances of an integer overflow to a large extent.

   for i=0 to n-1
        begin:  
              diff = a[i]-i;
              dup = dup + diff;
        end
   // where dup is the duplicate element..

But by this method I won't be able to find out the index at which the duplicate element is present!

For that I need to traverse the array another time which is not desirable.

Tricksy answered 19/8, 2011 at 5:26 Comment(1)
Simple sums will actually work work. Integer overflow is not a problem, provided the variable tallying the sum is unsigned.Trencher
M
0

My answer to question 2:

Find the sum and product of numbers from 1 -(to) N, say SUM, PROD.

Find the sum and product of Numbers from 1 - N- x -y, (assume x, y missing), say mySum, myProd,

Thus:

SUM = mySum + x + y;
PROD = myProd* x*y;

Thus:

x*y = PROD/myProd; x+y = SUM - mySum;

We can find x,y if solve this equation.

Mongoloid answered 7/4, 2015 at 6:17 Comment(0)
B
0

In the aux version, you first set all the values to -1 and as you iterate check if you have already inserted the value to the aux array. If not (value must be -1 then), insert. If you have a duplicate, here is your solution!

In the one without aux, you retrieve an element from the list and check if the rest of the list contains that value. If it contains, here you've found it.

private static int findDuplicated(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    int[] checker = new int[array.length];
    Arrays.fill(checker, -1);
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        int checked = checker[value];
        if (checked == -1) {
            checker[value] = value;
        } else {
            return value;
        }
    }
    return -1;
}

private static int findDuplicatedWithoutAux(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        for (int j = i + 1; j < array.length; j++) {
            int toCompare = array[j];
            if (value == toCompare) {
                return array[i];
            }
        }
    }
    return -1;
}
Bottle answered 1/12, 2018 at 20:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.