Find duplicate in array with a memory efficient approach
Asked Answered
A

4

23

A is an array of integers.

All the values are between 0 to A.Length-1

it means 0 <= A[i] <= A.Length-1

I am supposed to find repeating elements; and if there are several repeating elements, then choose the one that has lower index for the repeated item.

for example:

a = [3, 4, 2, 5, 2, 3]

then

result = 2

This was an interview question. I used another array to store items and check when it is repeating. Then it gave me time-out for some test cases. The interviewer advised to only loop over the array only once, and do not create any additional data structure.

Aarhus answered 29/8, 2018 at 13:2 Comment(7)
I'd suggest using a HashSet (but that is using an extra variable). When Add returns false you got your answer. Whether it is efficient enough - not sure without seeing the sample inputs and the benchmarking code.Fran
I also asked what if I create a hash table, and he declined that. I suppose it has sth to do with the value range.Aarhus
You can use the input itself as a hashset. Every time you see a value, add A.Length to the item that correspond to that index. No need for another data structure. If you find an item that is already > A.length.. you have a repetition. Track themNegligence
If their concern is only memory consumption and no duplication of the array, I think that the easiest and quickest solution is to iterate over each element in the array to see if you have seen it before. The should give you an O(n^2) algorithm which consumes the size of the array + the size of an int.Fagaceous
Have a look at this: geeksforgeeks.org/…Spectrophotometer
Try following : int[] a = { 3, 4, 2, 5, 2, 3 }; int[] b = a.Select((x, i) => new { number = x, index = i }).GroupBy(x => x.number).Select(x => x.Min(y => y.index)).ToArray();Spinoza
Are you looking for the index or the value (2 is both of them in this example).Stalinist
N
22

No need for another data structure. You can use the input itself as a hashset.

Every time you see a value, add A.Length to the item that corresponds to that index. As values might have been already incremented, you should look at the value as A[i] mod A.length.

If you find an item that is already >= A.length.. you have a repetition. (Remember that the problem states that all items are in the interval [0, A.Length-1])

Track the lowest index that has been found as repeated.

This results in O(N) complexity (single pass) and no use of an additional data structure, i.e. Size O(1)

The key concept behind this approach is that hashsets work this way. Conceptually, this is indirectly related to the pigeonhole principle. https://en.wikipedia.org/wiki/Pigeonhole_principle

Note: During the interview it would be important to ask implementation specific questions, discuss limitations, assumptions, etc.: - What is the data type of the items in the list? - if values are in the range [0..A.length-1], are all items unsigned or can I use negative numbers if I wanted? - etc.

During the interview, I would not claim this is a perfect answer, instead, I would discuss the assumptions with the interviewer and adjust accordingly. For instance, another answer suggested using negative numbers but it is possible that the data type of items is an unsigned type, etc.

The interview is supposed to trigger a technical discussion to explore both your knowledge and creativity.

Negligence answered 29/8, 2018 at 13:10 Comment(0)
M
6

Notice: Solution fails if there is element with value of zero. Olivier's solution can handle such cases.

Making element with index of A[i] negative. It only go through the loop once.

for(int i=0; i<A.Length; i++)
    {
        if (A[Math.Abs(A[i])] < 0){ return Math.Abs(A[i]);}
        A[Math.Abs(A[i])] = -A[Math.Abs(A[i])];
    }
Mesognathous answered 29/8, 2018 at 13:18 Comment(5)
yes, negative or +A.length are more or less the same ideaNegligence
yes, same idea. But I appreciate the time you spent to explain it clearly.Mesognathous
Why do you shift all the indexes by one? What if A[i] is zero? You get both an OOB exception and, unless the items are FPs, that A[x] == -A[x]. Even if you have a negative zero, -0 < 0 == false.Renunciation
That was a mistake, and thanks for pointing out. I was working on Matlab project earlier, which is one-based indexing language :D mixing up :D nice catch.Mesognathous
You still have the problem of elements with value zero. I'm not so confident it can be eliminated.Renunciation
S
3

I would like to refine @AryanFirouzian's solution and to return all duplicates by using yield return. Also, using a temp variable simplifies the code.

public static IEnumerable<int> FindDuplicates(int[] A)
{
    for (int i = 0; i < A.Length; i++) {
        int absAi = Math.Abs(A[i]);
        if (A[absAi] < 0) {
            yield return absAi;
        } else {
            A[absAi] *= -1;
        }
    }
}

However, this solution does not return the element with the lower index and if there are more than 2 identical copies then it will return the same value more than once. Another issue is that 0 cannot be made negative.

A better solution eliminates repeated results, but still returns the second index and has a problem with 0 values. It also returns the index itself to demonstrate the wrong index problem

public static IEnumerable<(int index, int value)> FindDuplicates(int[] A)
{
    for (int i = 0; i < A.Length; i++) {
        int x = A[i] % A.Length;
        if (A[x] / A.Length == 1) {
            yield return (i, x);
        }
        A[x] += A.Length;
    }
}

Tested with

var A = new int[] { 3, 4, 2, 5, 2, 3, 3 };
foreach (var item in FindDuplicates(A)) {
    Console.WriteLine($"[{item.index}] = {item.value}");
}

It returns

[4] = 2
[5] = 3

My final solution which eliminates all these problems (at least I hope so): It encodes the first index itself by adding (i + 1) * A.Length to the first occurrence of a value. (i + 1) because i can be 0. The index can then be decoded with the reverse operation (A[x] / A.Length) - 1.

Then, because we want to return a result only on the first repeating value, we set the value to a negative value to exclude it from further processing. Subsequently, the original value can be retrieved with Math.Abs(A[i]) % A.Length.

public static IEnumerable<(int index, int value)> FindDuplicates(int[] A)
{
    for (int i = 0; i < A.Length; i++) {
        int x = Math.Abs(A[i]) % A.Length;
        if (A[x] >= 0) {
            if (A[x] < A.Length) { // First occurrence.
                A[x] += (i + 1) * A.Length; // Encode the first index.
            } else { // Second occurrence.
                int firstIndex = (A[x] / A.Length) - 1; // Decode the first index.
                yield return (firstIndex, x);

                // Mark the value as handeled by making it negative;
                A[x] *= -1; // A[x] is always >= A.Length, so no zero problem.
            }
        }
    }
}

Returns the expected result

[2] = 2
[0] = 3

Our elements are ints that don't have an identity. I.e. we can return one of the duplicates at any index since two equal ints cannot be distinguished. In case the elements have an identity (they could be reference types with equal values but differing references or have additional fields not involved in equality testing), we would have to return the first occurrence with

yield return (firstIndex, Math.Abs(A[firstIndex]) % A.Length);

to satisfy all the requirements.

Stalinist answered 29/8, 2018 at 21:0 Comment(0)
B
2

For who wants an implementation of the problem, I suggest two variant (in c# as in the tags), one using the accepted answer and one using the aproach of another answer, using the opposite of the elements. However the last solution has problem with the value zero and require some trick.

First Solution

using System;
public class Program
{
    public static void Main()
    {
        int[] a = {3, 4, 0, 5, 2, 3};
        int N = 6;
        int min_index = 0; 
        bool found = false;
        int index = -1;
        int i = 0;
        while(i < N && !found)
        {

            if(a[i] >= N) 
                index = a[i] % N;
            else
                index = a[i];

            if(a[index] >= N) //its a duplicated elements 
            {
                min_index = i;
                found = true;
            }else
            {
                a[index] += N;
            }
            i++;

        }

        Console.WriteLine("Result = " + a[min_index] % N);
    }
}

Second solution

    using System;
public class Program
{
    public static void Main()
    {
        int[] a = {3, 4, 2, 5, 2, 3};
        int N = 6;
        int min_index = N-1; 
        bool found = false;
        int index = -1;
        int i = 0;
        while(i < N && !found)
        {
            if(a[i] == -N+1) //it was 0
                index = 0;
            else
                index = Math.Abs(a[i]);

            if(a[index] < 0 || a[index] == -N+1) //its a duplicated elements 
            {
                min_index = i;
                found = true;
            }else
            {
                if(a[index] > 0)
                {
                    a[index] = -a[index];
                }else
                {
                    a[index] += -N+1;
                }
            }
            i++;
        }

        if(a[min_index] == -N+1)
            a[min_index] = 0;

        Console.WriteLine("Result = " + Math.Abs(a[min_index]));
    }
}
Bannister answered 29/8, 2018 at 18:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.