Algorithm to find two repeated numbers in an array, without sorting

There is an array of size n (numbers are between 0 and n - 3) and only 2 numbers are repeated. Elements are placed randomly in the array.

E.g. in {2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, and repeated numbers are 3 and 5.

What is the best way to find the repeated numbers?

P.S. [You should not use sorting]

Laden answered 17/2, 2009 at 6:56 Comment(4)
What do you mean by "best"? Complexity? Storage? - Fatherinlaw
Integer or float? Continuous numbers? - Useful
In-place bucket sort requires no additional memory and it is O(n). See #177618 - Expanded
You said "find", does that mean you want their positions in the array? Or is it enough to identify the repeated values (as you have in the example)? - Unpaid

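There is an O(n) solution if you know the possible domain of the input. For example, if your input array contains numbers between 0 and 99, consider the following code:

#include <stdbool.h>

// Returns the first value seen twice, or -1 if there is none.
// Assumes every value in input_array is in [0, 99].
int find_repeated(const int *input_array, int input_size)
{
    bool flags[100] = { false };   // flags[v] becomes true once v has been seen

    for (int i = 0; i < input_size; i++) {
        if (flags[input_array[i]])
            return input_array[i];
        flags[input_array[i]] = true;
    }
    return -1;
}

Of course this needs additional memory, but it is the fastest.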
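For comparison, here is a minimal Python sketch of the same flag idea with the flags packed into bits (one bit per possible value, as one of the comments suggests) rather than one bool per value. It assumes the question's setup (values in [0, n-3]) and collects both repeated values instead of returning the first one; the name `two_repeated_bitmap` is illustrative.

```python
def two_repeated_bitmap(a):
    """Return the repeated values of `a`, assuming every value is in [0, len(a) - 3]."""
    n = len(a)
    bits = bytearray((n - 2 + 7) // 8)     # one bit per possible value, all 0 initially
    repeated = []
    for v in a:
        byte, mask = v >> 3, 1 << (v & 7)  # locate the bit for value v
        if bits[byte] & mask:
            repeated.append(v)             # bit already set: second occurrence
        else:
            bits[byte] |= mask             # first occurrence: set the bit
    return repeated

print(two_repeated_bitmap([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```

Time is still a single O(n) pass; packing eight flags per byte only shrinks the extra memory to about (n - 2) / 8 bytes.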

Geomancy answered 17/2, 2009 at 7:8 Comment(7)
If the elements are integers or strings, which would be pretty common, then this approach is not going to work. - Unfeigned
The question already specified the domain of the input, so this is perfectly acceptable. - Enviable
A hash table can replace the array and then it would work for any input. As I mentioned, the downside is the additional memory requirement, but speed-wise it works. - Geomancy
@sesh This is the best technique, but it can be done even better: if the numbers range from 0-n then all you need are n BITS, not bytes or bools. It sounds pedantic, but often these types of questions will be phrased where n is in the billions. - Hie
Andrew - you are right. Only I was too lazy to type the bit operations; it's crazy typing code in a text editor :) - Geomancy
It is possible (with some constraint) to write this algorithm in O(1) memory. See #556244 - Expanded
Doesn't this only find the first repeated number and then return? What about the second number? I think you should not return until the array is completely traversed. - Biagi

OK, seems I just can't give it a rest :)

Simplest solution

int A[N] = {...};

long signed_1(long n) { return n%2<1 ? +n : -n;  } // 0,-1,+2,-3,+4,-5,+6,-7,...
long signed_2(long n) { return n%4<2 ? +n : -n;  } // 0,+1,-2,-3,+4,+5,-6,-7,...

long S1 = 0;  // or int64, or long long, or some user-defined class
long S2 = 0;  // so that it has enough bits to contain sum without overflow

for (int i=0; i<N-2; ++i)
{
   S1 += signed_1(A[i]) - signed_1(i);
   S2 += signed_2(A[i]) - signed_2(i);
} 

for (int i=N-2; i<N; ++i)
{
   S1 += signed_1(A[i]);
   S2 += signed_2(A[i]);
} 

S1 = labs(S1);  // labs() from <stdlib.h>, since S1 and S2 are long
S2 = labs(S2);

assert(S1 != S2);  // this algorithm fails in this case

long p = (S1+S2)/2;      // the larger repeated value
long q = labs(S1-S2)/2;  // the smaller repeated value

One sum (S1 or S2) contains p and q with the same sign, the other sum with opposite signs; all other members are eliminated. This only holds when p and q differ in exactly one of their two lowest bits.
S1 and S2 must have enough bits to accommodate the sums; the algorithm cannot tolerate overflow because of abs().

If abs(S1)==abs(S2) then the algorithm fails; in that case the common value is either abs(p - q) or p + q.
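To make the sign juggling concrete, here is a minimal Python transcription of the S1/S2 computation above. It assumes the question's input (values in [0, n-3], exactly two distinct repeated values), relies on Python's unbounded integers so overflow is not a concern, and returns None in the S1 == S2 case instead of asserting; `two_repeated_signed` is an illustrative name.

```python
def signed_1(v):
    return v if v % 2 < 1 else -v   # 0, -1, +2, -3, +4, ...

def signed_2(v):
    return v if v % 4 < 2 else -v   # 0, +1, -2, -3, +4, +5, ...

def two_repeated_signed(a):
    n = len(a)
    s1 = s2 = 0
    for i, v in enumerate(a):
        s1 += signed_1(v)
        s2 += signed_2(v)
        if i < n - 2:               # cancel each value 0 .. n-3 once
            s1 -= signed_1(i)
            s2 -= signed_2(i)
    s1, s2 = abs(s1), abs(s2)
    if s1 == s2:                    # failure case: p and q cannot be separated
        return None
    return abs(s1 - s2) // 2, (s1 + s2) // 2   # (smaller, larger)

print(two_repeated_signed([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
print(two_repeated_signed([2, 0, 6, 1, 1, 4, 2, 3, 5]))  # None
```

The second input is the one from the comments below (repeats 1 and 2, which differ in both low bits), so it lands in the failure case rather than producing a wrong pair.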

Previous solution

I doubt anybody will ever encounter such a problem in the field ;)
and I guess I know the teacher's expectation:

Let's take the array {0,1,2,...,n-2,n-1}.
The given one can be produced by replacing the last two elements, n-2 and n-1, with the unknown p and q (ignoring order).

so, the sum of elements will be (n-1)n/2 + p + q - (n-2) - (n-1)
the sum of squares (n-1)n(2n-1)/6 + p^2 + q^2 - (n-2)^2 - (n-1)^2

Simple math remains:

  (1)  p+q = S1  
  (2)  p^2+q^2 = S2

Surely you won't solve it the way math classes teach you to solve quadratic equations.

First, calculate everything modulo 2^32, that is, allow for overflow.
Then check pairs {p,q}: {0, S1}, {1, S1-1} ... against expression (2) to find candidates (there might be more than 2 due to modulo and squaring)
And finally check found candidates if they really are present in array twice.
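A minimal Python sketch of this modular variant, assuming 32-bit wrap-around (emulated with a mask, since Python integers do not overflow) and the question's value range. Instead of subtracting n-2 and n-1 at the end, it subtracts each value in [0, n-3] as it goes, which yields the same S1 and S2. `two_repeated_mod` and `MASK` are hypothetical names.

```python
MASK = (1 << 32) - 1  # emulate 32-bit unsigned wrap-around

def two_repeated_mod(a):
    n = len(a)
    s1 = s2 = 0
    for i, v in enumerate(a):
        s1 = (s1 + v) & MASK
        s2 = (s2 + v * v) & MASK
        if i < n - 2:                     # subtract each value 0 .. n-3 once
            s1 = (s1 - i) & MASK
            s2 = (s2 - i * i) & MASK
    # Now s1 == (p + q) mod 2**32 and s2 == (p*p + q*q) mod 2**32.
    candidates = []
    for p in range(n - 2):
        q = (s1 - p) & MASK
        if p < q < n - 2 and (p * p + q * q) & MASK == s2:
            candidates.append((p, q))
    # Keep only the pairs whose values really occur twice in the array.
    return [(p, q) for p, q in candidates
            if a.count(p) == 2 and a.count(q) == 2]

print(two_repeated_mod([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [(3, 5)]
```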

Berar answered 17/2, 2009 at 6:56 Comment(4)
O(N) and only requires two integer variables to store the sums. Lovely. - Jamison
Lovely, but maybe confusing. p+q != p^2+p^2. - Subjugate
I don't think this solution works perfectly. Take a look at this array: int A[] = {2, 0, 6, 1, 1, 4, 2, 3, 5}; where n = 9. The result I get is {1, 0}, although it did work for the array example the OP gave... - Wellfavored
This solution does not work if both p and q are multiples of 4, and in many other cases. - Leakage

You know that your array contains every number from 0 to n-3 plus the two repeated ones (p & q). For simplicity, let's ignore the 0 case for now.

You can calculate the sum and the product over the array, resulting in:

1 + 2 + ... + (n-3) + p + q = p + q + (n-3)(n-2)/2

So if you subtract (n-3)(n-2)/2 from the sum of the whole array, you get

sum(Array) - (n-3)(n-2)/2 = x = p + q

Now do the same for the product:

1 * 2 * ... * (n-3) * p * q = (n-3)! * p * q

prod(Array) / (n - 3)! = y = p * q

You now have these terms:

x = p + q

y = p * q

=> p and q are the two roots of z^2 - x*z + y = 0

Solving this quadratic, you can calculate p and q.
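One way to finish this, following the Viète comment on this answer: p and q are the roots of z^2 - x*z + y = 0. The Python sketch below assumes the question's setup and sidesteps the 0 case by adding 1 to every value (an assumption of this sketch, not part of the answer), so it works with P = p + 1 and Q = q + 1 over the values 1..n-2. It relies on Python's big integers; the product grows like (n-2)!, so this is only practical for small n. The function name is illustrative.

```python
from math import factorial, isqrt

def two_repeated_sum_product(a):
    n = len(a)
    total, prod = 0, 1
    for v in a:
        total += v + 1                    # shifted so that 0 does not zero the product
        prod *= v + 1
    x = total - (n - 2) * (n - 1) // 2    # P + Q
    y = prod // factorial(n - 2)          # P * Q
    d = isqrt(x * x - 4 * y)              # |P - Q|, from the discriminant
    P, Q = (x - d) // 2, (x + d) // 2     # roots of z^2 - x*z + y = 0
    return P - 1, Q - 1                   # undo the shift

print(two_repeated_sum_product([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```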

Allopatric answered 17/2, 2009 at 13:19 Comment(1)
From Viète's theorem it follows that p and q are roots of the equation z**2 - (p + q)*z + p*q = 0, therefore p,q = [k/2 + 1/2*(-4*m + k**2)**(1/2), k/2 - 1/2*(-4*m + k**2)**(1/2)], where k=x, m=y. - Expanded

Insert each element into a set/hashtable, first checking whether it is already in it.
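A minimal Python version of this, assuming the built-in `set` as the hash-based container; it records every value that is already present rather than stopping at the first one. The function name is illustrative.

```python
def two_repeated_set(a):
    seen = set()
    repeated = []
    for v in a:
        if v in seen:          # already inserted once: this is a repeat
            repeated.append(v)
        else:
            seen.add(v)
    return repeated

print(two_repeated_set([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [3, 5]
```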

Unfeigned answered 17/2, 2009 at 7:5 Comment(5)
If the total number of distinct values is small (less than 100) it really doesn't matter if it's a set or not. Searching through a list linearly will be many times faster. - Musca
@John If the number is small it doesn't matter, because both approaches are fast. For large n a hashtable or tree implementation of a set is much better. Plus it is not good to choose a list for what conceptually is a set. - Hoggish
@Hoggish I would argue against that. My opinion is that linearity is always favorable over trees, hash tables and what not. - Musca
Hashing is pointless, as you already know the exact domain, which is conveniently integers starting at 0. - Lately
I think hashtables are better than arrays, as you need not specify the size while creating them, which is not the case with arrays. - Tsang

Check this old but good paper on the topic:

Bilocular answered 17/2, 2009 at 7:8 Comment(1)
This solution is O(n*log(n)) in this case (because there are only 2 duplicates in the array). Therefore it is not better than sorting in this case. The best solution should take into account that the number of duplicates is 2 and all values are in the [0, n-3] range. - Expanded

You might be able to take advantage of the fact that sum(array) = (n-2)*(n-3)/2 + the two repeated numbers.

Edit: As others have noted, you can use this combined with the sum of squares; I was just a little slow in figuring it out.

Enviable answered 17/2, 2009 at 7:9 Comment(5)
The question specified that the numbers are between 0 and n-3, plus the two repeated numbers, so every number between 0 and n-3 must be in the array. - Enviable
Even if I know the sum, then what?? - Laden
The sum of the array elements includes one occurrence of every value, so the formula would be: x = Sum of array elements - (n-2) * (n-3) / 2. - Imbibition
If you combine the sum with the product, the result will be unique: #556244 - Allopatric
Yup - it was sitting there in the back of my mind, but I wasn't getting that last bit of information (I was stuck trying to figure out a solution using XOR, which I still think might be potentially faster than the sum of squares). - Enviable

Some answers to the question "Algorithm to determine if array contains n…n+m?" contain, as a subproblem, solutions which you can adapt for your purpose.

For example, here's a relevant part from my answer:

#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

bool has_duplicates(int* a, int m, int n)
{
  /** O(m) in time, O(1) in space (for 'typeof(m) == typeof(*a) == int')

      Whether a[] array has duplicates.

      precondition: all values are in [n, n+m) range.

      feature: It marks visited items using a sign bit.
  */
  assert(n != INT_MIN); // so that n - 1 below cannot overflow
  for (int *p = a; p != &a[m]; ++p) {
    *p -= (n - 1); // [n, n+m) -> [1, m+1)
    assert(*p > 0);
  }

  // determine: are there duplicates
  bool has_dups = false;
  for (int i = 0; i < m; ++i) {
    const int j = abs(a[i]) - 1;
    assert(j >= 0);
    assert(j < m);
    if (a[j] > 0)
      a[j] *= -1; // mark
    else { // already seen
      has_dups = true;
      break;
    }
  }

  // restore the array
  for (int *p = a; p != &a[m]; ++p) {
    if (*p < 0) 
      *p *= -1; // unmark
    // [1, m+1) -> [n, n+m)
    *p += (n - 1);        
  }

  return has_dups; 
}

The function leaves the array unchanged (the array must be writable, but its values are restored on exit).

It works for array sizes up to INT_MAX (2147483647 where int is 32 bits, which includes the common 64-bit platforms).
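The same sign-bit marking can report the repeated values directly instead of a yes/no answer. Below is a minimal Python sketch of that adaptation (not code from the answer above), assuming the question's range [0, n-3]: values are shifted up by 1 so that 0 can carry a sign, and the list is restored before returning. `two_repeated_sign_marking` is an illustrative name.

```python
def two_repeated_sign_marking(a):
    n = len(a)
    for k in range(n):
        a[k] += 1                   # [0, n-3] -> [1, n-2], so 0 can carry a sign
    found = []
    for k in range(n):
        j = abs(a[k]) - 1           # original value of a[k], used as an index
        if a[j] > 0:
            a[j] = -a[j]            # first time we see value j: mark slot j
        else:
            found.append(j)         # slot already marked: j is a repeated value
    for k in range(n):
        a[k] = abs(a[k]) - 1        # unmark and undo the shift
    return tuple(found)

data = [2, 3, 6, 1, 5, 4, 0, 3, 5]
print(two_repeated_sign_marking(data))  # (3, 5)
print(data)                             # [2, 3, 6, 1, 5, 4, 0, 3, 5]
```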

Expanded answered 17/2, 2009 at 13:3 Comment(0)
Suppose the array is

a[0], a[1], a[2] ..... a[n-1]

sumA = a[0] + a[1] +....+a[n-1]
sumASquare = a[0]*a[0] + a[1]*a[1] + a[2]*a[2] + .... + a[n-1]*a[n-1]

sumFirstN = (N*(N+1))/2 where N=n-3 so
sumFirstN = (n-3)(n-2)/2

similarly

sumFirstNSquare = N*(N+1)*(2*N+1)/6 = (n-3)(n-2)(2n-5)/6

Suppose the repeated elements are X and Y.

so X + Y = sumA - sumFirstN;
X*X + Y*Y = sumASquare - sumFirstNSquare;

Solving these two equations (they reduce to a quadratic), we get the values of X and Y.
Time Complexity = O(n)
space complexity = O(1)
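A minimal Python sketch of these formulas, assuming the question's input. Rather than the quadratic formula in floating point, it uses (X - Y)^2 = 2*(X*X + Y*Y) - (X + Y)^2 together with `math.isqrt` (Python 3.8+), so the arithmetic stays exact; `two_repeated_squares` is an illustrative name.

```python
from math import isqrt

def two_repeated_squares(a):
    N = len(a) - 3                                   # N = n - 3, the largest value
    sum_first_n = N * (N + 1) // 2
    sum_first_n_square = N * (N + 1) * (2 * N + 1) // 6
    s = sum(a) - sum_first_n                         # X + Y
    sq = sum(v * v for v in a) - sum_first_n_square  # X*X + Y*Y
    d = isqrt(2 * sq - s * s)                        # |X - Y|
    return (s - d) // 2, (s + d) // 2                # (smaller, larger)

print(two_repeated_squares([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```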
Lamdin answered 15/9, 2009 at 14:4 Comment(0)

I know the question is very old, but I suddenly hit it and I think I have an interesting answer to it. We know this is a brainteaser, and trivial solutions (e.g. HashMap, sort, etc.), no matter how good they are, would be boring.

As the numbers are integers, they have a constant bit size (e.g. 32). Let us assume we are working with 4-bit integers for now. We look for A and B, which are the duplicate numbers.

We need 4 buckets, one for each bit. Each bucket contains the numbers whose specific bit is 1. For example, bucket 1 gets 2, 3, 6, 7, ...:

Bucket 0 : Sum ( x where: x & 2 power 0 != 0 )
...
Bucket i : Sum ( x where: x & 2 power i != 0 )

We know what the sum of each bucket would be if there were no duplicates. I consider this prior knowledge.

Once the above buckets are generated, some of them will have values larger than expected. By constructing a number from those buckets (bit i set when bucket i is larger than expected), we get A OR B, for your information.

We can calculate A XOR B as follows:

A XOR B = Array[0] XOR Array[1] XOR ... XOR Array[n-1] XOR 0 XOR 1 XOR ... XOR (n-3)

Now, going back to the buckets, we know exactly which buckets hold both of our numbers and which hold only one (from the bits of A XOR B).

For a bucket that holds only one of the numbers, we can extract that number: num = (sum - expected sum of bucket). Finding one of the duplicates is enough, since the other one is num XOR (A XOR B); so if A XOR B has at least one set bit, we've got the answer.

But what if A XOR B is zero? That is only possible if both duplicate numbers are the same number, in which case our number is simply A OR B.
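Here is a minimal Python sketch of the bucket idea, assuming A != B (which holds for the question's input) and values in [0, n-3]. It only builds the one bucket it needs: the lowest set bit of A XOR B picks a bucket that holds exactly one of the duplicates, and the other duplicate is recovered with XOR. The function name is illustrative.

```python
def two_repeated_buckets(a):
    m = len(a) - 2                  # the values are 0 .. m-1 (i.e. 0 .. n-3)
    a_xor_b = 0
    for v in a:
        a_xor_b ^= v
    for v in range(m):
        a_xor_b ^= v                # every non-repeated value cancels out
    bit = a_xor_b & -a_xor_b        # lowest bit where A and B differ (A != B assumed)
    bucket = sum(v for v in a if v & bit)           # actual sum of that bucket
    expected = sum(v for v in range(m) if v & bit)  # its sum without duplicates
    first = bucket - expected       # only one duplicate lands in this bucket
    second = first ^ a_xor_b
    return min(first, second), max(first, second)

print(two_repeated_buckets([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```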

Tetracycline answered 17/1, 2012 at 4:54 Comment(0)

Sorting the array would seem to be the best solution. A simple sort would then make the search trivial and would take a whole lot less time/space.

Otherwise, if you know the domain of the numbers, create an array with that many buckets in it and increment the matching bucket as you go through the array, something like this:

int count[10] = {0};   // one counter per possible value (here 0..9), all starting at 0

for (int i = 0; i < arraylen; i++) {
    count[array[i]]++;
}

Then just search the count array for any entries greater than 1. Those are the values with duplicates. This only requires one pass across the original array and one pass across the count array.

Oft answered 17/2, 2009 at 7:0 Comment(5)
I agree it's probably just a homework question... however, this can be a real problem if the elements are not comparable. - Unfeigned
How would you have incomparable elements? You just need to force some ordering on them. Use whatever the criterion is for sameness to derive that ordering system. - Oft
Given the requirement that there are only ever two repeated numbers, it would be better to just return the value once a duplicate is found. - Metrology
Of course you can force an ordering on anything... but how would you do that for elements that are not already pre-defined as comparable? That should be part of the solution. Also, it can be tough in practice to have small enough ranges to make counting occurrences feasible. - Unfeigned
@tj the problem specifies they are numbers and that they are in the range of 0 to n-3. That would be comparable and small enough. - Oft

Here's an implementation in Python of @eugensk00's answer (one of its revisions) that doesn't use modular arithmetic. It is a single-pass algorithm, O(log(n)) in space. If fixed-width (e.g. 32-bit) integers are used, then it requires only two fixed-width numbers (e.g. for 32-bit: one 64-bit number and one 128-bit number). It can handle arbitrarily large integer sequences (it reads one integer at a time, so the whole sequence doesn't need to be in memory).

def two_repeated(iterable):
    s1, s2 = 0, 0
    for i, j in enumerate(iterable):
        s1 += j - i     # number_of_digits(s1) ~ 2 * number_of_digits(i)
        s2 += j*j - i*i # number_of_digits(s2) ~ 4 * number_of_digits(i) 
    s1 += (i - 1) + i
    s2 += (i - 1)**2 + i**2

    p = (s1 - int((2*s2 - s1**2)**.5)) // 2 
    # `Decimal().sqrt()` could replace `int()**.5` for really large integers
    # or any function to compute integer square root
    return p, s1 - p

Example:

>>> two_repeated([2, 3, 6, 1, 5, 4, 0, 3, 5])
(3, 5)

A more verbose version of the above code follows with explanation:

def two_repeated_seq(arr):
    """Return the only two duplicates from `arr`.

    >>> two_repeated_seq([2, 3, 6, 1, 5, 4, 0, 3, 5])
    (3, 5)
    """
    n = len(arr)
    assert all(0 <= i < n - 2 for i in arr) # all in range [0, n-2)
    assert len(set(arr)) == (n - 2) # number of unique items

    s1 = (n-2) + (n-1)       # s1 and s2 have ~ 2*(k+1) and 4*(k+1) digits  
    s2 = (n-2)**2 + (n-1)**2 # where k is a number of digits in `max(arr)`
    for i, j in enumerate(arr):
        s1 += j - i     
        s2 += j*j - i*i

    """
    s1 = (n-2) + (n-1) + sum(arr) - sum(range(n))
       = sum(arr) - sum(range(n-2))
       = sum(range(n-2)) + p + q - sum(range(n-2))
       = p + q
    """
    assert s1 == (sum(arr) - sum(range(n-2)))

    """
    s2 = (n-2)**2 + (n-1)**2 + sum(i*i for i in arr) - sum(i*i for i in range(n))
       = sum(i*i for i in arr) - sum(i*i for i in range(n-2))
       = p*p + q*q
    """
    assert s2 == (sum(i*i for i in arr) - sum(i*i for i in range(n-2)))

    """
    s1 = p+q
    -> s1**2 = (p+q)**2
    -> s1**2 = p*p + 2*p*q + q*q
    -> s1**2 - (p*p + q*q) = 2*p*q
    s2 = p*p + q*q
    -> p*q = (s1**2 - s2)/2

    Let C = p*q = (s1**2 - s2)/2 and B = p+q = s1 then from Viete theorem follows
    that p and q are roots of x**2 - B*x + C = 0
    -> p = (B + sqrtD) / 2
    -> q = (B - sqrtD) / 2
    where sqrtD = sqrt(B**2 - 4*C)

    -> p = (s1 + sqrt(2*s2 - s1**2))/2
    """
    sqrtD = (2*s2 - s1**2)**.5
    assert int(sqrtD)**2 == (2*s2 - s1**2) # perfect square
    sqrtD = int(sqrtD)
    assert (s1 - sqrtD) % 2 == 0 # even
    p = (s1 - sqrtD) // 2
    q = s1 - p
    assert q == ((s1 + sqrtD) // 2)
    assert sqrtD == (q - p)
    return p, q

NOTE: calculating the integer square root of a number (~ N**4) makes the above algorithm non-linear.

Expanded answered 18/2, 2009 at 17:59 Comment(0)

Since a range is specified, you can perform a radix sort. This would sort your array in O(n). Searching for duplicates in a sorted array is then O(n).

Turtledove answered 12/5, 2011 at 22:50 Comment(0)

You can use a simple nested for loop:

int[] numArray = new int[] { 1, 2, 3, 4, 5, 7, 8, 3, 7 };

for (int i = 0; i < numArray.Length; i++)
{
    for (int j = i + 1; j < numArray.Length; j++)
    {
        if (numArray[i] == numArray[j])
        {
            //DO SOMETHING
        }
    }
}

*OR you can filter the array and use a recursive function if you want to get the count of occurrences*

using System;
using System.Linq;

int[] array = { 1, 2, 3, 4, 5, 4, 4, 1, 8, 9, 23, 4, 6, 8, 9, 1, 4 };
GetDuplicates(array);

// Prints how often array[0] occurs, then recurses on the array with every
// copy of that value removed, until no values are left.
void GetDuplicates(int[] array)
{
    if (array.Length == 0)
        return;                        // base case: nothing left to count

    int a = 1;                         // occurrences of array[0] so far
    for (int j = 1; j < array.Length; j++)
    {
        if (array[0] == array[j])
            a += 1;
    }
    Console.WriteLine(" {0} occurred {1} time/s", array[0], a);

    // Keep only the values that differ from array[0] and recurse on them.
    int[] myNewArray = (from n in array where n != array[0] select n).ToArray();
    GetDuplicates(myNewArray);
}
Kierstenkieselguhr answered 15/9, 2011 at 9:35 Comment(0)

Answer to 18: you are taking an array of 9 elements, and the elements start from 0, so the maximum element in your array will be 6.

Take the sum of the elements from 0 to 6 and the sum of the array elements, and compute their difference (say d). This is p + q.

Now take the XOR of the elements from 0 to 6 (say x1) and the XOR of the array elements (say x2). x2 is the XOR of all elements from 0 to 6 except the two repeated ones, since each repeated element appears twice and cancels itself out.

Now, for each element a[i] of the array, let p = a[i] and compute q = d - p. Check whether p XOR q XOR x2 == x1. Doing this for all elements gives you the elements for which this condition holds, in O(n). Note that more than one pair can pass this check (for example, 1 + 7 == 3 + 5 and 1 XOR 7 == 3 XOR 5), so candidates may still need verifying. Keep coding!
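A minimal Python sketch of this sum-plus-XOR check, assuming values in [0, n-3]. It loops over the candidate values 0..n-3 (the same set as the array's elements) and returns every pair that passes both tests, because more than one pair can pass; `two_repeated_sum_xor` is an illustrative name.

```python
def two_repeated_sum_xor(a):
    m = len(a) - 2                   # the values are 0 .. m-1 (i.e. 0 .. n-3)
    d = sum(a) - m * (m - 1) // 2    # d = p + q
    x1 = 0
    for v in range(m):
        x1 ^= v                      # XOR of 0 .. m-1
    x2 = 0
    for v in a:
        x2 ^= v                      # the repeated values cancel out here
    pq_xor = x1 ^ x2                 # p ^ q
    return [(p, d - p) for p in range(m)
            if p < d - p < m and p ^ (d - p) == pq_xor]

print(two_repeated_sum_xor([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # [(3, 5)]
print(two_repeated_sum_xor(list(range(11)) + [3, 5]))    # [(1, 7), (3, 5)]
```

The second input repeats 3 and 5, but (1, 7) has the same sum and the same XOR, so a final counting pass is still needed to confirm which candidate is real.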

Townes answered 7/8, 2012 at 5:58 Comment(0)

Check this out... O(n) time and O(1) space complexity (arr holds the n input values):

 int i, x = 0, y = 0, xor_pq = 0;

 for (i = 0; i < n; i++)
     xor_pq ^= arr[i];
 for (i = 1; i <= n - 3; i++)
     xor_pq ^= i;

So in the given example you will get the XOR of 3 and 5.

int bit = xor_pq & -xor_pq;  // isolate the lowest set bit

for (i = 0; i < n; i++)
{
    if (arr[i] & bit)
        x = x ^ arr[i];
    else
        y = y ^ arr[i];
}
for (i = 1; i <= n - 3; i++)
{
    if (i & bit)
        x = x ^ i;
    else
        y = y ^ i;
}

x and y are your answers.

Loafer answered 9/1, 2013 at 18:17 Comment(0)

Without sorting, you're going to have to keep track of the numbers you've already visited.

In pseudocode this would basically be (done this way so I'm not just giving you the answer):

for each number in the list
   if number not already in unique numbers list
      add it to the unique numbers list
   else
      return that number as it is a duplicate
   end if
end for each
Metrology answered 17/2, 2009 at 7:6 Comment(6)
Isn't the running time of this still O(N^2)? The length of the unique numbers list will grow to nearly the length of the array and has to be searched for each number. I suppose it could be a sorted list, so O(N * log N) might be possible. - Oft
If the list is replaced by a hash table or set, it's faster than O(n^2). - Unfeigned
Well, the best approach would be to sort the list first... but since that isn't an option, we just want to find the first duplicate and return it. In the best case we will return very quickly; in the worst case we'll return after examining the entire list. I'm not sure how that can be avoided given no sort. - Metrology
If you use proper storage for "unique number list", like a tree, then this too is O(N log N). If you spend more memory and use e.g. a bool[] indicating seen status, it even becomes O(N) in time but O(MAX_NUMBER) in space. - Daman
@Unfeigned Yes, if I were writing it in C# I'd definitely put it in a hash table. But since the language hasn't been specified I wrote it in pseudocode... that way if he's writing in Basic or Assembler etc. he'll be able to figure something out... - Metrology
I don't understand why this was voted down. A very normal, common, practical approach. Just do not treat "unique numbers list" too literally. - Hypothecate

How about this:

for (i=0; i<n-1; i++) {
  for (j=i+1; j<n; j++) {
    if (a[i] == a[j]) {
        printf("%d appears more than once\n",a[i]);
        break;
    }
  }
}

Sure it's not the fastest, but it's simple and easy to understand, and requires no additional memory. If n is a small number like 9, or 100, then it may well be the "best". (i.e. "Best" could mean different things: fastest to execute, smallest memory footprint, most maintainable, least cost to develop etc..)

Tinned answered 17/2, 2009 at 9:6 Comment(1)
This solution is O(n**2). A standard sort routine is usually O(n*log(n)), and it costs you nothing to develop and maintain. - Expanded
// Only finds repeats that sit next to each other, e.g. in a sorted array.
for (i = 0; i + 1 < n; i++) {
  if (!(arr[i] ^ arr[i+1]))
        printf("Found repeated number %5d\n", arr[i]);
}
Pervious answered 4/11, 2009 at 9:40 Comment(0)

I have written a small program which finds the elements that are not repeated. Just go through it and let me know your opinion. At the moment I assume every repeated element occurs an even number of times, but it can easily be extended to odd counts as well.

So my idea is to first sort the numbers and then apply my algorithm. Quicksort can be used to sort the elements.

Let's take the input array below:

int arr[] = {1,1,2,10,3,3,4,5,5,6,6};

The numbers 2, 10 and 4 are not repeated, and equal values already sit next to each other; if they do not, use quicksort to sort the array first.

Let's apply my program to this:

#include <cstdio>
#include <vector>
using namespace std;

int main()
{
    //int arr[] = {2, 9, 6, 1, 1, 4, 2, 3, 5};
    int arr[] = {1,1,2,10,3,3,4,5,5,6,6};
    const int size = sizeof(arr)/sizeof(arr[0]);

    int i = 0;

    vector<int> vec;

    int var = arr[0];
    for(i = 1 ; i < size; i += 2)
    {
            var = var ^ arr[i];

            if(var != 0 )
            {
                //arr[i-1] differs from arr[i]: put it in the vector
                var = arr[i-1];
                vec.push_back(var);
                i = i-1;
            }
            if (i + 1 < size)   // avoid reading past the end of arr
                var = arr[i+1];
    }

    for(size_t k = 0 ; k < vec.size() ; k++)
        printf("value not repeated = %d\n",vec[k]);

}

This gives the output:

value not repeated = 2
value not repeated = 10
value not repeated = 4

It's simple and very straightforward: just use XOR.

Solothurn answered 10/11, 2010 at 9:59 Comment(0)

In C:

    int arr[] = {2, 3, 6, 1, 5, 4, 0, 3, 5};
    int n = sizeof arr / sizeof arr[0];

    int num = 0, i;

    for (i = 0; i < n; i++)          // XOR in every array element
         num = num ^ arr[i];
    for (i = 0; i <= n - 3; i++)     // ...and every value from 0 to n-3
         num = num ^ i;

Since x^x=0, every value that occurs an even number of times in total is neutralized. Each non-repeated value occurs twice in total (once in the array, once in 0..n-3), while each repeated value occurs three times, so only the two repeated values survive. Let's call them a and b. We are left with a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b and use it as a mask, i.e. choose x as a power of 2 so that x & (a^b) is nonzero.

Now split the list (the array elements together with the values 0..n-3) into two sublists: one contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b end up in different sublists. So we can now apply the same method used above to each sublist independently, and discover what a and b are.

Almswoman answered 12/5, 2011 at 16:41 Comment(0)

Here is an algorithm that uses order statistics and runs in O(n).

You can solve this by repeatedly calling SELECT with the median as parameter.

You also rely on the fact that, after a call to SELECT, the elements that are less than or equal to the median are moved to the left of the median.

  • Call SELECT on A with the median as the parameter.
  • If the median has the value it would have without repeats (floor(n/2) for the whole array), then the repeated values are to the right of the median. So you continue with the right half of the array.
  • Otherwise, a repeated value is to the left of the median. So you continue with the left half of the array.
  • You continue this way recursively.

For example:

  • When A={2, 3, 6, 1, 5, 4, 0, 3, 5} n=9, then the median should be the value 4.
  • After the first call to SELECT
  • A={3, 2, 0, 1, <3>, 4, 5, 6, 5} The median value is smaller than 4 so we continue with the left half.
  • A={3, 2, 0, 1, 3}
  • After the second call to SELECT
  • A={1, 0, <2>, 3, 3} then the median should be 2 and it is so we continue with the right half.
  • A={3, 3}, found.

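This algorithm runs in O(n+n/2+n/4+...)=O(n). It locates one of the repeated values (3 in the example); the other one follows from their sum, p + q = sum(A) - (n-3)(n-2)/2.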
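Below is a minimal Python sketch of the same halving idea. It is a swapped-in variant, not the answer's exact procedure: instead of calling SELECT for the median, it splits at the midpoint of the current value range, relying on every value in [0, n-3] being present at least once, so a half with more items than distinct values must contain a repeat. It copies each half into a new list (O(n) extra memory); the work is still roughly n + n/2 + n/4 + ... . The second repeated value comes from the sum. Function names are illustrative.

```python
def one_repeated_by_partition(a):
    lo, hi = 0, len(a) - 3            # every value in [lo, hi] occurs at least once
    cur = list(a)
    while lo < hi:
        mid = (lo + hi) // 2
        left = [v for v in cur if v <= mid]
        if len(left) > mid - lo + 1:  # more items than distinct values: a repeat is here
            cur, hi = left, mid
        else:                         # left part holds no repeat: go right
            cur, lo = [v for v in cur if v > mid], mid + 1
    return lo

def two_repeated_by_partition(a):
    n = len(a)
    p = one_repeated_by_partition(a)
    q = sum(a) - (n - 3) * (n - 2) // 2 - p   # p + q comes from the sum
    return min(p, q), max(p, q)

print(two_repeated_by_partition([2, 3, 6, 1, 5, 4, 0, 3, 5]))  # (3, 5)
```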

Deettadeeyn answered 25/8, 2012 at 17:19 Comment(0)

What about using the https://en.wikipedia.org/wiki/HyperLogLog?

Redis does http://redis.io/topics/data-types-intro#hyperloglogs

A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, in the case of the Redis implementation, which is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

Saretta answered 31/1, 2016 at 2:21 Comment(0)

Well, using a nested for loop, and assuming the question is to find the numbers that occur exactly twice in an array:

def repeated(ar, n):
    for i in range(n):
        count = 0                 # number of later copies of ar[i]
        for j in range(i + 1, n):
            if ar[i] == ar[j]:
                count += 1
        if count == 1:
            print("repeated:", ar[i])

arr = [2, 3, 6, 1, 5, 4, 0, 3, 5]
n = len(arr)
repeated(arr, n)
Flavoring answered 27/6, 2021 at 5:41 Comment(0)

For each number: check if it exists in the rest of the array.

Drumstick answered 17/2, 2009 at 7:1 Comment(11)
The running time on that would be horrible though. O(N^2). Probably needs a better solution. - Oft
Indeed. This is less efficient than sorting. - Doodlesack
But as sorting is not allowed, it's a reasonable solution. It's certainly more practical than using an array to hold a count for all elements in the domain. - Unfeigned
And 'best way' may be different than what one might think :) - Teacher
This is what I get from answering a vague question, I guess :-) It's the best way if you consider the (humanly perceived, not algorithmic) complexity of the algorithm... - Drumstick
@tjdonaldson: is it the 'best' way when N = 100,000,000? Doesn't sound very reasonable to me. - Lauter
This solution is O(n^2). There is a solution with sorting that would be O(n log(n)). The best solution is O(n), and no, it doesn't need an array for counting. - Disproportionate
It's funny that so many people consider performance the only quality metric. What about one's ability to quickly understand the algorithm and not wasting your brain cycles grokking some local optimization to a simple problem? - Drumstick
The optimal solution is "grad student - write a program to find repeated numbers in this array, have it ready for tomorrow" - Gynaecomastia
This would also require an extra array, or else you would check the repeating numbers several times. - Trudytrue
How do we even know performance is the expected metric? This is by far the simplest code. If there are 1000 entries in the array, this is easily the best code in my book. Don't optimize until you've profiled. In short, I think this answer is the best starting point unless we know what we are optimizing for. - Wideman

Why should we try doing maths (especially solving quadratic equations)? These are costly operations. The best way to solve this would be to construct a bitmap of n-2 bits (one per value from 0 to n-3), i.e. (n - 2 + 7) / 8 bytes. Better to use calloc for this memory, so every single bit is initialized to 0. Then traverse the list and set the corresponding bit to 1 when a number is encountered; if the bit for that number is already 1, then that is the repeated number. This can be extended to find out whether any number is missing from the array. This solution is O(n) in time complexity.

Alehouse answered 14/6, 2010 at 15:27 Comment(0)
